Post Mortem Report: Service Interruption of Nolus Protocol on April 12th
Summary
On April 12th at 23:41 UTC, the Nolus Protocol experienced a significant service disruption caused by an overwhelming number of requests to open or close lease positions. This influx, triggered by a substantial market correction, resulted in over 1,600 lease positions being sent for partial or full liquidation. The surge overwhelmed the relayers, which were left with a backlog of pending packets across multiple ordered ICA (Interchain Accounts) channels.
Context
In IBC, ordered channels such as the ICA channels used by Nolus preserve the integrity of transaction sequences across blockchains by ensuring that packets are delivered in exactly the order they were sent. If a packet on an ordered channel times out, usually because it was not relayed within the allotted time frame, the channel is closed and the protocol requires a series of steps to re-establish it:
- Timeout Detection: A packet is considered timed out when it has not been received on the destination chain before its timeout height or timestamp; a relayer then proves this non-receipt to the sending chain.
- Channel Closure: The channel is closed to halt any further transactions under the current session.
- Re-Establishment: A new handshake process is initiated by the involved parties to renegotiate channel terms and align channel configurations and sequence expectations across both chains.
- Channel Reopening: After a successful handshake, the channel progresses from its initial INIT/TRYOPEN states to OPEN, and ordered communication resumes.
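The lifecycle above can be sketched as a small state machine. This is a simplified model written for illustration, assuming one controller end (the Nolus side of an ICA channel) and one host end on the remote chain; the `Channel` type, `Step`, `OnPacketTimeout` and `Handshake` are hypothetical and are not the ibc-go API. It shows that a timeout closes an ordered channel but not an unordered one, and that recovery means running all four handshake messages (ChanOpenInit, ChanOpenTry, ChanOpenAck, ChanOpenConfirm) again.

```go
package main

import "fmt"

// State is a simplified channel-end state named after the IBC channel
// states. Toy model for illustration, not the ibc-go types.
type State string

const (
	Uninitialized State = "UNINITIALIZED"
	Init          State = "INIT"
	TryOpen       State = "TRYOPEN"
	Open          State = "OPEN"
	Closed        State = "CLOSED"
)

// Ordering distinguishes ordered from unordered channels.
type Ordering int

const (
	Ordered Ordering = iota
	Unordered
)

// Channel holds both ends of one (hypothetical) ICA channel: the
// controller end on Nolus and the host end on the remote chain.
type Channel struct {
	Ordering   Ordering
	Controller State
	Host       State
}

// Step applies one handshake message and reports whether it was
// valid for the current states of the two ends.
func (c *Channel) Step(msg string) bool {
	switch {
	case msg == "ChanOpenInit" && c.Controller == Uninitialized:
		c.Controller = Init
	case msg == "ChanOpenTry" && c.Controller == Init && c.Host == Uninitialized:
		c.Host = TryOpen
	case msg == "ChanOpenAck" && c.Controller == Init && c.Host == TryOpen:
		c.Controller = Open
	case msg == "ChanOpenConfirm" && c.Controller == Open && c.Host == TryOpen:
		c.Host = Open
	default:
		return false
	}
	return true
}

// OnPacketTimeout models a proven packet timeout on the sending
// (controller) end: ordered channels close, unordered ones stay open.
func (c *Channel) OnPacketTimeout() {
	if c.Ordering == Ordered {
		c.Controller = Closed
	}
}

// Handshake negotiates a fresh channel by applying the four
// handshake messages in order.
func Handshake(o Ordering) (*Channel, error) {
	c := &Channel{Ordering: o, Controller: Uninitialized, Host: Uninitialized}
	for _, msg := range []string{"ChanOpenInit", "ChanOpenTry", "ChanOpenAck", "ChanOpenConfirm"} {
		if !c.Step(msg) {
			return nil, fmt.Errorf("%s rejected in state %s/%s", msg, c.Controller, c.Host)
		}
	}
	return c, nil
}

func main() {
	ch, _ := Handshake(Ordered)
	fmt.Println(ch.Controller, ch.Host) // OPEN OPEN

	ch.OnPacketTimeout()
	fmt.Println(ch.Controller) // CLOSED

	// Recovery requires a complete new handshake.
	ch, _ = Handshake(Ordered)
	fmt.Println(ch.Controller, ch.Host) // OPEN OPEN
}
```

In this model every reopening costs a full round of handshake messages, which is why a backlog of ChanOpenTry and ChanOpenAck messages is so expensive for a relayer.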
Root Cause
The disruption stemmed from an accumulation of ChanOpenTry and ChanOpenAck (channel re-establishment) messages in the relayer's queue. This backlog led to repeated timeouts and reinitializations of the channel-opening process, trapping the relayer in a continuous loop and preventing it from processing other pending messages.
Immediate Remedial Actions
Between April 12th and 13th, the Nolus development team introduced a queue prioritization strategy and added a sleep (delay) to the handling of ChanOpenTry and ChanOpenAck messages in a forked version of the relayer software. This intervention broke the loop, cleared the packet backlog, and restored the protocol to normal operation. The patched relayer will remain in use until all ordered channels have been migrated to unordered channels, which are not closed when a packet times out and can therefore remain open indefinitely.
Additionally, the packet timeout window has been extended to three hours, giving relayers sufficient time to process packets before they time out.
Future Steps
The dev team is in the midst of upgrading Nolus core to the latest versions of Cosmos SDK (v0.50.0) and IBC-go (v8.1.2). This upgrade will enable the migration of all ICA channels to unordered channels.
Following this, we plan to transition to a system architecture that exclusively utilizes a defined set of unordered channels, enhancing the responsiveness and resilience of the cross-chain communication.
This strategic shift is aimed at fortifying our infrastructure against similar disruptions in the future and improving the overall robustness of the Nolus Protocol.
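The resilience gain from unordered channels can be illustrated with a toy comparison. Assuming a hypothetical batch of four packets in which only the second one times out, an ordered channel delivers only the packet ahead of the timeout, because the timeout closes the channel and everything queued behind it is stuck; an unordered channel loses only the timed-out packet. The `Packet` type and both functions are illustrative and ignore relaying latency.

```go
package main

import "fmt"

// Packet is a hypothetical outbound packet; TimedOut marks one that was
// not relayed before its timeout.
type Packet struct {
	Seq      uint64
	TimedOut bool
}

// DeliverOrdered models an ordered channel: packets are received strictly
// in sequence, and the first timeout closes the channel, so that packet and
// every later one go undelivered until a new channel is negotiated.
func DeliverOrdered(pkts []Packet) (delivered, undelivered int) {
	for i, p := range pkts {
		if p.TimedOut {
			return delivered, len(pkts) - i
		}
		delivered++
	}
	return delivered, 0
}

// DeliverUnordered models an unordered channel: a timed-out packet fails
// on its own and the channel stays open for the others.
func DeliverUnordered(pkts []Packet) (delivered, undelivered int) {
	for _, p := range pkts {
		if p.TimedOut {
			undelivered++
		} else {
			delivered++
		}
	}
	return delivered, undelivered
}

func main() {
	pkts := []Packet{{Seq: 1}, {Seq: 2, TimedOut: true}, {Seq: 3}, {Seq: 4}}

	d, u := DeliverOrdered(pkts)
	fmt.Printf("ordered:   %d delivered, %d undelivered\n", d, u) // 1 delivered, 3 undelivered

	d, u = DeliverUnordered(pkts)
	fmt.Printf("unordered: %d delivered, %d undelivered\n", d, u) // 3 delivered, 1 undelivered
}
```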