Durable workflows for the on-chain era
Why "fire and forget" doesn't survive contact with a chain reorg, and what we built instead.
Every web3 backend eventually meets the same three demons. The RPC node lies, returning a transaction receipt that quietly disappears twelve blocks later when the chain decides it never happened. The handler crashes mid-sequence, after you've debited the user and queued the on-chain transfer but before the worker has finished, leaving a half-finished intent rotting in a Redis stream where nobody will find it until the customer emails support. And the chain reorgs, because three confirmations is rarely enough.
Traditional task queues like Sidekiq, Celery, and BullMQ assume that retries are free and idempotency is the caller's problem. That assumption falls apart the moment your workflow touches a chain, because retries cost real money in gas, idempotency has to extend across a finality boundary you can't observe synchronously, and the caller can't be trusted with state that has to survive a reorg in the middle of the night.
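To make the finality boundary concrete, here is a minimal sketch of classifying a receipt as pending, final, or reorged. It is an illustration, not Chainflo's code: the `Receipt` shape, the `classify` function, and the `canonicalHashAt` lookup are hypothetical names, and the lookup stands in for whatever your node client uses to fetch the block hash currently at a given height.

```typescript
// Hypothetical types for illustration; not a real RPC client API.
interface Receipt {
  blockNumber: number;
  blockHash: string;
}

type Finality = "pending" | "final" | "reorged";

// `canonicalHashAt` is an assumed lookup: the hash of the block that is
// currently canonical at `height`, or undefined if the chain is shorter.
function classify(
  receipt: Receipt,
  headNumber: number,
  canonicalHashAt: (height: number) => string | undefined,
  required: number,
): Finality {
  // If a different block now sits at that height, the receipt was reorged out.
  if (canonicalHashAt(receipt.blockNumber) !== receipt.blockHash) {
    return "reorged";
  }
  // A transaction in the head block counts as one confirmation.
  const confirmations = headNumber - receipt.blockNumber + 1;
  return confirmations >= required ? "final" : "pending";
}
```

The point of the sketch is that a receipt is only a claim about one block. It has to be rechecked against the canonical chain every time the head moves, which is why a retry policy alone can't stand in for a finality check.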
What we wanted
We had a few non-negotiables when we started building Chainflo. Workflows had to be code, not config, with branching logic and loops and retries written in TypeScript rather than buried in a YAML schema we'd grow to hate. Every step had to be resumable, so that if the worker dies after step 5 of 12, the next worker picks up at step 6 with the same arguments and the same prior results in scope, exactly as if nothing had happened. The whole thing had to be reorg-aware, rolling back dependent branches automatically when a finality assumption is invalidated. And the on-chain submission step had to be exactly-once at the edge, because we never wanted to double-spend, even under the worst possible interleaving of crashes, redeploys, and partial network failures.
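One common way to get close to exactly-once at the edge on an EVM-style chain is to bind each idempotency key to a single account nonce, since an account can only ever have one transaction included per nonce. The sketch below shows that idea under stated assumptions: `NonceLedger` is a hypothetical name, the in-memory `Map` stands in for a durable table, and none of this is meant as a description of Chainflo's internals.

```typescript
// Hypothetical in-memory stand-in for a durable table keyed by idempotency key.
class NonceLedger {
  private byKey = new Map<string, number>();

  constructor(private next: number) {}

  // Return the nonce reserved for `key`, allocating a new one only the
  // first time the key is seen. Retries with the same key get the same nonce.
  reserve(key: string): number {
    const existing = this.byKey.get(key);
    if (existing !== undefined) {
      return existing;
    }
    const nonce = this.next++;
    this.byKey.set(key, nonce);
    return nonce;
  }
}
```

If a worker crashes after broadcasting but before recording the transaction hash, the retry reserves the same nonce and re-signs. The network can include at most one of the two transactions, so the worst case is a rejected duplicate rather than a second transfer.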
How Chainflo.ws does it
Each workflow is a durable function. Steps are recorded as they complete. On resume, the function is replayed deterministically: already-completed steps are fast-forwarded using their recorded results, and only the step that was interrupted is re-executed.
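    // Runs once per incoming payment event; each ctx.step result is recorded.
    export const onPaymentReceived = workflow(async (ctx, event) => {
      // Plain database read, recorded so a resumed run reuses the same user.
      const user = await ctx.step('lookup-user', () =>
        db.users.find(event.userId)
      )

      // On-chain step: keyed by the event id, with 12 confirmations requested.
      const tx = await ctx.step('mint-tokens', {
        idempotencyKey: event.id,
        finality: { confirmations: 12 },
      }, () =>
        chain.mint(user.wallet, event.amount)
      )

      // Credit the ledger against the mint transaction's hash.
      await ctx.step('credit-ledger', () =>
        ledger.post({ user: user.id, ref: tx.hash, amount: event.amount })
      )
    })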
The step boundaries are the durability boundaries. Crash, restart, or redeploy, and the workflow resumes mid-flight with the same state, the same arguments, and the same partial results, because the durability layer doesn't care which process is executing it.
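For readers who want to see the mechanics, here is a toy version of journal-based replay. It is a sketch under assumptions, not Chainflo's implementation: `Ctx`, `Journal`, and `demo` are hypothetical names, and an in-memory `Map` stands in for durable storage that would outlive the worker process.

```typescript
// Hypothetical sketch of a step journal; a Map stands in for durable storage.
type Journal = Map<string, unknown>;

class Ctx {
  constructor(private journal: Journal) {}

  // Run `fn` at most once per step name. On replay, return the recorded
  // result instead of calling `fn` again.
  async step<T>(name: string, fn: () => Promise<T> | T): Promise<T> {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T; // fast-forward a completed step
    }
    const result = await fn();
    this.journal.set(name, result); // record before the next step starts
    return result;
  }
}

// Simulate a crash inside the second step, then a resume on a "new worker".
async function demo(): Promise<{ calls: string[]; total: number }> {
  const journal: Journal = new Map();
  const calls: string[] = [];
  let crashOnce = true;

  const workflow = async (ctx: Ctx): Promise<number> => {
    const a = await ctx.step("load", () => {
      calls.push("load");
      return 40;
    });
    const b = await ctx.step("compute", () => {
      calls.push("compute");
      if (crashOnce) {
        crashOnce = false;
        throw new Error("worker died");
      }
      return 2;
    });
    return a + b;
  };

  try {
    await workflow(new Ctx(journal)); // first attempt fails inside "compute"
  } catch {
    // The journal still holds the recorded result of "load".
  }

  // Resume with a fresh context over the same journal: "load" is not re-run.
  const total = await workflow(new Ctx(journal));
  return { calls, total };
}
```

In this toy run, "load" executes once and "compute" executes twice, because only the interrupted step has no journal entry on resume. The scheme depends on the workflow body being deterministic between steps; anything nondeterministic, such as a clock read or a random value, has to happen inside a step so that its result is recorded.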
What we're working on next
We mostly speak EVM today, so native Bitcoin support is the obvious next chain to add, and it's where most of our current effort sits. Alongside that, we're working on a lower-friction local dev story (because nothing kills adoption faster than a half-day docker setup), and time-travel debugging that lets you replay any historical run with a different step implementation, which has quietly become the most-requested feature from teams already using us in production.
If you have an in-flight payment system that scares you, we should talk.