Classify the failure
Separate a dead grant from a refreshable token or a transient provider error.
Reconnect identity to the exact checkpoint, then continue the original run without repeating committed work.
RECOVERY INCIDENT RECORD
Nightly executive briefing
EXECUTION: run_7f2
IDENTITY: Grant rejected (AADSTS700082)
What actually broke
The missing record connects the broken identity to the exact work that stopped.
Separate a dead grant from a refreshable token or a transient provider error.
Join the credential lease, logical run, checkpoint and failed action.
Issue a short-lived recovery capability without exposing credentials to the worker.
Rotate the lease, reconcile side effects and continue from the saved checkpoint.
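The first of the steps above, classifying the failure, might look roughly like the sketch below. The `TokenError` shape, the `FailureClass` names and `classifyFailure` are illustrative assumptions, not Revive's API. The only provider facts relied on are the OAuth 2.0 `invalid_grant` error code and ordinary HTTP status conventions. Microsoft Entra ID reports AADSTS700082 (refresh token expired due to inactivity) as `invalid_grant`, which is why the incident shown earlier would count as a dead grant.

```typescript
// Hypothetical error shape: an HTTP status plus the OAuth 2.0 error fields.
// This is an assumption for illustration, not a type exported by Revive.
interface TokenError {
  status: number;            // HTTP status from the endpoint that failed
  error?: string;            // OAuth error code, e.g. "invalid_grant"
  errorDescription?: string; // provider detail, e.g. "AADSTS700082: ..."
}

type FailureClass = "dead_grant" | "refreshable" | "transient";

function classifyFailure(err: TokenError): FailureClass {
  // invalid_grant means the grant or refresh token itself is unusable.
  // Retrying or refreshing cannot help; the identity must be reconnected.
  if (err.error === "invalid_grant") return "dead_grant";

  // Rate limiting and server-side errors say nothing about the grant,
  // so the same credential can be retried later.
  if (err.status === 429 || err.status >= 500) return "transient";

  // A 401 without invalid_grant usually means the access token expired
  // while the refresh token is still valid.
  if (err.status === 401) return "refreshable";

  // Assumption: unknown failures are treated as transient. A real caller
  // should bound the retries so an unclassified error cannot loop forever.
  return "transient";
}
```

Checking `invalid_grant` first is a deliberate choice in this sketch: a dead grant should never be mistaken for something a retry could fix.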
The recovery contract
Recovery advances the existing execution instead of spawning a replacement job.
Workers holding the rejected credential generation cannot race the resumed run.
The original worker can disappear while recovery remains actionable.
Every mutating action keeps its idempotency key and reconciliation state.
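Two of the guarantees above, fencing out workers that hold a rejected credential generation and keeping a stable idempotency key per action, can be sketched in a few lines. Everything here is hypothetical: `LeaseRegistry`, its in-memory map and the key format are assumptions made for illustration, and a production system would need durable storage rather than a `Map`.

```typescript
// Hypothetical in-memory registry of credential generations per connection.
class LeaseRegistry {
  private current = new Map<string, number>();

  // Rotating a lease bumps the generation and invalidates older ones.
  rotate(connectionId: string): number {
    const next = (this.current.get(connectionId) ?? 0) + 1;
    this.current.set(connectionId, next);
    return next;
  }

  // A worker may act only while it holds the current generation.
  isCurrent(connectionId: string, generation: number): boolean {
    return this.current.get(connectionId) === generation;
  }
}

// Assumed key format: derived only from the logical run, checkpoint and
// action, so a resumed run produces the same key as the interrupted one.
function idempotencyKey(runId: string, checkpointId: string, actionKey: string): string {
  return `${runId}:${checkpointId}:${actionKey}`;
}
```

Because the key does not depend on the worker or on the credential generation, a different worker resuming after rotation presents the same key to the remote system.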
Tokens stay with the identity layer.
Revive correlates identity, execution and replay evidence.
The runtime handles checkpointing and scheduling.
Use the runtime and credential system already in production. Revive records the contract between them.
const result = await revive.protectAction({
  runId: workflow.runId,
  checkpointId: workflow.checkpointId,
  connectionId: "conn_microsoft_ops",
  actionKey: "send-briefing",
  credential: () => vault.lease("conn_microsoft_ops"),
  execute: ({ credential, idempotencyKey }) =>
    graph.sendMail(message, { credential, idempotencyKey }),
  reconcile: ({ idempotencyKey }) =>
    graph.findMailByIdempotencyKey(idempotencyKey),
});
How Revive fits with the infrastructure already running your workflows.
No. Nango, Auth0 and provider vaults keep token custody. Revive coordinates recovery for the affected run.
Retrying with the rejected credential fails again. Blind replay can also repeat a remote side effect that already committed.
Yes. The recovery case and checkpoint are durable, so another worker can resume the same logical run.
The repository includes LangGraph and Temporal adapters. The contract is designed to support more durable runtimes.
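The warning above about blind replay suggests a reconcile-before-execute order when a run resumes. The sketch below shows one way that could work. `resumeAction` and `Outcome` are hypothetical names, not Revive's internals; the `reconcile` and `execute` callbacks stand in for calls such as `graph.findMailByIdempotencyKey` and `graph.sendMail`.

```typescript
// Hypothetical result type: records whether the action ran during this
// resume or was found already committed on the remote side.
interface Outcome<T> {
  executed: boolean;
  value: T;
}

async function resumeAction<T>(
  idempotencyKey: string,
  reconcile: (key: string) => Promise<T | undefined>,
  execute: (key: string) => Promise<T>,
): Promise<Outcome<T>> {
  // Ask the remote system first: did the side effect already commit
  // before the credential failed or the worker disappeared?
  const existing = await reconcile(idempotencyKey);
  if (existing !== undefined) {
    return { executed: false, value: existing };
  }
  // Nothing found, so it is safe to run the action with the same key.
  const value = await execute(idempotencyKey);
  return { executed: true, value };
}
```

Any worker can call this with the saved key, which is what lets a second worker pick up the run without sending the briefing twice. The sketch assumes the remote lookup is reliable; if it can miss a recent write, the remote API must also enforce the idempotency key.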
Run the local fault injection and inspect every recovery transition.