Monitor and Manage an OpRun#

You have created an OpRun. Now you need to follow its progress, retrieve the outputs when it finishes, or intervene if something goes wrong.

OpRun lifecycle#

For a conceptual overview of OpRuns and their state machine, see Lifecycle.

Every OpRun goes through a well-defined set of states:

PENDING → ASSIGNED → RUNNING → COMPLETED
                             → FAILED
                             → CANCELLED

PENDING — the run is waiting in the queue for a worker.

ASSIGNED — the scheduler picked a worker but execution hasn't started yet.

RUNNING — the worker started the container.

COMPLETED — the container finished successfully and outputs are available.

FAILED — something went wrong during execution.

CANCELLED — the run was cancelled before it could finish.

COMPLETED, FAILED, and CANCELLED are terminal states. A run in a terminal state will not change again unless you explicitly retry it.
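On the client side, the states above can be modelled with a small enum so that code can tell whether a run is finished. This is only an illustrative sketch: the names `RunState`, `TERMINAL_STATES`, and `is_terminal` are hypothetical helpers and are not part of the server API.

```python
from enum import Enum


class RunState(str, Enum):
    """Client-side mirror of the OpRun states listed above (hypothetical helper)."""
    PENDING = "PENDING"
    ASSIGNED = "ASSIGNED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# States a run does not leave unless it is explicitly retried.
TERMINAL_STATES = frozenset(
    {RunState.COMPLETED, RunState.FAILED, RunState.CANCELLED}
)


def is_terminal(state: str) -> bool:
    """Return True if the given state string is a terminal state.

    Raises ValueError for a string that is not one of the known states.
    """
    return RunState(state) in TERMINAL_STATES
```

For example, `is_terminal("RUNNING")` is false while `is_terminal("FAILED")` is true.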

Get run info#

You can fetch the full run object at any time to get its current state, progress, outputs, and events:

GET /runs/{id}

curl http://gyoza-server:5555/runs/a1b2c3d4-... \
  -H "X-API-Key: $GYOZA_API_KEY"

The response includes the current state, progress (0–100), outputs (populated after completion), and the events timeline. While the run is still executing, the response looks like this:

{
  "id": "a1b2c3d4-...",
  "state": "RUNNING",
  "progress": 42,
  "current_attempt": 1,
  "inputs": {"image_path": "/remote/path/photo.jpg", "top_k": 3},
  "outputs": {},
  "events": [
    {"id": 0, "type": "STARTED",  "t": "2026-03-02T20:00:01Z", "msg": "started", "state": "RUNNING"},
    {"id": 1, "type": "PROGRESS", "t": "2026-03-02T20:00:05Z", "msg": 42,        "state": "RUNNING"}
  ]
}

Once the run reaches COMPLETED, the outputs field is populated with the op’s result:

{
  "id": "a1b2c3d4-...",
  "state": "COMPLETED",
  "progress": 100,
  "outputs": {
    "label": "cat",
    "confidence": 0.97
  },
  "events": [
    {"id": 0, "type": "STARTED",   "t": "2026-03-02T20:00:01Z", "msg": "started", "state": "RUNNING"},
    {"id": 1, "type": "PROGRESS",  "t": "2026-03-02T20:00:05Z", "msg": 42,        "state": "RUNNING"},
    {"id": 2, "type": "PROGRESS",  "t": "2026-03-02T20:00:10Z", "msg": 100,       "state": "RUNNING"},
    {"id": 3, "type": "COMPLETED", "t": "2026-03-02T20:00:12Z", "msg": "done",    "state": "COMPLETED"}
  ]
}
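If you only need the final outputs, one option is to re-fetch the run until it reaches a terminal state. The sketch below assumes the host, header, and response fields shown in the examples above; `get_run` and `wait_for_run` are hypothetical helper names, and the interval and timeout values are arbitrary.

```python
import json
import os
import time
import urllib.request

BASE_URL = "http://gyoza-server:5555"  # host taken from the curl examples
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def get_run(run_id: str) -> dict:
    """Fetch the full run object with GET /runs/{id}."""
    request = urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}",
        headers={"X-API-Key": os.environ["GYOZA_API_KEY"]},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_run(run_id, fetch=get_run, interval=2.0, timeout=600.0) -> dict:
    """Re-fetch the run until its state is terminal, then return it.

    `fetch` is injectable so the loop can be exercised without a server.
    Raises TimeoutError if the run is still active after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        run = fetch(run_id)
        if run["state"] in TERMINAL_STATES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} still in state {run['state']}")
        time.sleep(interval)
```

A caller would then check `run["state"]` before reading `run["outputs"]`, since the loop also returns for FAILED and CANCELLED runs.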

Monitor progress via polling#

Calling GET /runs/{id} returns all events at once, which is fine for a quick check. For long-running ops where you want to follow progress in near real-time without fetching the full history every time, you should use the events endpoint:

GET /runs/{id}/events?after={last_event_id}

The after parameter acts as a cursor: the server returns only events whose id is greater than the value you provide. If you omit it, you get all events from the beginning.

The following example fetches the initial events and then uses the after parameter to fetch only the subsequent ones. First, make an initial request to get all events and obtain your cursor id:

curl "http://gyoza-server:5555/runs/a1b2c3d4-.../events" \
  -H "X-API-Key: $GYOZA_API_KEY"

The response will be a list of events with their ids:

{
  "events": [
    {"id": 0, "type": "STARTED",  "t": "2026-03-02T20:00:01Z", "msg": "started", "state": "RUNNING"},
    {"id": 1, "type": "PROGRESS", "t": "2026-03-02T20:00:05Z", "msg": 42,        "state": "RUNNING"}
  ]
}

The last event has "id": 1, so you store that as your cursor id.

Then, you can pass the cursor id to the after parameter to get only new events since the last request:

curl "http://gyoza-server:5555/runs/a1b2c3d4-.../events?after=1" \
  -H "X-API-Key: $GYOZA_API_KEY"

{
  "events": [
    {"id": 2, "type": "PROGRESS",  "t": "2026-03-02T20:00:10Z", "msg": 100,    "state": "RUNNING"},
    {"id": 3, "type": "COMPLETED", "t": "2026-03-02T20:00:12Z", "msg": "done", "state": "COMPLETED"}
  ]
}
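The two requests above generalise to a loop that remembers the last event id it saw and passes it as `after` on the next call. The following is a minimal sketch, assuming the response shape shown above (an `events` list whose items carry `id` and `state`); `fetch_events` and `follow_events` are hypothetical names and the polling interval is arbitrary.

```python
import json
import os
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

BASE_URL = "http://gyoza-server:5555"  # host taken from the curl examples
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED"}


def fetch_events(run_id: str, after: Optional[int] = None) -> list:
    """Call GET /runs/{id}/events, optionally with the ?after= cursor."""
    url = f"{BASE_URL}/runs/{run_id}/events"
    if after is not None:  # 0 is a valid event id, so compare against None
        url += "?" + urllib.parse.urlencode({"after": after})
    request = urllib.request.Request(
        url, headers={"X-API-Key": os.environ["GYOZA_API_KEY"]}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["events"]


def follow_events(
    run_id: str,
    fetch: Callable[[str, Optional[int]], list] = fetch_events,
    interval: float = 1.0,
) -> Iterator[dict]:
    """Yield each event once, stopping after an event in a terminal state."""
    cursor = None  # no cursor yet: the first request returns all events
    while True:
        for event in fetch(run_id, cursor):
            cursor = event["id"]  # advance the cursor past this event
            yield event
            if event.get("state") in TERMINAL_STATES:
                return
        time.sleep(interval)
```

Usage could look like `for event in follow_events("a1b2c3d4-..."): print(event["type"], event["msg"])`. Because the first event in the examples has id 0, the cursor is compared against `None` rather than tested for truthiness.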

Cancel a run#

Warning

Cancelling a run is not supported yet.

Retry a run#

If a run is in a terminal state, you can retry it:

curl -X POST http://gyoza-server:5555/runs/<run-id>/retry \
  -H "X-API-Key: $GYOZA_API_KEY"

This creates a new OpAttempt under the same run ID, increments the attempt counter, and puts the run back in PENDING. Each attempt keeps its own record (events, outputs, execution summary, etc.), so the history of earlier attempts remains available after a retry.
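From a script, the same retry request can be issued conditionally. The sketch below retries only runs whose state is FAILED, which is a policy choice of this example rather than a server rule. `build_retry_request` and `retry_if_failed` are hypothetical names, and the response body of the retry endpoint is not documented here, so the sketch ignores it.

```python
import urllib.request

BASE_URL = "http://gyoza-server:5555"  # host taken from the curl examples


def build_retry_request(run_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST /runs/{id}/retry request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/runs/{run_id}/retry",
        method="POST",
        headers={"X-API-Key": api_key},
    )


def _send(request: urllib.request.Request) -> None:
    """Send the request; urlopen raises HTTPError on a 4xx/5xx status."""
    with urllib.request.urlopen(request):
        pass  # response body is ignored (its shape is an unknown here)


def retry_if_failed(run: dict, api_key: str, send=_send) -> bool:
    """Retry the run if it is FAILED. Return True if a retry was requested."""
    if run["state"] != "FAILED":
        return False
    send(build_retry_request(run["id"], api_key))
    return True
```

Keeping request construction separate from sending makes the guard easy to check without contacting a server.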

You can inspect all attempts for a run:

GET /runs/{id}/attempts

For our example, we can get the attempts for the run with:

curl http://gyoza-server:5555/runs/a1b2c3d4-.../attempts \
  -H "X-API-Key: $GYOZA_API_KEY"

Here is an example response for a run that failed on the first attempt and succeeded on the retry:

[
  {
    "attempt": 1,
    "state": "FAILED",
    "progress": 30,
    "outputs": {},
    "execution_summary": {"error_message": "out of memory", ...},
    "events": [
      {"t": "2026-03-02T20:00:01", "type": "STARTED",  "msg": "started"},
      {"t": "2026-03-02T20:00:05", "type": "PROGRESS", "msg": 30},
      {"t": "2026-03-02T20:00:06", "type": "FAILED",   "msg": "out of memory"}
    ],
    ...
  },
  {
    "attempt": 2,
    "state": "COMPLETED",
    "progress": 100,
    "outputs": {"label": "cat", "confidence": 0.97},
    "execution_summary": {"duration_ms": 13000, ...},
    "events": [
      {"t": "2026-03-02T20:00:12", "type": "STARTED",   "msg": "started"},
      {"t": "2026-03-02T20:00:18", "type": "PROGRESS",  "msg": 50},
      {"t": "2026-03-02T20:00:24", "type": "PROGRESS",  "msg": 100},
      {"t": "2026-03-02T20:00:25", "type": "COMPLETED", "msg": "done"}
    ],
    ...
  }
]
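Once the attempts list has been parsed from JSON, a short helper can condense it into one line per attempt. This sketch assumes only the fields visible in the example response (`attempt`, `state`, and an optional `execution_summary.error_message`); `summarize_attempts` is a hypothetical name.

```python
def summarize_attempts(attempts: list) -> list:
    """Return one human-readable line per attempt, ordered by attempt number."""
    lines = []
    for item in sorted(attempts, key=lambda a: a["attempt"]):
        line = f"attempt {item['attempt']}: {item['state']}"
        # error_message appears in the failed attempt of the example response;
        # treat it as optional because other attempts may not carry it.
        error = item.get("execution_summary", {}).get("error_message")
        if error:
            line += f" ({error})"
        lines.append(line)
    return lines


def latest_outputs(attempts: list) -> dict:
    """Return the outputs of the highest-numbered COMPLETED attempt, or {}."""
    completed = [a for a in attempts if a["state"] == "COMPLETED"]
    if not completed:
        return {}
    return max(completed, key=lambda a: a["attempt"])["outputs"]
```

Applied to the example response, the first helper would report attempt 1 as FAILED with the message "out of memory" and attempt 2 as COMPLETED.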