
Playbooks

Playbooks are structured, multi-step remediation workflows that run against a single device. Each playbook defines an ordered sequence of steps — diagnose, act, wait, verify, rollback — that execute tools on the target device and evaluate the results. Playbooks codify your standard operating procedures so that incident response is consistent, auditable, and repeatable regardless of who triggers it.

Breeze ships with built-in playbooks for common scenarios (disk cleanup, service restart, memory pressure relief). Organizations can also create custom playbooks scoped to their own environment. Playbooks can be triggered manually by administrators, automatically by the AI assistant in response to alerts, or programmatically through the API.

Every execution is recorded with per-step timing, tool input/output, and pass/fail status. If a step fails, the playbook’s failure policy determines whether execution stops, continues, or triggers a rollback sequence.


Running a Playbook

Playbooks can be triggered in several ways from the Breeze dashboard.

The most common way to run a playbook is through the AI assistant during an investigation. When the AI identifies a remediation opportunity — for example, a device with high disk usage — it can suggest and execute the appropriate playbook. The AI handles variable substitution and monitors each step as it executes.

  1. Navigate to the device. Open the device detail page for the target device.

  2. Open Playbook History. Scroll to the Playbook History section on the device page. This shows all past playbook executions for the device.

  3. Trigger a playbook. The AI assistant or an alert-triggered automation initiates playbook execution on the device. You can also use the API to execute a specific playbook against the device (see API Reference).

When executing a playbook, you supply runtime variables that are substituted into step inputs. For example, the Service Restart playbook requires a serviceName variable to know which service to restart. The AI assistant fills these in automatically based on context; when triggering via API, pass them in the variables field.
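Here is a minimal sketch of that substitution. It assumes placeholders are matched by exact, case-sensitive name, as the troubleshooting notes state. It also assumes a placeholder with no supplied value is left unchanged, because Breeze's handling of missing variables is not described on this page. The Json type and the substituteVariables function are illustrative names, not part of the Breeze API.

```typescript
// JSON-like values a toolInput can hold.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Replace every {{name}} placeholder found in strings anywhere inside `input`.
// Assumptions: names are word characters, matching is case-sensitive, and a
// placeholder with no supplied value is left unchanged.
function substituteVariables(input: Json, variables: Record<string, string>): Json {
  if (typeof input === "string") {
    return input.replace(/\{\{(\w+)\}\}/g, (match: string, name: string) =>
      Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match,
    );
  }
  if (Array.isArray(input)) {
    return input.map((item) => substituteVariables(item, variables));
  }
  if (input !== null && typeof input === "object") {
    const result: { [key: string]: Json } = {};
    for (const key of Object.keys(input)) {
      result[key] = substituteVariables(input[key], variables);
    }
    return result;
  }
  return input; // numbers, booleans and null pass through unchanged
}

const toolInput: Json = { deviceId: "{{deviceId}}", serviceName: "{{serviceName}}" };
console.log(substituteVariables(toolInput, { deviceId: "device-uuid", serviceName: "nginx" }));
// { deviceId: 'device-uuid', serviceName: 'nginx' }
```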


Viewing Execution History

Every playbook execution is recorded and visible from the device’s Playbook History section.

Each execution row shows:

  • Playbook name and category — with a color-coded category badge (disk, service, memory, patch, security).
  • Status — Pending, Running, Waiting, Completed, Failed, Rolled Back, or Cancelled, each with a distinct icon and color.
  • Trigger source — who or what initiated the run (AI, manual, alert).
  • Duration — total execution time from start to completion.

Click an execution to expand it and view:

  • Step-by-step results — each step is listed with its index number, name, tool used, duration, and status (Done, Failed, Skipped, Pending, Running). Failed steps highlight in red for quick identification.
  • Error details — if the execution failed, the error message is displayed in a prominent banner.
  • Rollback indicator — if a rollback was triggered, a notice appears confirming it was executed.

Use the Refresh button to update the list while executions are still in progress.


Step Types and Statuses

Each step in a playbook has a type that determines its role in the workflow:

| Type | Purpose |
|---|---|
| diagnose | Collect baseline data before remediation. Runs a tool and captures the current state for later comparison. |
| act | Perform a remediation action. Runs a tool that modifies the device (restarts a service, deletes files, etc.). |
| wait | Pause execution for a specified number of seconds so the system can stabilize before verification. |
| verify | Check that the remediation achieved the desired result. Evaluates a condition against the tool output. |
| rollback | Undo changes after a failure, typically a failed verification. Only executed when the failing step's failure policy is rollback. |

An execution as a whole reports one of these statuses:

| Status | Meaning |
|---|---|
| pending | Execution record created, but no steps have started |
| running | At least one step is actively executing |
| waiting | Execution is paused on a wait step |
| completed | All steps finished successfully and verification passed |
| failed | A step failed and the failure policy stopped execution |
| rolled_back | A step failed and the rollback sequence was executed |
| cancelled | Execution was cancelled by a user or the system before completion |

Each individual step reports one of these statuses:

| Status | Meaning |
|---|---|
| pending | Step has not started yet |
| running | Step is currently executing |
| completed | Step finished successfully |
| failed | Step encountered an error |
| skipped | Step was bypassed (e.g., the remaining steps after a failure under the stop policy) |

Each step can define an onFailure behavior that controls what happens when it fails:

| Policy | Behavior |
|---|---|
| stop | Abort the playbook immediately. Remaining steps are marked skipped. This is the default. |
| continue | Log the failure and proceed to the next step. |
| rollback | Execute any rollback-type steps in reverse order, then mark the execution as rolled_back. |
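The three policies can be modeled as a small pure function. This is a sketch, not Breeze's executor. Here, succeeds(i) stands in for actually running step i, and rollback-type steps are skipped on the forward pass. The sketch also assumes that a run whose only failures were absorbed by continue still ends as completed, which the status table does not spell out. All type and function names are hypothetical.

```typescript
type StepType = "diagnose" | "act" | "wait" | "verify" | "rollback";
type FailurePolicy = "stop" | "continue" | "rollback";
type StepOutcome = "completed" | "failed" | "skipped";

interface StepDef {
  type: StepType;
  name: string;
  onFailure?: FailurePolicy; // "stop" when omitted
}

interface RunOutcome {
  status: "completed" | "failed" | "rolled_back";
  stepStatuses: StepOutcome[];
  rollbackExecuted: boolean;
  order: number[]; // indices of the steps that actually ran, in run order
}

// `succeeds(i)` stands in for running step i and reporting whether it passed.
function simulateRun(steps: StepDef[], succeeds: (index: number) => boolean): RunOutcome {
  const stepStatuses: StepOutcome[] = steps.map((): StepOutcome => "skipped");
  const order: number[] = [];

  for (let i = 0; i < steps.length; i++) {
    if (steps[i].type === "rollback") continue; // only runs when a rollback is triggered
    order.push(i);
    if (succeeds(i)) {
      stepStatuses[i] = "completed";
      continue;
    }
    stepStatuses[i] = "failed";
    const policy = steps[i].onFailure ?? "stop";
    if (policy === "continue") continue; // log the failure and move on
    if (policy === "stop") {
      // Remaining steps keep their initial "skipped" status.
      return { status: "failed", stepStatuses, rollbackExecuted: false, order };
    }
    // policy === "rollback": run rollback-type steps in reverse definition order.
    for (let j = steps.length - 1; j >= 0; j--) {
      if (steps[j].type !== "rollback") continue;
      order.push(j);
      stepStatuses[j] = succeeds(j) ? "completed" : "failed";
    }
    return { status: "rolled_back", stepStatuses, rollbackExecuted: true, order };
  }
  // Assumption: failures absorbed by "continue" still end the run as "completed".
  return { status: "completed", stepStatuses, rollbackExecuted: false, order };
}

const demoSteps: StepDef[] = [
  { type: "diagnose", name: "Capture baseline" },
  { type: "act", name: "Restart service", onFailure: "rollback" },
  { type: "verify", name: "Verify health" },
  { type: "rollback", name: "Restore previous config" },
];
console.log(simulateRun(demoSteps, (i) => i !== 1).status); // prints rolled_back
```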

Built-in Playbooks

Breeze includes three built-in playbooks that are available to all organizations out of the box. They are updated automatically with each release.

Disk Cleanup

Category: disk
Required permissions: devices:read, devices:execute

A five-step workflow that frees disk space safely:

  1. Capture baseline disk usage — Runs the analyze_disk_usage tool to collect current disk utilization and identify cleanup candidates.

  2. Preview safe cleanup candidates — Runs disk_cleanup in preview mode against safe categories: temporary files, browser cache, package cache, and trash. No files are deleted in this step.

  3. Execute cleanup — Runs disk_cleanup in execute mode, deleting the files identified in the preview step.

  4. Wait for filesystem metrics — Pauses for 30 seconds to allow disk metrics to refresh.

  5. Verify disk usage improved — Runs analyze_disk_usage again and checks that disk_usage_percent is below 90%. If verification fails, execution stops.

Service Restart

Category: service
Required permissions: devices:read, devices:execute
Trigger conditions: can be linked to service_down alerts (auto-execute is disabled by default)

  1. Check current service status — Uses manage_services to read the service state before remediation.

  2. Restart target service — Restarts the service specified by the {{serviceName}} variable.

  3. Wait for service startup — Pauses for 10 seconds to allow the service to initialize.

  4. Verify service health — Checks that service_status equals running. If the service is not running after restart, execution stops.

Memory Pressure Relief

Category: memory
Required permissions: devices:read, devices:execute

  1. Capture baseline memory metrics — Runs analyze_metrics to check RAM utilization over the last hour.

  2. Restart memory-heavy service — Restarts the service specified by {{serviceName}} to release held memory.

  3. Wait for memory stabilization — Pauses for 300 seconds (5 minutes) to allow memory metrics to settle.

  4. Verify memory improved — Checks that ram_usage_percent is below 85%. If memory usage is still elevated, execution stops.


Custom Playbooks

Organizations can create custom playbooks scoped to their own environment. Custom playbooks have an orgId set to the creating organization and are only visible to users with access to that organization.

Custom playbooks are created through the API. Each playbook definition includes:

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Human-readable playbook name (max 255 characters) |
| description | text | Yes | Detailed description of what the playbook does |
| steps | PlaybookStep[] | Yes | Ordered array of step definitions |
| triggerConditions | object | No | Conditions under which the playbook can be auto-triggered |
| category | string | No | Grouping category: disk, service, memory, patch, or security |
| requiredPermissions | string[] | No | Permissions the executing user must have (defaults to []) |
| isActive | boolean | No | Whether the playbook is available for execution (defaults to true) |
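Before submitting a custom playbook, a client could check the documented constraints and fill in the documented defaults locally. This is a sketch under stated assumptions. The helper name checkDefinition is hypothetical, and steps are left untyped. Rejecting an empty steps array goes beyond what the table states.

```typescript
const PLAYBOOK_CATEGORIES = ["disk", "service", "memory", "patch", "security"];

interface PlaybookDefinition {
  name: string;
  description: string;
  steps: unknown[]; // PlaybookStep objects, left untyped in this sketch
  triggerConditions?: Record<string, unknown>;
  category?: string;
  requiredPermissions?: string[];
  isActive?: boolean;
}

// Hypothetical client-side pre-check: collects problems and fills documented defaults.
function checkDefinition(def: PlaybookDefinition): { errors: string[]; withDefaults: PlaybookDefinition } {
  const errors: string[] = [];
  if (!def.name || def.name.length > 255) errors.push("name is required (max 255 characters)");
  if (!def.description) errors.push("description is required");
  // Assumption: an empty steps array is treated as invalid.
  if (!Array.isArray(def.steps) || def.steps.length === 0) errors.push("steps must contain at least one step");
  if (def.category !== undefined && !PLAYBOOK_CATEGORIES.includes(def.category)) {
    errors.push(`unknown category "${def.category}"`);
  }
  const withDefaults: PlaybookDefinition = {
    ...def,
    requiredPermissions: def.requiredPermissions ?? [], // documented default
    isActive: def.isActive ?? true, // documented default
  };
  return { errors, withDefaults };
}

console.log(checkDefinition({ name: "Clear temp files", description: "Remove stale temp files", steps: [{}] }).errors); // []
```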

Trigger conditions control when a playbook can be automatically activated:

| Field | Type | Description |
|---|---|---|
| alertTypes | string[] | Alert types that can trigger this playbook (e.g., ["service_down", "high_cpu"]) |
| deviceTags | string[] | Only trigger for devices with these tags |
| autoExecute | boolean | If true, execute automatically when conditions match. If false, require manual confirmation. |
| minSeverity | string | Minimum alert severity to trigger: low, medium, high, or critical |

Each step in a playbook is defined as a JSON object with the following fields:

| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Step type: diagnose, act, wait, verify, or rollback |
| name | string | Yes | Short name displayed in the execution log |
| description | string | Yes | Detailed explanation of what this step does |
| tool | string | No | Name of the tool to execute (not required for wait steps) |
| toolInput | object | No | Key-value pairs passed to the tool. Supports {{variable}} template placeholders. |
| waitSeconds | number | No | Number of seconds to pause (used only by wait steps) |
| verifyCondition | object | No | Condition to evaluate after tool execution (used by verify steps) |
| onFailure | string | No | Failure behavior for this step: stop, continue, or rollback (defaults to stop) |
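The step fields map naturally onto a TypeScript type. The sketch below types them and uses them to express a custom playbook modeled on the built-in Service Restart steps. The manage_services toolInput keys (deviceId, serviceName, action) are hypothetical; check the tool's real parameters before reusing this. The playbook name and description are also illustrative.

```typescript
type StepType = "diagnose" | "act" | "wait" | "verify" | "rollback";
type FailurePolicy = "stop" | "continue" | "rollback";

interface VerifyCondition {
  metric: string;
  operator: "lt" | "gt" | "eq" | "ne" | "contains";
  value: unknown;
}

interface PlaybookStep {
  type: StepType;
  name: string;
  description: string;
  tool?: string; // omitted for wait steps
  toolInput?: Record<string, unknown>; // may contain {{variable}} placeholders
  waitSeconds?: number; // wait steps only
  verifyCondition?: VerifyCondition; // verify steps only
  onFailure?: FailurePolicy; // "stop" when omitted
}

// Hypothetical toolInput shape for manage_services.
const statusInput = { deviceId: "{{deviceId}}", serviceName: "{{serviceName}}", action: "status" };

const serviceRestartSteps: PlaybookStep[] = [
  {
    type: "diagnose", name: "Check current service status",
    description: "Read the service state before remediation",
    tool: "manage_services", toolInput: statusInput,
  },
  {
    type: "act", name: "Restart target service",
    description: "Restart the service named by {{serviceName}}",
    tool: "manage_services", toolInput: { ...statusInput, action: "restart" },
  },
  {
    type: "wait", name: "Wait for service startup",
    description: "Give the service time to initialize", waitSeconds: 10,
  },
  {
    type: "verify", name: "Verify service health",
    description: "Confirm the service is running",
    tool: "manage_services", toolInput: statusInput,
    verifyCondition: { metric: "service_status", operator: "eq", value: "running" },
    onFailure: "stop",
  },
];

const customPlaybook = {
  name: "Restart service (custom)",
  description: "Restart a service and confirm it comes back up",
  category: "service",
  requiredPermissions: ["devices:read", "devices:execute"],
  steps: serviceRestartSteps,
};
```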

Verify steps evaluate a condition against the tool output to determine success:

| Field | Type | Description |
|---|---|---|
| metric | string | The metric name to evaluate from the tool’s output |
| operator | string | Comparison operator: lt (less than), gt (greater than), eq (equal), ne (not equal), or contains |
| value | any | The expected value to compare against |

Example: verify that disk usage is below 90%:

```json
{
  "type": "verify",
  "name": "Verify disk usage",
  "description": "Confirm disk usage dropped below threshold",
  "tool": "analyze_disk_usage",
  "toolInput": { "deviceId": "{{deviceId}}", "refresh": true },
  "verifyCondition": {
    "metric": "disk_usage_percent",
    "operator": "lt",
    "value": 90
  },
  "onFailure": "stop"
}
```
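One way a verify condition could be evaluated is sketched below. It assumes the tool output has already been parsed into a map of metric names to values; the raw toolOutput in an execution record is a string, so a parsing step is assumed. In this sketch, lt and gt require numbers on both sides, contains works on strings and arrays, and a missing metric counts as a failed check. These are assumptions rather than documented Breeze behavior, and evaluateCondition is a hypothetical name.

```typescript
type Operator = "lt" | "gt" | "eq" | "ne" | "contains";

interface VerifyCondition {
  metric: string;
  operator: Operator;
  value: unknown;
}

// Assumes the tool output was already parsed into { metricName: value }.
function evaluateCondition(metrics: Record<string, unknown>, cond: VerifyCondition): boolean {
  const actual = metrics[cond.metric];
  if (actual === undefined) return false; // missing metric: treat as a failed check
  switch (cond.operator) {
    case "lt":
      return typeof actual === "number" && typeof cond.value === "number" && actual < cond.value;
    case "gt":
      return typeof actual === "number" && typeof cond.value === "number" && actual > cond.value;
    case "eq":
      return actual === cond.value;
    case "ne":
      return actual !== cond.value;
    case "contains":
      if (typeof actual === "string") return typeof cond.value === "string" && actual.includes(cond.value);
      return Array.isArray(actual) && actual.includes(cond.value);
    default:
      // Conditions usually arrive as untyped JSON, so guard against unknown operators.
      throw new Error(`unsupported operator: ${String(cond.operator)}`);
  }
}

const diskCheck: VerifyCondition = { metric: "disk_usage_percent", operator: "lt", value: 90 };
console.log(evaluateCondition({ disk_usage_percent: 72.5 }, diskCheck)); // true
console.log(evaluateCondition({ disk_usage_percent: 94.2 }, diskCheck)); // false
```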

Triggering Playbooks

Playbooks can be activated through several mechanisms: direct API calls, the AI assistant, and alert trigger conditions.

Any user with the required permissions can execute a playbook on a specific device through the API:

```http
POST /playbooks/:playbookId/execute
Content-Type: application/json
Authorization: Bearer <token>

{
  "deviceId": "device-uuid",
  "variables": {
    "serviceName": "nginx"
  }
}
```

The variables object provides runtime values for {{variable}} placeholders in step toolInput fields. The context field can pass additional metadata such as an alertId or conversationId for traceability.
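For scripted triggers, the request above can be assembled in a few lines. This sketch assumes a hypothetical baseUrl for your Breeze API and a bearer token you already hold. The helper name and the ExecuteBody type are illustrative. The returned url and init are meant to be passed to fetch().

```typescript
interface ExecuteBody {
  deviceId: string;
  variables?: Record<string, string>; // values for {{variable}} placeholders
  context?: { alertId?: string; conversationId?: string; [key: string]: unknown };
}

interface PostRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Builds the URL and fetch() options for POST /playbooks/:playbookId/execute.
function buildExecuteRequest(baseUrl: string, token: string, playbookId: string, body: ExecuteBody): PostRequest {
  const root = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${root}/playbooks/${encodeURIComponent(playbookId)}/execute`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
      body: JSON.stringify(body),
    },
  };
}

const req = buildExecuteRequest("https://breeze.example.com/", "<token>", "playbook-uuid", {
  deviceId: "device-uuid",
  variables: { serviceName: "nginx" },
  context: { alertId: "alert-uuid" },
});
// Then, inside an async function: const res = await fetch(req.url, req.init);
```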

The Breeze AI assistant can execute playbooks as part of an automated incident response conversation. When the AI identifies a matching playbook for a detected issue, it calls the execute endpoint with triggeredBy: "ai" and includes the conversationId in the execution context. The AI then monitors execution progress and updates step results as each step completes.

Playbooks with triggerConditions.alertTypes configured can respond to matching alerts. When an alert fires and its type matches a playbook’s trigger conditions:

  • If autoExecute is true, the playbook runs immediately on the affected device.
  • If autoExecute is false, the playbook is suggested to the operator but requires manual confirmation.

Additional filtering applies: deviceTags restricts matching to devices with specific tags, and minSeverity sets the minimum alert severity that qualifies.
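The matching rules above can be read as a small decision function. This sketch makes its assumptions explicit. The type and function names are hypothetical. deviceTags is treated as "the device has at least one of these tags", because the page does not say whether any or all tags must match. An omitted autoExecute is treated as false, consistent with auto-execute being off by default for Service Restart.

```typescript
type Severity = "low" | "medium" | "high" | "critical";
const SEVERITY_RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

interface TriggerConditions {
  alertTypes?: string[];
  deviceTags?: string[];
  autoExecute?: boolean;
  minSeverity?: Severity;
}

interface IncomingAlert {
  type: string;
  severity: Severity;
  deviceTags: string[]; // tags on the affected device
}

type TriggerDecision = "auto_execute" | "suggest" | "no_match";

function decideTrigger(conditions: TriggerConditions, alert: IncomingAlert): TriggerDecision {
  // Only playbooks that list the alert type are candidates.
  if (!conditions.alertTypes || !conditions.alertTypes.includes(alert.type)) return "no_match";
  // Assumption: the device needs at least one of the listed tags.
  const tags = conditions.deviceTags ?? [];
  if (tags.length > 0 && !tags.some((tag) => alert.deviceTags.includes(tag))) return "no_match";
  if (conditions.minSeverity && SEVERITY_RANK[alert.severity] < SEVERITY_RANK[conditions.minSeverity]) {
    return "no_match";
  }
  // An unset autoExecute is treated as false: suggest and wait for confirmation.
  return conditions.autoExecute === true ? "auto_execute" : "suggest";
}

const serviceRestartTrigger: TriggerConditions = { alertTypes: ["service_down"], autoExecute: false };
console.log(decideTrigger(serviceRestartTrigger, { type: "service_down", severity: "high", deviceTags: [] })); // suggest
```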


Execution Tracking

Every playbook run creates an execution record. Execution records include:

| Field | Description |
|---|---|
| id | Unique execution UUID |
| orgId | Organization the execution belongs to |
| deviceId | Target device UUID |
| playbookId | Playbook definition UUID |
| status | Current execution status |
| currentStepIndex | Index of the step currently executing (0-based) |
| steps | Array of per-step results with timing, tool output, and status |
| context | Execution context including alertId, conversationId, and variables |
| errorMessage | Error description if the execution failed |
| rollbackExecuted | Whether a rollback sequence was triggered |
| triggeredBy | How the execution was initiated (e.g., ai, manual, alert) |
| triggeredByUserId | UUID of the user who triggered the execution (if applicable) |
| startedAt | Timestamp when the first step began |
| completedAt | Timestamp when the execution finished |

List all executions with optional filters:

```http
GET /playbooks/executions?deviceId=&playbookId=&status=&limit=50
```
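A client can build this query so that only the filters it actually uses are sent. This is a sketch only. executionsUrl and the base URL are hypothetical. The page does not say whether the API ignores empty values such as deviceId=, so this version simply omits them.

```typescript
type ExecutionStatus = "pending" | "running" | "waiting" | "completed" | "failed" | "rolled_back" | "cancelled";

interface ExecutionFilters {
  deviceId?: string;
  playbookId?: string;
  status?: ExecutionStatus;
  limit?: number;
}

// Builds GET /playbooks/executions with only the filters that are set.
function executionsUrl(baseUrl: string, filters: ExecutionFilters): string {
  const params = new URLSearchParams();
  if (filters.deviceId) params.set("deviceId", filters.deviceId);
  if (filters.playbookId) params.set("playbookId", filters.playbookId);
  if (filters.status) params.set("status", filters.status);
  if (filters.limit !== undefined) params.set("limit", String(filters.limit));
  const query = params.toString();
  return `${baseUrl.replace(/\/+$/, "")}/playbooks/executions${query ? `?${query}` : ""}`;
}

console.log(executionsUrl("https://breeze.example.com", { deviceId: "device-uuid", status: "failed", limit: 50 }));
// https://breeze.example.com/playbooks/executions?deviceId=device-uuid&status=failed&limit=50
```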

Get full detail for a single execution, including the playbook definition and device information:

```http
GET /playbooks/executions/:executionId
```

The response includes the complete steps array with per-step results:

```json
{
  "execution": {
    "id": "exec-uuid",
    "status": "completed",
    "currentStepIndex": 4,
    "steps": [
      {
        "stepIndex": 0,
        "stepName": "Capture baseline disk usage",
        "status": "completed",
        "toolUsed": "analyze_disk_usage",
        "toolInput": { "deviceId": "device-uuid", "refresh": true },
        "toolOutput": "Disk usage: 94.2%...",
        "startedAt": "2026-02-23T10:00:00.000Z",
        "completedAt": "2026-02-23T10:00:03.500Z",
        "durationMs": 3500
      }
    ]
  },
  "playbook": { "id": "...", "name": "Disk Cleanup", "category": "disk" },
  "device": { "id": "...", "hostname": "web-server-01" }
}
```

As each step completes, whoever is driving the execution (typically the AI assistant) updates the execution record via PATCH:

```http
PATCH /playbooks/executions/:executionId
Content-Type: application/json
Authorization: Bearer <token>

{
  "status": "running",
  "currentStepIndex": 2,
  "steps": [
    {
      "stepIndex": 0,
      "stepName": "Capture baseline",
      "status": "completed",
      "toolUsed": "analyze_disk_usage",
      "durationMs": 3500
    }
  ]
}
```

Execution status transitions are strictly validated. The allowed transitions are:

| From | Allowed transitions |
|---|---|
| pending | running, cancelled |
| running | waiting, completed, failed, rolled_back, cancelled |
| waiting | running, completed, failed, rolled_back, cancelled |
| completed | none (terminal) |
| failed | none (terminal) |
| rolled_back | none (terminal) |
| cancelled | none (terminal) |
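A client can check a status change locally before sending a PATCH. This sketch encodes only the transitions table above, and its names are hypothetical. Because the table does not list same-status updates, the sketch rejects them. Every PATCH field is optional, so a progress-only update can simply leave status out.

```typescript
type ExecutionStatus = "pending" | "running" | "waiting" | "completed" | "failed" | "rolled_back" | "cancelled";

// Mirrors the allowed-transitions table; terminal statuses map to an empty list.
const ALLOWED_TRANSITIONS: Record<ExecutionStatus, readonly ExecutionStatus[]> = {
  pending: ["running", "cancelled"],
  running: ["waiting", "completed", "failed", "rolled_back", "cancelled"],
  waiting: ["running", "completed", "failed", "rolled_back", "cancelled"],
  completed: [],
  failed: [],
  rolled_back: [],
  cancelled: [],
};

function canTransition(from: ExecutionStatus, to: ExecutionStatus): boolean {
  return ALLOWED_TRANSITIONS[from].includes(to);
}

function isTerminal(status: ExecutionStatus): boolean {
  return ALLOWED_TRANSITIONS[status].length === 0;
}
```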

API Reference

All playbook endpoints require authentication and scope-based authorization. Organization-scope users see their own organization’s playbooks plus all built-in playbooks. Partner-scope users see playbooks across their managed organizations. System-scope users see everything.

| Method | Path | Description |
|---|---|---|
| GET | /playbooks | List active playbooks. Optional ?category= filter: disk, service, memory, patch, security, or all |
| GET | /playbooks/:id | Get a single playbook definition by ID |
| POST | /playbooks/:id/execute | Execute a playbook on a device |
| GET | /playbooks/executions | List execution history (?deviceId=&playbookId=&status=&limit=) |
| GET | /playbooks/executions/:id | Get full execution detail with playbook and device info |
| PATCH | /playbooks/executions/:id | Update execution progress (status, steps, context) |

POST /playbooks/:id/execute request body:

| Field | Type | Required | Description |
|---|---|---|---|
| deviceId | UUID | Yes | Target device to run the playbook against |
| variables | object | No | Runtime values for {{variable}} placeholders in step toolInput |
| context | object | No | Additional context (alertId, conversationId, userInput) |

Response: Returns the created execution record, the playbook definition (including steps), and the target device.

Error responses:

| Status | Reason |
|---|---|
| 404 | Playbook not found, not active, or access denied |
| 403 | User lacks the required permissions defined by the playbook |
| 404 | Device not found or access denied |
| 403 | Playbook and device belong to different organizations |
| 409 | Referenced resource was deleted concurrently |

PATCH /playbooks/executions/:id request body (all fields optional, at least one required):

| Field | Type | Description |
|---|---|---|
| status | string | New execution status (must be a valid transition) |
| currentStepIndex | number | Index of the currently executing step |
| steps | array | Updated step results array |
| context | object | Updated execution context |
| errorMessage | string or null | Error message (set on failure; send null to clear it) |
| rollbackExecuted | boolean | Whether rollback was triggered |
| startedAt | string or null | ISO 8601 timestamp |
| completedAt | string or null | ISO 8601 timestamp |

Incident Response Integration

Playbooks integrate with the Incident Response system. When a playbook is executed in the context of an active incident, the execution record includes the incidentId in its context, creating a direct link between the remediation workflow and the incident timeline.

The AI assistant can trigger playbooks during incident response conversations. For example, when investigating a compromised device, the AI might execute the Service Restart playbook to restart a suspicious service, then record the result as evidence on the incident.

Create custom playbooks with category: "security" for incident-specific workflows:

  • Endpoint isolation — disable network interfaces, block USB, kill suspicious processes
  • Evidence collection — gather logs, running processes, network connections, and screenshots
  • Service recovery — restart affected services after containment, verify health

Security playbooks can be linked to alert trigger conditions with alertTypes: ["security_threat"] so they execute automatically (or with confirmation) when a security alert fires.


Troubleshooting

Playbook not appearing in the list. Verify that the playbook’s isActive field is true. The GET /playbooks endpoint only returns active playbooks. Also confirm that the authenticated user has access to the playbook’s organization, or that the playbook is a built-in (built-in playbooks are visible to all organizations).

Execution fails with “Missing required permissions”. The playbook definition specifies requiredPermissions that the executing user must have. Check the missingPermissions array in the 403 response to identify which permissions are needed. Built-in playbooks require devices:read and devices:execute.

Execution stuck in pending status. The execute endpoint creates the execution record but does not run steps automatically. The AI assistant or the calling system is responsible for driving execution by invoking each step’s tool and updating the execution via PATCH. If no external caller advances the execution, it remains in pending.

PATCH returns 409 Conflict. The execution was modified by another process between your read and write. Re-fetch the execution via GET /playbooks/executions/:id, merge your changes with the current state, and retry the PATCH.
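The re-fetch, merge, and retry loop might look like the sketch below. It assumes a hypothetical ExecutionsApi wrapper around GET and PATCH /playbooks/executions/:id. The key design choice is that the update is recomputed from freshly fetched state on every attempt instead of replaying a stale body. The limit of 3 attempts is arbitrary, and recordStep is an illustrative merge that replaces or appends a single step result.

```typescript
interface StepResult {
  stepIndex: number;
  stepName: string;
  status: string;
}

interface Execution {
  id: string;
  status: string;
  currentStepIndex: number;
  steps: StepResult[];
}

type ExecutionUpdate = Partial<Omit<Execution, "id">>;

// Hypothetical thin client over GET and PATCH /playbooks/executions/:id.
interface ExecutionsApi {
  get(id: string): Promise<Execution>;
  patch(id: string, body: ExecutionUpdate): Promise<{ status: number }>;
}

async function patchWithRetry(
  api: ExecutionsApi,
  id: string,
  buildUpdate: (current: Execution) => ExecutionUpdate,
  maxAttempts = 3,
): Promise<void> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const current = await api.get(id); // re-fetch on every attempt
    const res = await api.patch(id, buildUpdate(current)); // merge against fresh state
    if (res.status >= 200 && res.status < 300) return;
    if (res.status !== 409) throw new Error(`PATCH failed with HTTP ${res.status}`);
    // 409: another writer got in between; loop and try again.
  }
  throw new Error(`still conflicting after ${maxAttempts} attempts`);
}

// Replace or append one step result without dropping results written by others.
const recordStep = (result: StepResult) => (current: Execution): ExecutionUpdate => ({
  currentStepIndex: result.stepIndex + 1,
  steps: [...current.steps.filter((s) => s.stepIndex !== result.stepIndex), result],
});
```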

Invalid status transition error. Execution status changes are validated against an allowed-transitions table. Terminal statuses (completed, failed, rolled_back, cancelled) cannot transition to any other status. Check the current execution status before attempting a PATCH.

Verify step fails but remediation worked. The verification condition may be too strict, or the wait step before verification may not allow enough time for metrics to settle. Increase waitSeconds on the preceding wait step, or adjust the verifyCondition threshold. For disk cleanup, 30 seconds is usually sufficient; for memory relief, 300 seconds (5 minutes) is recommended.

Template variables not substituted. Ensure the variable names in toolInput match the keys passed in the variables field of the execute request. Template placeholders use the format {{variableName}} and are case-sensitive.