Playbooks
Playbooks are structured, multi-step remediation workflows that run against a single device. Each playbook defines an ordered sequence of steps — diagnose, act, wait, verify, rollback — that execute tools on the target device and evaluate the results. Playbooks codify your standard operating procedures so that incident response is consistent, auditable, and repeatable regardless of who triggers it.
Breeze ships with built-in playbooks for common scenarios (disk cleanup, service restart, memory pressure relief). Organizations can also create custom playbooks scoped to their own environment. Playbooks can be triggered manually by administrators, automatically by the AI assistant in response to alerts, or programmatically through the API.
Every execution is recorded with per-step timing, tool input/output, and pass/fail status. If a step fails, that step's failure policy (onFailure) determines whether execution stops, continues, or triggers a rollback sequence.
Running a Playbook
Playbooks can be triggered in several ways from the Breeze dashboard.
From the AI Assistant
The most common way to run a playbook is through the AI assistant during an investigation. When the AI identifies a remediation opportunity (for example, a device with high disk usage), it can suggest and execute the appropriate playbook. The AI handles variable substitution and monitors each step as it executes.
From the Device Detail Page
1. Navigate to the device. Open the device detail page for the target device.
2. Open Playbook History. Scroll to the Playbook History section on the device page. This shows all past playbook executions for the device.
3. Trigger a playbook. Execution on the device is initiated by the AI assistant or by an alert-triggered automation. To run a specific playbook yourself, call the execute endpoint for the device (see API Reference).
Variables
When executing a playbook, you supply runtime variables that are substituted into step inputs. For example, the Service Restart playbook requires a serviceName variable to know which service to restart. The AI assistant fills these in automatically based on context; when triggering via the API, pass them in the variables field.
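To make the substitution rule concrete, here is a minimal Python sketch of how {{variable}} placeholders in a step's toolInput could be filled from the variables object. It is not Breeze's implementation: the function name substitute and the choice to leave unknown placeholders untouched are assumptions for illustration.

```python
import re
from typing import Any

# Matches {{name}} placeholders. Names are case-sensitive: {{ServiceName}} != {{serviceName}}.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(value: Any, variables: dict[str, Any]) -> Any:
    """Recursively fill {{name}} placeholders in a toolInput value.

    Strings are rewritten, dicts and lists are walked, other values pass through.
    Placeholders with no matching variable are left as-is (an assumption).
    """
    if isinstance(value, str):
        def replace(match: re.Match) -> str:
            name = match.group(1)
            return str(variables[name]) if name in variables else match.group(0)
        return PLACEHOLDER.sub(replace, value)
    if isinstance(value, dict):
        return {key: substitute(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute(item, variables) for item in value]
    return value
```

For example, substitute({"serviceName": "{{serviceName}}"}, {"serviceName": "nginx"}) yields {"serviceName": "nginx"}. Because matching is case-sensitive, {{ServiceName}} would not pick up a serviceName variable.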
Viewing Results
Every playbook execution is recorded and visible from the device’s Playbook History section.
Each execution row shows:
- Playbook name and category — with a color-coded category badge (disk, service, memory, patch, security).
- Status — Pending, Running, Waiting, Completed, Failed, Rolled Back, or Cancelled, each with a distinct icon and color.
- Trigger source — who or what initiated the run (AI, manual, alert).
- Duration — total execution time from start to completion.
Click an execution to expand it and view:
- Step-by-step results — each step is listed with its index number, name, tool used, duration, and status (Done, Failed, Skipped, Pending, Running). Failed steps highlight in red for quick identification.
- Error details — if the execution failed, the error message is displayed in a prominent banner.
- Rollback indicator — if a rollback was triggered, a notice appears confirming it was executed.
Use the Refresh button to update the list while executions are still in progress.
Key Concepts
Step Types
Each step in a playbook has a type that determines its role in the workflow:
| Type | Purpose |
|---|---|
| diagnose | Collect baseline data before remediation. Runs a tool and captures current state for later comparison. |
| act | Perform a remediation action. Runs a tool that modifies the device (restart a service, delete files, etc.). |
| wait | Pause execution for a specified number of seconds. Allows the system to stabilize before verification. |
| verify | Check that the remediation achieved the desired result. Evaluates a condition against tool output. |
| rollback | Undo changes if a step fails. Only executed when the failing step's onFailure policy is rollback. |
Execution Statuses
| Status | Meaning |
|---|---|
| pending | Execution record created but no steps have started |
| running | At least one step is actively executing |
| waiting | Execution is paused on a wait step |
| completed | All steps finished successfully and verification passed |
| failed | A step failed and the failure policy stopped execution |
| rolled_back | A step failed and the rollback sequence was executed |
| cancelled | Execution was cancelled by a user or system before completion |
Step Result Statuses
| Status | Meaning |
|---|---|
| pending | Step has not started yet |
| running | Step is currently executing |
| completed | Step finished successfully |
| failed | Step encountered an error |
| skipped | Step was bypassed (e.g., remaining steps after a failure with stop policy) |
Failure Policies
Each step can define an onFailure behavior that controls what happens when it fails:
| Policy | Behavior |
|---|---|
| stop | Abort the playbook immediately. Remaining steps are marked skipped. This is the default. |
| continue | Log the failure and proceed to the next step. |
| rollback | Execute any rollback-type steps in reverse order, then mark the execution as rolled_back. |
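The three policies interact in a way that is easiest to see in code. The Python sketch below walks a playbook's steps in order and applies each step's onFailure value. The Step class and the execute callback are hypothetical stand-ins for Breeze's step runner, and treating an execution that reaches the end after a continue failure as completed is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Literal

Policy = Literal["stop", "continue", "rollback"]


@dataclass
class Step:
    type: str                    # diagnose | act | wait | verify | rollback
    name: str
    on_failure: Policy = "stop"  # stop is the documented default


def run_steps(steps: list[Step], execute: Callable[[Step], bool]) -> tuple[str, list[str]]:
    """Apply per-step onFailure policies; return (execution status, per-step statuses).

    `execute` is a hypothetical callback that runs one step and returns True on success.
    """
    statuses = ["skipped"] * len(steps)  # steps that never run end up "skipped"
    forward = [i for i, s in enumerate(steps) if s.type != "rollback"]
    rollbacks = [i for i, s in enumerate(steps) if s.type == "rollback"]

    for i in forward:
        ok = execute(steps[i])
        statuses[i] = "completed" if ok else "failed"
        if ok or steps[i].on_failure == "continue":
            continue  # success, or a failure we were told to log and move past
        if steps[i].on_failure == "rollback":
            for r in reversed(rollbacks):  # rollback steps run in reverse order
                statuses[r] = "completed" if execute(steps[r]) else "failed"
            return "rolled_back", statuses
        return "failed", statuses  # "stop": abort; remaining steps stay "skipped"

    # Assumption: reaching the end counts as completed even if a "continue" step failed.
    return "completed", statuses
```

In this sketch, rollback-type steps are held back from the forward pass and only run, last-defined first, when a step with onFailure set to rollback fails; any step that never runs is reported as skipped.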
Built-in Playbooks
Breeze includes three built-in playbooks that are available to all organizations out of the box. They are updated automatically with each release.
Disk Cleanup
Category: disk
Required permissions: devices:read, devices:execute
A five-step workflow that frees disk space safely:
1. Capture baseline disk usage: Runs the analyze_disk_usage tool to collect current disk utilization and identify cleanup candidates.
2. Preview safe cleanup candidates: Runs disk_cleanup in preview mode against safe categories (temporary files, browser cache, package cache, and trash). No files are deleted in this step.
3. Execute cleanup: Runs disk_cleanup in execute mode, deleting the files identified in the preview step.
4. Wait for filesystem metrics: Pauses for 30 seconds to allow disk metrics to refresh.
5. Verify disk usage improved: Runs analyze_disk_usage again and checks that disk_usage_percent is below 90%. If verification fails, execution stops.
Service Restart with Health Check
Category: service
Required permissions: devices:read, devices:execute
Trigger conditions: Can be linked to service_down alerts (auto-execute is disabled by default)
1. Check current service status: Uses manage_services to read the service state before remediation.
2. Restart target service: Restarts the service specified by the {{serviceName}} variable.
3. Wait for service startup: Pauses for 10 seconds to allow the service to initialize.
4. Verify service health: Checks that service_status equals running. If the service is not running after restart, execution stops.
Memory Pressure Relief
Category: memory
Required permissions: devices:read, devices:execute
1. Capture baseline memory metrics: Runs analyze_metrics to check RAM utilization over the last hour.
2. Restart memory-heavy service: Restarts the service specified by {{serviceName}} to release held memory.
3. Wait for memory stabilization: Pauses for 300 seconds (5 minutes) to allow memory metrics to settle.
4. Verify memory improved: Checks that ram_usage_percent is below 85%. If memory usage is still elevated, execution stops.
Custom Playbooks
Organizations can create custom playbooks scoped to their own environment. Custom playbooks have an orgId set to the creating organization and are only visible to users with access to that organization.
Creating a custom playbook
Custom playbooks are created through the API. Each playbook definition includes:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Human-readable playbook name (max 255 characters) |
| description | text | Yes | Detailed description of what the playbook does |
| steps | PlaybookStep[] | Yes | Ordered array of step definitions |
| triggerConditions | object | No | Conditions under which the playbook can be auto-triggered |
| category | string | No | Grouping category: disk, service, memory, patch, or security |
| requiredPermissions | string[] | No | Permissions the executing user must have (defaults to []) |
| isActive | boolean | No | Whether the playbook is available for execution (defaults to true) |
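Before submitting a definition, a client can check it against the field table. This Python sketch is hypothetical: validate_playbook and apply_defaults are illustrative names, the server's actual validation rules and error messages may differ, and requiring a non-empty steps array is an assumption.

```python
from typing import Any

CATEGORIES = {"disk", "service", "memory", "patch", "security"}


def validate_playbook(defn: dict[str, Any]) -> list[str]:
    """Return problems found in a custom playbook definition (empty list if none)."""
    errors = []
    name = defn.get("name")
    if not isinstance(name, str) or not name:
        errors.append("name is required")
    elif len(name) > 255:
        errors.append("name exceeds 255 characters")
    description = defn.get("description")
    if not isinstance(description, str) or not description:
        errors.append("description is required")
    steps = defn.get("steps")
    if not isinstance(steps, list) or not steps:  # non-empty is an assumption
        errors.append("steps must be a non-empty array")
    category = defn.get("category")
    if category is not None and category not in CATEGORIES:
        errors.append(f"unknown category: {category}")
    return errors


def apply_defaults(defn: dict[str, Any]) -> dict[str, Any]:
    """Return a copy with the documented defaults for optional fields filled in."""
    return {"requiredPermissions": [], "isActive": True, **defn}
```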
Trigger Conditions
Trigger conditions control when a playbook can be automatically activated:
| Field | Type | Description |
|---|---|---|
| alertTypes | string[] | Alert types that can trigger this playbook (e.g., ["service_down", "high_cpu"]) |
| deviceTags | string[] | Only trigger for devices with these tags |
| autoExecute | boolean | If true, execute automatically when conditions match. If false, require manual confirmation. |
| minSeverity | string | Minimum alert severity to trigger: low, medium, high, or critical |
Playbook Steps
Each step in a playbook is defined as a JSON object with the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Step type: diagnose, act, wait, verify, or rollback |
| name | string | Yes | Short name displayed in the execution log |
| description | string | Yes | Detailed explanation of what this step does |
| tool | string | No | Name of the tool to execute (not required for wait steps) |
| toolInput | object | No | Key-value pairs passed to the tool. Supports {{variable}} template placeholders. |
| waitSeconds | number | No | Number of seconds to pause (only used by wait steps) |
| verifyCondition | object | No | Condition to evaluate after tool execution (used by verify steps) |
| onFailure | string | No | Failure behavior for this step: stop, continue, or rollback |
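Step definitions can also be checked client-side before submission. In this hypothetical Python sketch, requiring a tool on every non-wait step, a positive waitSeconds on wait steps, and a verifyCondition on verify steps are assumptions drawn from the field descriptions rather than documented rules; an omitted onFailure is treated as the documented default, stop.

```python
from typing import Any

STEP_TYPES = {"diagnose", "act", "wait", "verify", "rollback"}
POLICIES = {"stop", "continue", "rollback"}


def validate_step(step: dict[str, Any]) -> list[str]:
    """Return problems found in one PlaybookStep definition (empty list if none)."""
    errors = [f"{field} is required" for field in ("type", "name", "description")
              if not step.get(field)]
    step_type = step.get("type")
    if step_type and step_type not in STEP_TYPES:
        errors.append(f"unknown step type: {step_type}")
    elif step_type == "wait":
        seconds = step.get("waitSeconds")
        if not isinstance(seconds, (int, float)) or seconds <= 0:
            errors.append("wait steps need a positive waitSeconds")
    elif step_type and not step.get("tool"):
        errors.append(f"{step_type} steps need a tool")  # assumption for non-wait steps
    if step_type == "verify" and not isinstance(step.get("verifyCondition"), dict):
        errors.append("verify steps need a verifyCondition")  # assumption
    policy = step.get("onFailure", "stop")  # omitted means the default, stop
    if policy not in POLICIES:
        errors.append(f"unknown onFailure policy: {policy}")
    return errors
```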
Verification Conditions
Verify steps evaluate a condition against the tool output to determine success:
| Field | Type | Description |
|---|---|---|
| metric | string | The metric name to evaluate from the tool’s output |
| operator | string | Comparison operator: lt (less than), gt (greater than), eq (equal), ne (not equal), contains |
| value | any | The expected value to compare against |
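The following Python sketch shows one way a verifyCondition could be evaluated. It assumes the tool's output has already been parsed into a dictionary of metric names to values (how Breeze extracts metrics from tool output is not described here), and it treats a missing metric as a failed check; the names OPERATORS and evaluate are illustrative.

```python
import operator
from typing import Any, Callable

# The operators listed in the verifyCondition table.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "lt": operator.lt,
    "gt": operator.gt,
    "eq": operator.eq,
    "ne": operator.ne,
    "contains": lambda actual, expected: expected in actual,
}


def evaluate(condition: dict[str, Any], metrics: dict[str, Any]) -> bool:
    """Return True if metrics[condition["metric"]] satisfies the condition.

    A metric that is absent from the parsed output counts as a failed verification.
    """
    metric = condition["metric"]
    if metric not in metrics:
        return False
    op = OPERATORS.get(condition["operator"])
    if op is None:
        raise ValueError(f"unsupported operator: {condition['operator']}")
    return op(metrics[metric], condition["value"])
```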
Example: Verify disk usage is below 90%:
```json
{
  "type": "verify",
  "name": "Verify disk usage",
  "description": "Confirm disk usage dropped below threshold",
  "tool": "analyze_disk_usage",
  "toolInput": {
    "deviceId": "{{deviceId}}",
    "refresh": true
  },
  "verifyCondition": {
    "metric": "disk_usage_percent",
    "operator": "lt",
    "value": 90
  },
  "onFailure": "stop"
}
```
Triggers
Playbooks can be activated through several mechanisms:
Manual execution
Any user with the required permissions can execute a playbook on a specific device through the API:
```http
POST /playbooks/:playbookId/execute
Content-Type: application/json
Authorization: Bearer <token>

{
  "deviceId": "device-uuid",
  "variables": {
    "serviceName": "nginx"
  }
}
```
The variables object provides runtime values for {{variable}} placeholders in step toolInput fields. The context field can pass additional metadata such as an alertId or conversationId for traceability.
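The same request can be built with Python's standard library. This is a sketch: the base URL (https://breeze.example.com/api) and the helper name build_execute_request are hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any, Optional


def build_execute_request(base_url: str, token: str, playbook_id: str, device_id: str,
                          variables: Optional[dict[str, Any]] = None,
                          context: Optional[dict[str, Any]] = None) -> urllib.request.Request:
    """Build (but do not send) POST /playbooks/:playbookId/execute."""
    body: dict[str, Any] = {"deviceId": device_id}
    if variables:
        body["variables"] = variables  # values for {{variable}} placeholders
    if context:
        body["context"] = context      # e.g. alertId or conversationId for traceability
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/playbooks/{playbook_id}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )
```

To send it, pass the request to urllib.request.urlopen and decode the JSON response, which contains the created execution record, the playbook definition, and the target device.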
AI-triggered execution
The Breeze AI assistant can execute playbooks as part of an automated incident response conversation. When the AI identifies a matching playbook for a detected issue, it calls the execute endpoint with triggeredBy: "ai" and includes the conversationId in the execution context. The AI then monitors execution progress and updates step results as each step completes.
Alert-triggered execution
Playbooks with triggerConditions.alertTypes configured can respond to matching alerts. When an alert fires and its type matches a playbook’s trigger conditions:
- If autoExecute is true, the playbook runs immediately on the affected device.
- If autoExecute is false, the playbook is suggested to the operator but requires manual confirmation.
Additional filtering applies: deviceTags restricts matching to devices with specific tags, and minSeverity sets the minimum alert severity that qualifies.
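These matching rules can be sketched as a small Python function that decides whether an alert should run a playbook, suggest it, or be ignored. Several details are assumptions: a device is treated as matching deviceTags if it carries at least one of the listed tags, severities are ranked low < medium < high < critical, and autoExecute defaults to false when omitted. The function name match_alert is hypothetical.

```python
from typing import Any, Optional

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def match_alert(trigger: dict[str, Any], alert_type: str, severity: str,
                device_tags: set[str]) -> Optional[str]:
    """Return "execute", "suggest", or None for one alert against triggerConditions."""
    if alert_type not in trigger.get("alertTypes", []):
        return None
    required_tags = trigger.get("deviceTags")
    if required_tags and not device_tags.intersection(required_tags):
        return None  # assumption: one shared tag is enough
    min_severity = trigger.get("minSeverity")
    if min_severity and SEVERITY_RANK[severity] < SEVERITY_RANK[min_severity]:
        return None
    # autoExecute true runs immediately; otherwise an operator must confirm.
    return "execute" if trigger.get("autoExecute", False) else "suggest"
```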
Execution History
Every playbook run creates an execution record. Execution records include:
| Field | Description |
|---|---|
| id | Unique execution UUID |
| orgId | Organization the execution belongs to |
| deviceId | Target device UUID |
| playbookId | Playbook definition UUID |
| status | Current execution status |
| currentStepIndex | Index of the step currently executing (0-based) |
| steps | Array of per-step results with timing, tool output, and status |
| context | Execution context including alertId, conversationId, and variables |
| errorMessage | Error description if the execution failed |
| rollbackExecuted | Whether a rollback sequence was triggered |
| triggeredBy | How the execution was initiated (e.g., ai, manual, alert) |
| triggeredByUserId | UUID of the user who triggered the execution (if applicable) |
| startedAt | Timestamp when the first step began |
| completedAt | Timestamp when the execution finished |
Viewing execution results
List all executions with optional filters:
```http
GET /playbooks/executions?deviceId=&playbookId=&status=&limit=50
```
Get full detail for a single execution, including the playbook definition and device information:
```http
GET /playbooks/executions/:executionId
```
The response includes the complete steps array with per-step results (only the first step is shown here):
```json
{
  "execution": {
    "id": "exec-uuid",
    "status": "completed",
    "currentStepIndex": 4,
    "steps": [
      {
        "stepIndex": 0,
        "stepName": "Capture baseline disk usage",
        "status": "completed",
        "toolUsed": "analyze_disk_usage",
        "toolInput": { "deviceId": "device-uuid", "refresh": true },
        "toolOutput": "Disk usage: 94.2%...",
        "startedAt": "2026-02-23T10:00:00.000Z",
        "completedAt": "2026-02-23T10:00:03.500Z",
        "durationMs": 3500
      }
    ]
  },
  "playbook": { "id": "...", "name": "Disk Cleanup", "category": "disk" },
  "device": { "id": "...", "hostname": "web-server-01" }
}
```
Updating execution progress
As each step completes, the execution record is updated via PATCH:
```http
PATCH /playbooks/executions/:executionId
Content-Type: application/json
Authorization: Bearer <token>

{
  "status": "running",
  "currentStepIndex": 2,
  "steps": [
    {
      "stepIndex": 0,
      "stepName": "Capture baseline",
      "status": "completed",
      "toolUsed": "analyze_disk_usage",
      "durationMs": 3500
    }
  ]
}
```
Status transitions
Execution status transitions are strictly validated. The allowed transitions are:
| From | Allowed transitions |
|---|---|
| pending | running, cancelled |
| running | waiting, completed, failed, rolled_back, cancelled |
| waiting | running, completed, failed, rolled_back, cancelled |
| completed | None (terminal) |
| failed | None (terminal) |
| rolled_back | None (terminal) |
| cancelled | None (terminal) |
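Clients that drive executions can check a transition locally before sending a PATCH. The transition table maps directly onto a lookup like the following Python sketch. Accepting a PATCH that repeats the current non-terminal status (for example, sending running while already running to update step results) is an assumption, not something the table states.

```python
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "pending": frozenset({"running", "cancelled"}),
    "running": frozenset({"waiting", "completed", "failed", "rolled_back", "cancelled"}),
    "waiting": frozenset({"running", "completed", "failed", "rolled_back", "cancelled"}),
    # Terminal statuses: no outgoing transitions.
    "completed": frozenset(),
    "failed": frozenset(),
    "rolled_back": frozenset(),
    "cancelled": frozenset(),
}


def is_terminal(status: str) -> bool:
    """True for known statuses that allow no further transitions."""
    return status in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[status]


def can_transition(current: str, new: str) -> bool:
    """Check a proposed status change against the transition table."""
    if new == current and not is_terminal(current):
        return True  # assumption: re-sending the same live status is a no-op
    return new in ALLOWED_TRANSITIONS.get(current, frozenset())
```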
API Reference
All playbook endpoints require authentication and scope-based authorization. Organization-scope users see their own organization’s playbooks plus all built-in playbooks. Partner-scope users see playbooks across their managed organizations. System-scope users see everything.
| Method | Path | Description |
|---|---|---|
| GET | /playbooks | List active playbooks (optional ?category= filter: disk, service, memory, patch, security, or all) |
| GET | /playbooks/:id | Get a single playbook definition by ID |
| POST | /playbooks/:id/execute | Execute a playbook on a device |
| GET | /playbooks/executions | List execution history (?deviceId=&playbookId=&status=&limit=) |
| GET | /playbooks/executions/:id | Get full execution detail with playbook and device info |
| PATCH | /playbooks/executions/:id | Update execution progress (status, steps, context) |
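For the execution-history endpoint, filters you do not need can simply be omitted from the query string. This Python sketch builds the URL with urllib.parse.urlencode; the base URL and the function name executions_url are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def executions_url(base_url: str, device_id: Optional[str] = None,
                   playbook_id: Optional[str] = None, status: Optional[str] = None,
                   limit: Optional[int] = None) -> str:
    """Build a GET /playbooks/executions URL, omitting filters that are not set."""
    params = {"deviceId": device_id, "playbookId": playbook_id,
              "status": status, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/playbooks/executions"
    return f"{url}?{query}" if query else url
```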
POST /playbooks/:id/execute
Request body:
| Field | Type | Required | Description |
|---|---|---|---|
| deviceId | UUID | Yes | Target device to run the playbook against |
| variables | object | No | Runtime values for {{variable}} placeholders in step toolInput |
| context | object | No | Additional context (alertId, conversationId, userInput) |
Response: Returns the created execution record, the playbook definition (including steps), and the target device.
Error responses:
| Status | Reason |
|---|---|
| 404 | Playbook not found, not active, or access denied |
| 403 | User lacks required permissions defined by the playbook |
| 404 | Device not found or access denied |
| 403 | Playbook and device belong to different organizations |
| 409 | Referenced resource was deleted concurrently |
PATCH /playbooks/executions/:id
Request body (all fields optional, at least one required):
| Field | Type | Description |
|---|---|---|
| status | string | New execution status (must be a valid transition) |
| currentStepIndex | number | Index of the currently executing step |
| steps | array | Updated step results array |
| context | object | Updated execution context |
| errorMessage | string or null | Error message (set on failure, clear on recovery) |
| rollbackExecuted | boolean | Whether rollback was triggered |
| startedAt | string or null | ISO 8601 timestamp |
| completedAt | string or null | ISO 8601 timestamp |
Incident Response Integration
Playbooks integrate with the Incident Response system. When a playbook is executed in the context of an active incident, the execution record includes the incidentId in its context, creating a direct link between the remediation workflow and the incident timeline.
The AI assistant can trigger playbooks during incident response conversations. For example, when investigating a compromised device, the AI might execute the Service Restart playbook to restart a suspicious service, then record the result as evidence on the incident.
Security Playbooks
Create custom playbooks with category: "security" for incident-specific workflows:
- Endpoint isolation — disable network interfaces, block USB, kill suspicious processes
- Evidence collection — gather logs, running processes, network connections, and screenshots
- Service recovery — restart affected services after containment, verify health
Security playbooks can be linked to alert trigger conditions with alertTypes: ["security_threat"] so they execute automatically (or with confirmation) when a security alert fires.
Troubleshooting
Playbook not appearing in the list.
Verify the playbook’s isActive field is true. The GET /playbooks endpoint only returns active playbooks. Also confirm the authenticated user has access to the playbook’s organization, or that the playbook is a built-in (built-in playbooks are visible to all organizations).
Execution fails with “Missing required permissions”.
The playbook definition specifies requiredPermissions that the executing user must have. Check the missingPermissions array in the 403 response to identify which permissions are needed. Built-in playbooks require devices:read and devices:execute.
Execution stuck in pending status.
The execute endpoint creates the execution record but does not run steps automatically. The AI assistant or the calling system is responsible for driving execution by invoking each step’s tool and updating the execution via PATCH. If no external caller advances the execution, it remains in pending.
PATCH returns 409 Conflict.
The execution was modified by another process between your read and write. Re-fetch the execution via GET /playbooks/executions/:id, merge your changes with the current state, and retry the PATCH.
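The re-fetch, merge, and retry cycle can be sketched in Python as follows. fetch, patch, and make_changes are hypothetical callbacks wrapping GET /playbooks/executions/:id, PATCH /playbooks/executions/:id, and your own merge logic; ConflictError stands in for however your HTTP client surfaces a 409, and the default of three attempts is arbitrary.

```python
from typing import Any, Callable

Execution = dict[str, Any]


class ConflictError(Exception):
    """Stand-in for a 409 Conflict response from PATCH /playbooks/executions/:id."""


def patch_with_retry(execution_id: str,
                     fetch: Callable[[str], Execution],
                     patch: Callable[[str, Execution], Execution],
                     make_changes: Callable[[Execution], Execution],
                     max_attempts: int = 3) -> Execution:
    """Re-fetch, re-merge, and re-send a PATCH until it succeeds or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        current = fetch(execution_id)    # GET the latest state
        changes = make_changes(current)  # merge your update into that state
        try:
            return patch(execution_id, changes)
        except ConflictError:
            if attempt >= max_attempts:
                raise                    # give up and surface the conflict
```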
Invalid status transition error.
Execution status changes are validated against an allowed-transitions table. Terminal statuses (completed, failed, rolled_back, cancelled) cannot transition to any other status. Check the current execution status before attempting a PATCH.
Verify step fails but remediation worked.
The verification condition may be too strict, or the wait step before verification may not allow enough time for metrics to settle. Increase waitSeconds on the preceding wait step, or adjust the verifyCondition threshold. For disk cleanup, 30 seconds is usually sufficient; for memory relief, 300 seconds (5 minutes) is recommended.
Template variables not substituted.
Ensure the variable names in toolInput match the keys passed in the variables field of the execute request. Template placeholders use the format {{variableName}} and are case-sensitive.