Sensitive Data Discovery
Sensitive Data Discovery scans your managed devices for files containing personally identifiable information (PII), payment card data (PCI), protected health information (PHI), credentials, and financial records. The system uses pattern-based detection to identify sensitive content, assigns risk scores and confidence levels to each finding, and provides remediation workflows to encrypt, quarantine, or securely delete the offending files. A centralized dashboard tracks open findings, risk distribution, and remediation progress across your fleet.
Scans can be triggered manually for specific devices or scheduled through scan policies. Each scan runs on the agent side, and results are reported back to the API where findings are stored, deduplicated across scans, and made available for reporting and remediation.
Key Concepts
Data Classification Types
| Type | Description |
|---|---|
| pii | Personally identifiable information: names, addresses, phone numbers, email addresses, government IDs (SSN, passport numbers) |
| pci | Payment card industry data: credit/debit card numbers, CVVs, cardholder data |
| phi | Protected health information: medical records, insurance IDs, health conditions, prescription data |
| credential | Stored credentials: API keys, passwords, tokens, private keys, connection strings |
| financial | Financial records: bank account numbers, routing numbers, tax documents, financial statements |
Risk Levels
| Risk | Description |
|---|---|
| low | Finding has limited exposure potential: low match count, non-sensitive file location, or low confidence |
| medium | Finding warrants review: moderate match count or sensitive file type |
| high | Finding requires attention: multiple matches in an accessible location or high-confidence detection |
| critical | Finding demands immediate action: credentials in plaintext, unprotected PCI data, or high match count with high confidence |
Finding Statuses
| Status | Description |
|---|---|
| open | Active finding that has not been addressed |
| remediated | Finding has been remediated (encrypted, quarantined, deleted, or manually marked) |
| accepted | Risk has been explicitly accepted by an administrator |
| false_positive | Finding has been determined to be a false positive |
Remediation Actions
| Action | Destructive | Description |
|---|---|---|
| encrypt | Yes | Encrypt the file in place using the configured encryption key |
| quarantine | Yes | Move the file to a quarantine directory on the device |
| secure_delete | Yes | Permanently and securely delete the file |
| accept_risk | No | Acknowledge the finding and accept the risk |
| false_positive | No | Mark the finding as a false positive |
| mark_remediated | No | Manually mark the finding as remediated |
Scan Policies
Scan policies define the detection configuration, file scope, and schedule for sensitive data scans. Each policy is scoped to an organization and specifies which data types to detect, which paths to scan, and how often to run.
Creating a Policy

    POST /sensitive-data/policies
    Content-Type: application/json
    Authorization: Bearer <token>

    {
      "orgId": "uuid",
      "name": "Weekly PII & Credential Scan",
      "scope": {
        "includePaths": ["/Users", "/home", "C:\\Users"],
        "excludePaths": ["/Users/*/Library", "C:\\Windows"],
        "fileTypes": [".txt", ".csv", ".xlsx", ".json", ".env", ".conf"],
        "maxFileSizeBytes": 104857600,
        "workers": 4,
        "timeoutSeconds": 600
      },
      "detectionClasses": ["pii", "credential"],
      "schedule": {
        "enabled": true,
        "type": "interval",
        "intervalMinutes": 10080,
        "timezone": "America/Chicago"
      },
      "isActive": true
    }

Policy Fields
| Field | Type | Required | Description |
|---|---|---|---|
| orgId | UUID | No | Organization ID. Auto-resolved for org-scoped tokens |
| name | string | Yes | Policy name (max 200 chars) |
| scope | object | No | Scan scope configuration (see below) |
| detectionClasses | string[] | Yes | Data types to detect: pii, pci, phi, credential, financial (1-5) |
| schedule | object | No | Scan schedule configuration (see below) |
| isActive | boolean | No | Whether the policy is active (default true) |
Scope Configuration
| Field | Type | Description |
|---|---|---|
| includePaths | string[] | Paths to scan (max 256, each up to 2,048 chars) |
| excludePaths | string[] | Paths to exclude from scanning (max 256) |
| fileTypes | string[] | File extensions to scan (max 128, each up to 32 chars) |
| maxFileSizeBytes | integer | Maximum file size to scan (1 KB to 1 GB) |
| workers | integer | Concurrent scan workers (1-32) |
| timeoutSeconds | integer | Scan timeout in seconds (5-1,800) |
| suppressPaths | string[] | Paths to suppress in findings (max 256) |
| suppressPatternIds | string[] | Pattern IDs to suppress (max 200) |
| suppressFilePathRegex | string[] | Regex patterns for file paths to suppress (max 80) |
| ruleToggles | object | Per-rule enable/disable overrides (key: rule ID, value: boolean) |
Schedule Configuration
| Field | Type | Description |
|---|---|---|
| enabled | boolean | Whether the schedule is active (default true) |
| type | string | Schedule type: manual, interval, or cron |
| intervalMinutes | integer | Scan interval in minutes (5 to 10,080 / one week) |
| cron | string | Cron expression (when type is cron) |
| timezone | string | Timezone for scheduled scans |
| deviceIds | UUID[] | Specific devices to scan (max 1,000). If omitted, scans all devices in the org |
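The limits in the policy, scope, and schedule tables can be checked on the client before a request is sent. The following is a minimal sketch, not the API's own validation: `validate_policy` and `_check_range` are hypothetical helper names, "1 KB to 1 GB" is read as binary units, and treating `cron` as mandatory for cron schedules is an assumption.

```python
from typing import Any, Dict, List

DETECTION_CLASSES = {"pii", "pci", "phi", "credential", "financial"}
SCHEDULE_TYPES = {"manual", "interval", "cron"}


def _check_range(errors: List[str], obj: Dict[str, Any], key: str, low: int, high: int) -> None:
    """Record an error when an optional integer field falls outside [low, high]."""
    value = obj.get(key)
    if value is not None and not (isinstance(value, int) and low <= value <= high):
        errors.append(f"{key} must be an integer between {low} and {high}")


def validate_policy(policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload passes these checks."""
    errors: List[str] = []

    name = policy.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 200:
        errors.append("name is required (max 200 chars)")

    classes = policy.get("detectionClasses")
    if not isinstance(classes, list) or not 1 <= len(classes) <= 5:
        errors.append("detectionClasses must contain 1 to 5 entries")
    elif not set(classes) <= DETECTION_CLASSES:
        errors.append("detectionClasses contains an unknown type")

    scope = policy.get("scope") or {}
    include_paths = scope.get("includePaths", [])
    if len(include_paths) > 256 or any(len(p) > 2048 for p in include_paths):
        errors.append("includePaths allows max 256 paths of up to 2,048 chars")
    # Assumption: "1 KB to 1 GB" means 1,024 to 1,073,741,824 bytes.
    _check_range(errors, scope, "maxFileSizeBytes", 1024, 1024 ** 3)
    _check_range(errors, scope, "workers", 1, 32)
    _check_range(errors, scope, "timeoutSeconds", 5, 1800)

    schedule = policy.get("schedule") or {}
    schedule_type = schedule.get("type")
    if schedule_type is not None and schedule_type not in SCHEDULE_TYPES:
        errors.append("schedule.type must be manual, interval, or cron")
    if schedule_type == "interval":
        _check_range(errors, schedule, "intervalMinutes", 5, 10080)
    # Assumption: a cron schedule without an expression is rejected.
    if schedule_type == "cron" and not schedule.get("cron"):
        errors.append("schedule.cron is required when type is cron")
    return errors
```

Running the check before `POST /sensitive-data/policies` surfaces out-of-range values locally; the server remains the authority on what it accepts.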
Updating a Policy

    PUT /sensitive-data/policies/:id
    Content-Type: application/json

    {
      "name": "Updated Scan Policy",
      "detectionClasses": ["pii", "credential", "pci"],
      "isActive": true
    }

Deleting a Policy

    DELETE /sensitive-data/policies/:id

Running Scans
Triggering a Manual Scan

    POST /sensitive-data/scan
    Content-Type: application/json
    Authorization: Bearer <token>

    {
      "deviceIds": ["device-uuid-1", "device-uuid-2"],
      "policyId": "policy-uuid",
      "detectionClasses": ["pii", "credential"],
      "scope": {
        "includePaths": ["/home"],
        "fileTypes": [".txt", ".csv", ".env"],
        "maxFileSizeBytes": 52428800
      },
      "idempotencyKey": "scan-2026-02-15-batch-1"
    }

| Field | Type | Required | Description |
|---|---|---|---|
| deviceIds | UUID[] | Yes | Devices to scan (1-200) |
| policyId | UUID | No | Use an existing policy’s scope and detection classes |
| scope | object | No | Override the scan scope (takes precedence over policy scope) |
| detectionClasses | string[] | No | Override detection classes (takes precedence over policy) |
| idempotencyKey | string | No | Client-provided idempotency key (8-128 chars). Also accepted via Idempotency-Key header |
The API creates one scan record per device and enqueues each scan through BullMQ. Decommissioned devices are excluded. The response includes the created scans, the count of successfully queued jobs, and any skipped device IDs.
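Because a single request accepts at most 200 device IDs, larger fleets need to be split across several requests. A minimal sketch of that batching, assuming a hypothetical helper `build_scan_requests` and a per-batch key naming scheme modeled on the example key above:

```python
from typing import Any, Dict, List

MAX_DEVICES_PER_REQUEST = 200  # documented limit for deviceIds


def build_scan_requests(device_ids: List[str], policy_id: str, key_prefix: str) -> List[Dict[str, Any]]:
    """Split device IDs into batches of at most 200 and build one request body per batch."""
    bodies = []
    for start in range(0, len(device_ids), MAX_DEVICES_PER_REQUEST):
        batch_number = start // MAX_DEVICES_PER_REQUEST + 1
        key = f"{key_prefix}-batch-{batch_number}"
        # The documented key length is 8-128 characters.
        if not 8 <= len(key) <= 128:
            raise ValueError("idempotencyKey must be 8-128 characters")
        bodies.append({
            "deviceIds": device_ids[start:start + MAX_DEVICES_PER_REQUEST],
            "policyId": policy_id,
            "idempotencyKey": key,
        })
    return bodies
```

Each body can then be sent to `POST /sensitive-data/scan`. Giving every batch its own key keeps a retried batch from colliding with a different one.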
Scan Statuses
| Status | Description |
|---|---|
| queued | Scan created and enqueued for the agent |
| running | Agent is actively scanning the device |
| completed | Scan finished; results include findings summary |
| failed | Scan encountered an error |
Listing Recent Scans

    GET /sensitive-data/scans?limit=50

Returns recent scans ordered by creation time, with device hostname, policy reference, status, and timing information.
Getting Scan Details

    GET /sensitive-data/scans/:id

Returns full scan details including a findings summary with counts by risk level and status. If the scan has pre-computed summary counters in its summary JSONB, those are returned directly. Otherwise, findings are aggregated on the fly.
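The fallback described above can be pictured as follows. This is an illustrative sketch only: the key names inside the stored summary (`byRisk`, `byStatus`, `total`) and both function names are assumptions, not the documented schema.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_findings(findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate findings into counts by risk level and by status."""
    return {
        "total": len(findings),
        "byRisk": dict(Counter(f["risk"] for f in findings)),
        "byStatus": dict(Counter(f["status"] for f in findings)),
    }


def scan_summary(scan: Dict[str, Any], findings: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Prefer stored counters; otherwise aggregate the scan's findings on the fly."""
    stored = scan.get("summary") or {}
    if "byRisk" in stored and "byStatus" in stored:
        return stored
    return summarize_findings(findings)
```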
Findings and Reports
Querying Findings

    GET /sensitive-data/report?status=open&risk=critical&dataType=credential&page=1&limit=50

Report Query Parameters
| Parameter | Type | Description |
|---|---|---|
| status | string | Filter by status: open, remediated, accepted, false_positive |
| risk | string | Filter by risk level: low, medium, high, critical |
| dataType | string | Filter by data type: pii, pci, phi, credential, financial |
| deviceId | UUID | Filter findings for a specific device |
| scanId | UUID | Filter findings from a specific scan |
| page | integer | Page number for pagination |
| limit | integer | Results per page (default 200) |
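A small helper can assemble the report URL and reject filter values outside the documented enumerations before any request is made. In this sketch, `report_url` is a hypothetical name and the base URL in the usage note is a placeholder.

```python
from urllib.parse import urlencode

ALLOWED_VALUES = {
    "status": {"open", "remediated", "accepted", "false_positive"},
    "risk": {"low", "medium", "high", "critical"},
    "dataType": {"pii", "pci", "phi", "credential", "financial"},
}


def report_url(base_url: str, **filters) -> str:
    """Build a /sensitive-data/report URL from the given filters, skipping None values."""
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        allowed = ALLOWED_VALUES.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"unsupported {key}: {value!r}")
        params[key] = value
    query = urlencode(params)
    path = f"{base_url.rstrip('/')}/sensitive-data/report"
    return f"{path}?{query}" if query else path
```

For example, `report_url("https://rmm.example.com/api", status="open", risk="critical", limit=50)` yields a URL with the three parameters in that order.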
Finding Response Fields
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique finding identifier |
| orgId | UUID | Organization ID |
| deviceId | UUID | Device where the file was found |
| deviceName | string | Hostname of the device |
| scanId | UUID | Scan that discovered the finding |
| filePath | string | Full path to the file containing sensitive data |
| dataType | string | Classification type: pii, pci, phi, credential, financial |
| patternId | string | Identifier of the detection pattern that matched |
| matchCount | integer | Number of matches found in the file |
| risk | string | Risk level: low, medium, high, critical |
| confidence | float | Detection confidence score (0.0 to 1.0) |
| fileOwner | string | File owner on the device |
| fileModifiedAt | ISO 8601 | When the file was last modified |
| firstSeenAt | ISO 8601 | When this finding was first detected |
| lastSeenAt | ISO 8601 | When this finding was last confirmed |
| occurrenceCount | integer | Number of scans that have found this file |
| status | string | Current status: open, remediated, accepted, false_positive |
| remediationAction | string | Action taken (if any) |
| remediatedAt | ISO 8601 | When remediation occurred |
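The firstSeenAt, lastSeenAt, and occurrenceCount fields reflect deduplication across scans. One way to picture that bookkeeping is the sketch below. The deduplication key used here (device, file path, pattern) is an assumption for illustration; the source does not state which fields the API actually matches on, and `merge_finding` is a hypothetical name.

```python
from typing import Any, Dict, Tuple

FindingKey = Tuple[str, str, str]


def merge_finding(store: Dict[FindingKey, Dict[str, Any]], observed: Dict[str, Any], seen_at: str) -> Dict[str, Any]:
    """Insert a new finding or refresh an existing one when a later scan sees it again."""
    # Assumed identity of a finding: same device, same file, same pattern.
    key = (observed["deviceId"], observed["filePath"], observed["patternId"])
    existing = store.get(key)
    if existing is None:
        store[key] = {
            **observed,
            "firstSeenAt": seen_at,
            "lastSeenAt": seen_at,
            "occurrenceCount": 1,
            "status": "open",
        }
    else:
        existing["lastSeenAt"] = seen_at          # confirmed again by this scan
        existing["occurrenceCount"] += 1          # one more scan found the file
        existing["matchCount"] = observed["matchCount"]
    return store[key]
```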
Dashboard
The dashboard endpoint aggregates all findings data into a single response for the sensitive data overview:

    GET /sensitive-data/dashboard

Dashboard Response

    {
      "data": {
        "totals": {
          "findings": 1250,
          "open": 842,
          "criticalOpen": 23,
          "remediated24h": 45,
          "averageOpenAgeHours": 168.5
        },
        "byDataType": {
          "pii": 520,
          "credential": 380,
          "pci": 200,
          "phi": 100,
          "financial": 50
        },
        "byRisk": {
          "low": 600,
          "medium": 350,
          "high": 200,
          "critical": 100
        }
      }
    }

| Field | Description |
|---|---|
| totals.findings | Total number of findings across all statuses |
| totals.open | Number of findings in open status |
| totals.criticalOpen | Number of critical-risk findings that are still open |
| totals.remediated24h | Number of findings remediated in the last 24 hours |
| totals.averageOpenAgeHours | Average age of open findings in hours |
| byDataType | Finding count broken down by data classification |
| byRisk | Finding count broken down by risk level |
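To make the field definitions concrete, here is a sketch of how the same totals could be derived from a list of findings. It is not the server's implementation: `build_dashboard` is a hypothetical name, timestamps are assumed to be already parsed into timezone-aware `datetime` objects, rounding to one decimal is an assumption, and open age is measured from `lastSeenAt` as the troubleshooting notes describe.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Any, Dict, List


def build_dashboard(findings: List[Dict[str, Any]], now: datetime) -> Dict[str, Any]:
    """Compute dashboard-style totals and breakdowns from finding records."""
    open_findings = [f for f in findings if f["status"] == "open"]
    cutoff = now - timedelta(hours=24)
    remediated_24h = sum(
        1 for f in findings
        if f["status"] == "remediated"
        and f.get("remediatedAt") is not None
        and f["remediatedAt"] >= cutoff
    )
    # Age in hours of each open finding, measured from lastSeenAt.
    ages = [(now - f["lastSeenAt"]).total_seconds() / 3600 for f in open_findings]
    return {
        "totals": {
            "findings": len(findings),
            "open": len(open_findings),
            "criticalOpen": sum(1 for f in open_findings if f["risk"] == "critical"),
            "remediated24h": remediated_24h,
            "averageOpenAgeHours": round(sum(ages) / len(ages), 1) if ages else 0.0,
        },
        # Breakdowns cover findings in every status, matching the sample response.
        "byDataType": dict(Counter(f["dataType"] for f in findings)),
        "byRisk": dict(Counter(f["risk"] for f in findings)),
    }
```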
Remediation
Remediating Findings

    POST /sensitive-data/remediate
    Content-Type: application/json
    Authorization: Bearer <token>

    {
      "findingIds": ["finding-uuid-1", "finding-uuid-2"],
      "action": "quarantine",
      "confirm": true,
      "quarantineDir": "/var/lib/breeze/quarantine/sensitive",
      "dryRun": false
    }

Remediation Request Fields
| Field | Type | Required | Description |
|---|---|---|---|
| findingIds | UUID[] | Yes | Findings to remediate (1-250) |
| action | string | Yes | encrypt, quarantine, secure_delete, accept_risk, false_positive, mark_remediated |
| confirm | boolean | Conditional | Required for destructive actions (encrypt, quarantine, secure_delete) |
| dryRun | boolean | No | Preview which findings would be affected without making changes (default false) |
| secondApprovalToken | string | Conditional | Required for secure_delete when second approval is enabled |
| encryptionKeyRef | string | No | Reference to the encryption key (for encrypt action) |
| encryptionKeyVersion | string | No | Version of the encryption key |
| quarantineDir | string | No | Custom quarantine directory path on the device |
How Remediation Works
- Non-destructive actions (accept_risk, false_positive, mark_remediated) update the finding status directly in the database. No command is sent to the agent.
- Destructive actions (encrypt, quarantine, secure_delete) queue a command to the target device’s agent via the command queue. Each finding becomes a separate command targeting the specific file path.
- Dry run mode returns the list of eligible findings and their file paths without making any changes, allowing you to preview the impact before committing.
- Second approval can be required for secure_delete operations by setting the SENSITIVE_DATA_REQUIRE_SECOND_APPROVAL environment variable. When enabled, a valid secondApprovalToken must be provided.
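The rules above translate into a few client-side guards when building a remediation request. The sketch below is illustrative: `build_remediation_request` and its `second_approval_required` flag are hypothetical, and the client is assumed to know whether second approval is enabled on the server. The source does not say whether a dry run of a destructive action also needs `confirm`; this sketch requires it in every destructive case.

```python
from typing import Any, Dict, List, Optional

DESTRUCTIVE = {"encrypt", "quarantine", "secure_delete"}
NON_DESTRUCTIVE = {"accept_risk", "false_positive", "mark_remediated"}


def build_remediation_request(
    finding_ids: List[str],
    action: str,
    *,
    confirm: bool = False,
    dry_run: bool = False,
    second_approval_token: Optional[str] = None,
    second_approval_required: bool = False,
) -> Dict[str, Any]:
    """Build a request body for POST /sensitive-data/remediate with basic guards."""
    if action not in DESTRUCTIVE | NON_DESTRUCTIVE:
        raise ValueError(f"unknown action: {action!r}")
    if not 1 <= len(finding_ids) <= 250:
        raise ValueError("findingIds must contain 1 to 250 entries")

    body: Dict[str, Any] = {"findingIds": list(finding_ids), "action": action, "dryRun": dry_run}
    if action in DESTRUCTIVE:
        if not confirm:
            raise ValueError("destructive actions require confirm=True")
        body["confirm"] = True
        if action == "secure_delete" and second_approval_required:
            if not second_approval_token:
                raise ValueError("secure_delete requires a secondApprovalToken")
            body["secondApprovalToken"] = second_approval_token
    return body
```

A cautious workflow might first send the body with `dry_run=True`, review the returned file paths, and only then resend it with `dry_run=False`.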
Remediation Response
For destructive actions, the response includes queued commands and any failures:

    {
      "data": {
        "queued": [
          { "findingId": "uuid", "commandId": "uuid" }
        ],
        "failed": [
          { "findingId": "uuid", "error": "Device is offline" }
        ],
        "updated": 5
      }
    }

API Reference
| Method | Path | Description |
|---|---|---|
| POST | /sensitive-data/scan | Trigger a manual scan on one or more devices |
| GET | /sensitive-data/scans | List recent scans with status and summary |
| GET | /sensitive-data/scans/:id | Get scan details with findings breakdown |
Findings and Reports
| Method | Path | Description |
|---|---|---|
| GET | /sensitive-data/report | Query findings with filtering and pagination |
| GET | /sensitive-data/dashboard | Aggregated dashboard with totals, risk, and data type distribution |
Remediation
| Method | Path | Description |
|---|---|---|
| POST | /sensitive-data/remediate | Remediate findings (destructive or non-destructive) |
Policies
| Method | Path | Description |
|---|---|---|
| GET | /sensitive-data/policies | List all scan policies for the organization |
| POST | /sensitive-data/policies | Create a new scan policy |
| PUT | /sensitive-data/policies/:id | Update an existing policy |
| DELETE | /sensitive-data/policies/:id | Delete a policy |
Troubleshooting
Scan stuck in queued status.
The scan was created but the agent has not started processing it. Verify that the target device is online and the agent is connected. Check that BullMQ workers are running and processing the sensitive data scan queue. If the scan enqueue failed, the creation response includes an enqueueFailures count greater than zero.
No findings returned after a completed scan.
The scan completed but did not detect any sensitive data matching the configured detection classes and scope. Verify the detectionClasses include the types you expect to find. Check the scope.includePaths to ensure the correct directories are being scanned. Review scope.excludePaths and suppressPaths to make sure the target files are not being excluded. Also check scope.maxFileSizeBytes — files larger than the limit are skipped.
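When a file you expected to see is missing from the findings, it can help to check it against the scope by hand. The sketch below mirrors the checks listed above under simplifying assumptions: `skip_reason` is a hypothetical helper, paths are POSIX-style, include and exclude entries match the path itself or anything beneath it, and `*` is treated as a glob wildcard. The agent's real matching rules may differ.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, Optional


def _under(path: str, pattern: str) -> bool:
    """True when path matches the pattern or sits beneath it (glob wildcards allowed)."""
    return fnmatchcase(path, pattern) or fnmatchcase(path, pattern.rstrip("/") + "/*")


def skip_reason(path: str, size_bytes: int, scope: Dict[str, Any]) -> Optional[str]:
    """Return why a file would be skipped under this scope, or None if it is in scope."""
    include = scope.get("includePaths", [])
    if include and not any(_under(path, p) for p in include):
        return "outside includePaths"
    if any(_under(path, p) for p in scope.get("excludePaths", [])):
        return "excluded by excludePaths"
    file_types = scope.get("fileTypes", [])
    # endswith handles dotfiles such as ".env", which have no conventional suffix.
    if file_types and not any(path.lower().endswith(ext.lower()) for ext in file_types):
        return "file type not in fileTypes"
    max_size = scope.get("maxFileSizeBytes")
    if max_size is not None and size_bytes > max_size:
        return "larger than maxFileSizeBytes"
    return None
```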
Duplicate scan created despite idempotency key.
Idempotency checks match on both the key and the request fingerprint (a SHA-256 hash of device IDs, policy, scope, and detection classes). If any of these values differ between requests, the fingerprint will not match and a new scan will be created. Idempotency protection also only applies to scans created within the last 24 hours.
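To see why a small change defeats idempotency, consider a fingerprint computed along these lines. The hash function and its inputs come from the description above; the canonical form (sorted lists, sorted JSON keys) and the name `request_fingerprint` are assumptions, so values produced here will not necessarily match the server's.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def request_fingerprint(
    device_ids: List[str],
    policy_id: Optional[str],
    scope: Optional[Dict[str, Any]],
    detection_classes: Optional[List[str]],
) -> str:
    """SHA-256 over a canonical JSON form of the fields that define a scan request."""
    canonical = json.dumps(
        {
            "deviceIds": sorted(device_ids),  # assumed: order of IDs does not matter
            "policyId": policy_id,
            "scope": scope or {},
            "detectionClasses": sorted(detection_classes or []),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Reusing a key while changing, say, `scope.includePaths` produces a different fingerprint, which is the situation in which a second scan is created.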
Destructive remediation rejected with confirm=true error.
Destructive actions (encrypt, quarantine, secure_delete) require confirm: true in the request body. If the secure_delete action is rejected despite confirmation, check whether the SENSITIVE_DATA_REQUIRE_SECOND_APPROVAL environment variable is enabled — if so, a valid secondApprovalToken must also be provided.
Remediation command failed to queue for a device.
When a destructive remediation command fails to queue, the finding ID appears in the failed array of the response with an error message. Common causes include the device being offline or the command queue being unavailable. The finding’s remediationAction and remediationMetadata are still updated to reflect the attempted action, but the agent will not receive the command until it is re-queued.
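Grouping the failed entries by error message makes it easier to decide what to retry. A minimal sketch, assuming the response shape shown under Remediation Response and a hypothetical helper name `failed_by_error`:

```python
from collections import defaultdict
from typing import Any, Dict, List


def failed_by_error(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group failed finding IDs by error message so each cause can be handled separately."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in response.get("data", {}).get("failed", []):
        grouped[item["error"]].append(item["findingId"])
    return dict(grouped)
```

The IDs listed under a transient cause such as an offline device can be resubmitted to `POST /sensitive-data/remediate` once the device reconnects.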
Dashboard shows stale totals.
The dashboard computes totals by scanning all findings in real time. If the finding count is large, the response may take a moment to compute. The averageOpenAgeHours is calculated from each open finding’s lastSeenAt timestamp. If scans are not running regularly, the age values may appear inflated.