Auto-Investigate Datadog Alerts
Wire PagerDuty or Datadog alerts to Devin for automatic incident investigation.
Enable the Datadog MCP
Devin needs access to your Datadog account to query logs, metrics, and monitors during an investigation.
- Go to Settings > MCP Marketplace and find Datadog
- Click Enable and enter your Datadog API key and application key — generate these in Datadog > Organization Settings > API Keys
- Click Test listing tools to verify Devin can connect
Build the alert-to-Devin bridge
You need a small service that receives alert webhooks and starts a Devin session via the Devin API. Deploy this as a serverless function (AWS Lambda, Cloudflare Worker) or a lightweight container.
Create a service user in Settings > Service Users at app.devin.ai with ManageOrgSessions permission. Copy the API token shown after creation and store it as DEVIN_API_KEY on your bridge service. Set DEVIN_ORG_ID to your organization ID, which you can get by calling GET https://api.devin.ai/v3/enterprise/organizations with your token.
The bridge code uses the !triage template playbook. Duplicate it and customize the investigation steps for your stack, then update the playbook_id in your bridge service.
Route alerts to the webhook
From Datadog directly:
- In your Datadog dashboard, go to Integrations > Webhooks
- Click New Webhook, name it devin-bridge, and set the URL to your bridge endpoint (e.g., https://your-bridge.example.com/alert)
- In any monitor’s notification message, add @webhook-devin-bridge so that Devin investigates whenever that monitor fires
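On the receiving side, the bridge has to turn the Datadog webhook body into something it can act on. The following is a minimal sketch, assuming the webhook is configured with a custom JSON payload that maps Datadog template variables (such as $EVENT_TITLE, $ALERT_TRANSITION, and $LINK) to the keys title, transition, link, and priority. Those key names, the Alert type, the P3 default, and the function name are hypothetical and must match whatever payload you actually configure.

```python
import json
from dataclasses import dataclass


@dataclass
class Alert:
    title: str
    link: str
    priority: str
    firing: bool


def parse_datadog_webhook(body: bytes) -> Alert:
    """Normalize a Datadog webhook body into an Alert.

    ASSUMPTION: the key names below come from a custom payload defined
    in the webhook configuration; they are not fixed by Datadog.
    """
    data = json.loads(body)
    transition = str(data.get("transition", "")).lower()
    return Alert(
        title=data.get("title", "Untitled alert"),
        link=data.get("link", ""),
        # ASSUMPTION: treat alerts without a priority as low severity.
        priority=data.get("priority", "P3"),
        # Recovery notifications should not start an investigation.
        firing=transition in {"triggered", "re-triggered"},
    )
```

Checking the transition before starting a session keeps recovery notifications from the same monitor from launching a second investigation.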
From PagerDuty:
- In PagerDuty, go to Services > [your service] > Integrations
- Add a Generic Webhooks (v3) integration
- Set the webhook URL to your bridge endpoint and filter by event type incident.triggered
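On the bridge side, the PagerDuty route above could be handled roughly as follows. This is a minimal sketch under stated assumptions: the session-creation URL, the Bearer authorization scheme, and the prompt field of the request body are assumptions to verify against the Devin API reference, and the function names are hypothetical. The event.event_type, event.data.title, and event.data.html_url fields follow the PagerDuty v3 webhook payload format.

```python
import json
import os
import urllib.request
from typing import Optional

# ASSUMPTION: this guide does not state the session-creation path;
# confirm the real URL in the Devin API reference before using it.
SESSIONS_URL = "https://api.devin.ai/v3/organizations/{org_id}/sessions"


def should_investigate(payload: dict) -> bool:
    """Accept only PagerDuty v3 events of type incident.triggered."""
    return payload.get("event", {}).get("event_type") == "incident.triggered"


def build_session_request(title: str, link: str, playbook_id: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts a Devin session."""
    body = {
        # ASSUMPTION: field names for the request body.
        "prompt": f"Investigate this alert: {title}\nAlert link: {link}",
        "playbook_id": playbook_id,
    }
    return urllib.request.Request(
        SESSIONS_URL.format(org_id=os.environ["DEVIN_ORG_ID"]),
        data=json.dumps(body).encode(),
        headers={
            # ASSUMPTION: the service-user token is sent as a Bearer token.
            "Authorization": f"Bearer {os.environ['DEVIN_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def handle_pagerduty(payload: dict, playbook_id: str) -> Optional[urllib.request.Request]:
    """Return a session request for triggered incidents, else None."""
    if not should_investigate(payload):
        return None
    incident = payload["event"].get("data", {})
    return build_session_request(
        incident.get("title", "Untitled incident"),
        incident.get("html_url", ""),
        playbook_id,
    )
```

The returned request can be sent with urllib.request.urlopen. Re-checking the event type in the bridge is a safeguard in case the PagerDuty subscription filter is later widened.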
What Devin investigates
When an alert triggers a session, Devin uses the Datadog MCP to run a structured investigation: querying logs, correlating with deploys, and tracing the error to source code.
Example investigation Devin posts to Slack:
Extend the pipeline
Once basic investigation works, layer on more automation:
Customize the triage playbook. The bridge code already uses the !triage template playbook. Duplicate it and tailor the investigation checklist to your team’s stack: add service-specific runbooks, escalation paths, and conventions for hotfix PRs.
Scope by severity. Route P1 alerts for immediate investigation and hotfix. Route P3 alerts for root-cause analysis only. Use different prompts or playbooks per severity level.
Add Knowledge about your services (normal thresholds, architecture, on-call runbooks) so Devin’s investigation starts from your team’s context instead of from scratch.
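The severity scoping described above could be sketched as a small lookup in the bridge. This is an illustration only: the playbook IDs are placeholders, the Route type and route_for function are hypothetical names, and falling back to the P3 route for unlisted priorities is an assumption about a sensible default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    playbook_id: str
    instructions: str


# ASSUMPTION: placeholder playbook IDs; substitute your own.
ROUTES = {
    "P1": Route(
        "playbook-hotfix-placeholder",
        "Investigate immediately and open a hotfix PR if the cause is clear.",
    ),
    "P3": Route(
        "playbook-rca-placeholder",
        "Perform root-cause analysis only; do not open a PR.",
    ),
}

# ASSUMPTION: priorities without an explicit route get analysis only.
DEFAULT_ROUTE = ROUTES["P3"]


def route_for(priority: str) -> Route:
    """Pick the playbook and extra prompt text for an alert priority."""
    return ROUTES.get(priority.strip().upper(), DEFAULT_ROUTE)
```

The bridge would then append route_for(priority).instructions to the session prompt and pass the matching playbook_id when it starts the session.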