DORA (DevOps Research and Assessment) is a research program that has studied software delivery performance since 2014 and distilled its findings into a small set of key metrics. In 2024 the framework evolved to its current five-metric form, grouped as throughput (Change Lead Time, Deployment Frequency, Failed Deployment Recovery Time) and instability (Change Fail Rate, Deployment Rework Rate). The 2025 DORA report, focused on AI-assisted software development, established the central insight that AI is an amplifier: it accelerates strong teams and magnifies dysfunction in struggling ones, making solid engineering foundations more important than ever. One DORA finding has endured throughout: speed and stability are not tradeoffs. Elite teams excel at both, which indicates that the practices enabling frequent deployment also improve reliability.
Table 1: Core DORA Metrics
The five DORA metrics form an interconnected system covering two factors: throughput (how fast changes flow to production) and instability (how often those changes cause problems). Tracking all five together prevents the gaming that arises from optimizing a single number.
| Metric | Example | Description |
|---|---|---|
| Deployment Frequency | Multiple deploys/day (elite) | • How often code is deployed to production • throughput metric measuring team velocity and continuous delivery maturity. |
| Change Lead Time | Commit at 9am → Production at 11am = 2 hours | • Time from code commit to running in production • measures end-to-end delivery speed including coding, review, testing, and deployment. |
| Failed Deployment Recovery Time | Incident detected at 2pm, service restored at 3:30pm = 1.5 hours | • Time to restore service after a deployment-caused failure • renamed from MTTR in 2023 to focus strictly on change-induced outages, not external infrastructure failures; throughput metric. |
| Change Fail Rate | 5 deployments, 1 requires hotfix = 20% | • Percentage of production deployments requiring immediate remediation (rollback, hotfix, or incident) • instability metric reflecting deployment quality. |
| Deployment Rework Rate | 3 out of 20 deploys this month were unplanned incident fixes = 15% | • Ratio of deployments that are unplanned work caused by production incidents rather than new feature releases • added as the official 5th metric in 2024 to capture reactive capacity drain; instability metric. |
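The two instability metrics are simple ratios over a set of deployment records, and a small worked example makes the arithmetic concrete. The sketch below is illustrative only: the `Deployment` record and its two boolean flags are hypothetical names, and how a team decides which deployments get flagged is a definitional choice the code does not settle.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical record for one production deployment."""
    deploy_id: str
    needed_remediation: bool = False  # a rollback, hotfix, or incident followed it
    is_unplanned_fix: bool = False    # the deployment itself was an incident fix


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage, refusing an empty period."""
    if whole == 0:
        raise ValueError("no deployments in the period")
    return 100.0 * part / whole


def change_fail_rate(deploys: list) -> float:
    """Share of deployments that required immediate remediation."""
    return percentage(sum(d.needed_remediation for d in deploys), len(deploys))


def deployment_rework_rate(deploys: list) -> float:
    """Share of deployments that were unplanned incident fixes."""
    return percentage(sum(d.is_unplanned_fix for d in deploys), len(deploys))


# 5 deployments, 1 needed a hotfix
week = [Deployment(f"d{i}", needed_remediation=(i == 0)) for i in range(5)]
# 20 deployments, 3 were unplanned incident fixes
month = [Deployment(f"m{i}", is_unplanned_fix=(i < 3)) for i in range(20)]

print(change_fail_rate(week))         # 20.0
print(deployment_rework_rate(month))  # 15.0
```

Keeping the two flags separate matters: one marks a deployment that failed, the other marks a deployment that exists only because something else failed.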
Table 2: Performance Benchmarks by Tier
DORA's four performance tiers — derived from statistical cluster analysis of thousands of teams — remain the standard for benchmarking each metric. The 2025 DORA report also introduced seven team archetypes (see Table 15) that capture the interplay of performance, stability, and well-being beyond simple tier labels.
| Tier | Example | Description |
|---|---|---|
| Elite | Multiple deploys/day; <1 day lead time; 0-15% failure rate; <1 hour FDRT | • Top performers with on-demand deployments, sub-day delivery, minimal failures, and rapid recovery • top ~19% of surveyed teams. |
| High | Daily to weekly deploys; <1 week lead time; 16-30% failure rate; <1 day FDRT | Strong performers with regular deployment cadence, weekly delivery cycles, and same-day incident resolution. |
| Medium | Weekly to monthly deploys; 1 week to 1 month lead time; 16-30% failure rate; <1 week FDRT | Average performers with periodic releases, longer feedback loops, and multi-day recovery times. |
| Low | Less than monthly deploys; >1 month lead time; 46-60% failure rate; >1 week FDRT | Bottom quartile with infrequent releases, extended delivery cycles, high failure rates, and prolonged outages. |
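For dashboards it can help to turn the benchmark ranges above into a small lookup. The following is a rough sketch using the Elite/High/Medium/Low labels; the exact boundary handling (for example, treating "1 month" as 30 days, or "daily to weekly" as at least one deploy every 7 days) is an assumption made for illustration, not something the benchmarks specify.

```python
from datetime import timedelta


def lead_time_tier(lead_time: timedelta) -> str:
    """Map a change lead time onto a tier (boundaries are assumptions)."""
    if lead_time < timedelta(days=1):
        return "Elite"
    if lead_time < timedelta(weeks=1):
        return "High"
    if lead_time <= timedelta(days=30):  # assumes 1 month = 30 days
        return "Medium"
    return "Low"


def deploy_frequency_tier(deploys_per_day: float) -> str:
    """Map an average deploys-per-day figure onto a tier (boundaries are assumptions)."""
    if deploys_per_day > 1:        # multiple deploys per day
        return "Elite"
    if deploys_per_day >= 1 / 7:   # at least weekly
        return "High"
    if deploys_per_day >= 1 / 30:  # at least monthly
        return "Medium"
    return "Low"


print(lead_time_tier(timedelta(hours=2)))  # Elite
print(deploy_frequency_tier(0.5))          # High
```

Because the published tiers come from cluster analysis rather than fixed cutoffs, a lookup like this is best treated as a rough badge, not a verdict.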
Table 3: Deployment Frequency Measurement
Deployment frequency is straightforward in concept but subtle in practice — the key is counting only what genuinely reaches end users, and choosing the right aggregation method to avoid distortion from outliers or quiet periods.
| Example | Description |
|---|---|
| 100 deployments ÷ 30 days = 3.33/day | • Count production deployments over time period • simplest calculation dividing total deploys by days measured. |
| Team deploys whenever ready, averaging 5x/day | • Elite teams deploy multiple times daily without fixed schedules • indicates mature CI/CD and automated testing. |
| Exclude staging/dev deploys, count prod only | • Only deployments reaching end users count • staging and test environment deploys are excluded. |
| Median: 1/day (ignores outliers); Mean: 2.5/day (includes spikes) | Median preferred over mean to avoid skewing from irregular batch releases or quiet periods. |
| Deploy Mon, deploy Thu = 3 days between | Alternative approach measuring average days between successive deployments rather than deploys per period. |
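The difference between the aggregation methods is easiest to see on a small data set. The sketch below, with hypothetical function names and made-up dates, counts production deploys per calendar day (including zero-deploy days) and compares mean, median, and the gap between successive deployments.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, median


def daily_counts(deploy_dates: list, start: date, end: date) -> list:
    """Deploys per calendar day from start to end inclusive, zeros included."""
    counts = Counter(deploy_dates)
    days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(days)]


def mean_gap_days(deploy_dates: list) -> float:
    """Average number of days between successive deployments."""
    ordered = sorted(deploy_dates)
    if len(ordered) < 2:
        raise ValueError("need at least two deployments to measure a gap")
    return mean((b - a).days for a, b in zip(ordered, ordered[1:]))


start, end = date(2024, 1, 1), date(2024, 1, 6)
# One production deploy per day for five days, then a batch of ten on day six.
deploys = [date(2024, 1, d) for d in range(1, 6)] + [date(2024, 1, 6)] * 10

per_day = daily_counts(deploys, start, end)
print(per_day)          # [1, 1, 1, 1, 1, 10]
print(mean(per_day))    # 2.5  (pulled up by the batch day)
print(median(per_day))  # 1.0  (the typical day)

# Monday 1 Jan 2024 and Thursday 4 Jan 2024 are three days apart.
print(mean_gap_days([date(2024, 1, 1), date(2024, 1, 4)]))  # 3
```

Including the zero-deploy days in `daily_counts` is deliberate: dropping quiet days would overstate the frequency.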
Table 4: Lead Time Stages and Breakdown
Lead time for changes is rarely limited by coding speed — most of the clock ticks during waiting: idle queues before review, slow CI pipelines, and delayed deployment windows. Decomposing the total into stages reveals exactly where the bottleneck lives.
| Example | Description |
|---|---|
| First commit → Last commit on feature = 4 hours | • Active development time from initial commit to completion • excludes waiting periods. |
| PR opened → PR approved = 8 hours | • Code review duration including wait time for reviewers and addressing feedback • bottleneck in many teams; often expands in AI-assisted environments as review volume increases. |
| CI pipeline runs automated tests = 15 minutes | • Automated and manual testing duration • includes CI/CD pipeline execution time. |
| Merge to main → Deployed to prod = 30 minutes | Production deployment execution including artifact building, environment provisioning, and release automation. |
| Completed code sits 12 hours before review starts | • Idle time between stages • often the largest component of lead time and prime improvement target. |
| First commit → Running in production = 2 days | • End-to-end time from code commit to production • sum of all stages including waiting periods. |
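Decomposing lead time amounts to subtracting consecutive timestamps. Here is a minimal sketch, assuming each change carries six timestamps; the event names and the stage boundaries in `STAGES` are hypothetical, and the durations are chosen to echo the per-stage examples above rather than to reproduce the two-day total.

```python
from datetime import datetime, timedelta

# (stage name, start event, end event); hypothetical event names.
STAGES = [
    ("coding", "first_commit", "last_commit"),
    ("waiting", "last_commit", "review_started"),
    ("review", "review_started", "pr_approved"),
    ("testing", "pr_approved", "merged"),
    ("deploy", "merged", "deployed"),
]


def stage_breakdown(timeline: dict) -> dict:
    """Return the duration of each stage for one change."""
    return {stage: timeline[end] - timeline[start] for stage, start, end in STAGES}


timeline = {
    "first_commit": datetime(2024, 3, 4, 9, 0),
    "last_commit": datetime(2024, 3, 4, 13, 0),    # 4 h of coding
    "review_started": datetime(2024, 3, 5, 1, 0),  # 12 h idle
    "pr_approved": datetime(2024, 3, 5, 9, 0),     # 8 h of review
    "merged": datetime(2024, 3, 5, 9, 15),         # 15 min of CI
    "deployed": datetime(2024, 3, 5, 9, 45),       # 30 min to deploy
}

breakdown = stage_breakdown(timeline)
total = sum(breakdown.values(), timedelta())
bottleneck = max(breakdown, key=breakdown.get)

for stage, duration in breakdown.items():
    print(f"{stage:8} {duration}  ({duration / total:.0%} of lead time)")
print("total:", total, "| bottleneck:", bottleneck)
```

In this made-up timeline the idle wait before review is the largest single component, which is the pattern the breakdown is meant to expose.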
Table 5: Change Failure Rate Calculation
Change fail rate is the most debated DORA metric because "failure" requires a team definition. Consistent, agreed-upon definitions matter far more than a precise number — a well-defined 20% CFR is more actionable than a vague 5%.
| Example | Description |
|---|---|
| (1 failure ÷ 5 deploys) × 100 = 20% | • Failed deployments divided by total deployments • percentage of changes requiring immediate remediation. |
| Deployment causes degraded service or hotfix needed | • Failure requires rollback, hotfix, or incident response • minor bugs found later may not count. |
| Version 1.5 deployed, then 1.4 redeployed = rollback | Automatically detect when previous version is redeployed shortly after new version. |
| SEV1 incident within 24h of deploy links to that deploy | Link production incidents to recent deployments within detection window (typically 24-48 hours). |
| Emergency patch deployed same day = failed change | Count unplanned remediation deployments following a regular release as failures. |
| Issues appearing within 48h of deploy count as failures | • Define detection window for linking failures to deployments • too short misses issues, too long inflates rate. |
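The effect of the detection window can be shown with a short sketch. It assumes a simple attribution rule (each incident is blamed on the most recent deployment that preceded it, if that deployment falls inside the window); this rule and the function names are illustrative choices, and real incident tooling may link incidents to deployments differently.

```python
from datetime import datetime, timedelta


def failed_deploys(deploy_times, incident_times, window=timedelta(hours=24)):
    """Return the set of deployments blamed for at least one incident.

    Assumed rule: an incident is attributed to the latest deployment at or
    before it, provided the gap does not exceed the detection window.
    """
    ordered = sorted(deploy_times)
    failed = set()
    for incident in incident_times:
        prior = [d for d in ordered if d <= incident]
        if prior and incident - prior[-1] <= window:
            failed.add(prior[-1])
    return failed


def change_failure_rate(deploy_times, incident_times, window=timedelta(hours=24)):
    """Failed deployments as a percentage of all deployments."""
    if not deploy_times:
        raise ValueError("no deployments in the period")
    failed = failed_deploys(deploy_times, incident_times, window)
    return 100.0 * len(failed) / len(deploy_times)


# Five deployments at 10:00 on 1-5 March.
deploys = [datetime(2024, 3, day, 10, 0) for day in range(1, 6)]
incidents = [
    datetime(2024, 3, 3, 14, 0),  # 4 h after the 3 March deployment
    datetime(2024, 3, 7, 10, 0),  # 48 h after the 5 March deployment
]

print(change_failure_rate(deploys, incidents, timedelta(hours=24)))  # 20.0
print(change_failure_rate(deploys, incidents, timedelta(hours=48)))  # 40.0
```

The same data yields 20% or 40% depending only on the window, which is why the window needs to be agreed on before numbers are compared.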
Table 6: Recovery Time Measurement
Failed deployment recovery time (formerly MTTR) focuses strictly on recovery from deployment-caused failures, not all production incidents. This distinction matters: infrastructure outages should not inflate the metric that measures delivery quality.
| Example | Description |
|---|---|
| Alert fired 2pm, resolved 3:30pm = 90 minutes | • Measure from incident detection (first alert/ticket) to resolution (service restored) • most common approach. |
| Users affected 2:05pm, service normal 3:25pm = 80 minutes | • Time between user-facing impact beginning and ending • more accurate than internal detection timestamps. |
| Issue detected → Code fix deployed = 45 minutes | • Narrow definition measuring fix deployment time only • excludes detection lag and validation. |
| 10 incidents: 20m, 30m, 1h, 2h... → median = 50m | • Use median recovery time rather than mean • prevents single long outages from distorting metric. |
| SEV1 incidents tracked separately from SEV3 | • Track recovery time by incident severity • critical outages and minor issues have different expectations. |
| Only count failures caused by your own code changes | • Exclude infrastructure outages and third-party failures from the metric • the 2023 DORA redefinition explicitly scoped this to deployment-caused incidents. |
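Three of the choices above (detection-to-restore timing, median over mean, and excluding non-deployment incidents) combine naturally into one calculation. The following is a sketch under assumptions: the `Incident` record, its `caused_by_deploy` flag, and the sample durations are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Incident:
    """Hypothetical incident record."""
    severity: str
    detected: datetime
    restored: datetime
    caused_by_deploy: bool = True


def recovery_minutes(incident: Incident) -> float:
    """Detection-to-restore time in minutes."""
    return (incident.restored - incident.detected).total_seconds() / 60


def median_recovery_by_severity(incidents: list) -> dict:
    """Median recovery time per severity, deployment-caused incidents only."""
    grouped = defaultdict(list)
    for inc in incidents:
        if inc.caused_by_deploy:  # skip infrastructure and third-party outages
            grouped[inc.severity].append(recovery_minutes(inc))
    return {severity: median(times) for severity, times in grouped.items()}


def make_incident(severity: str, minutes: int, caused_by_deploy: bool = True) -> Incident:
    """Build a sample incident detected at 2pm that lasted `minutes`."""
    detected = datetime(2024, 3, 4, 14, 0)
    return Incident(severity, detected, detected + timedelta(minutes=minutes), caused_by_deploy)


incidents = [
    make_incident("SEV1", 90),   # alert at 2pm, restored at 3:30pm
    make_incident("SEV1", 30),
    make_incident("SEV1", 600),  # one long outage; the median is unaffected
    make_incident("SEV3", 20),
    make_incident("SEV3", 60),
    make_incident("SEV1", 1000, caused_by_deploy=False),  # cloud outage, excluded
]

print(median_recovery_by_severity(incidents))  # {'SEV1': 90.0, 'SEV3': 40.0}
```

With a mean, the single 600-minute outage would push the SEV1 figure to 240 minutes; the median keeps it at 90.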
Table 7: Metric Collection Automation
Manual data collection for DORA rarely scales beyond a team of five — automation is essential. The key is instrumenting pipelines at the right points so events flow to your metrics system without requiring developer overhead.
| Example | Description |
|---|---|
| GitHub Actions logs deployment timestamp to database | • Instrument pipeline stages to emit structured events • timestamp commits, builds, tests, and deployments. |
| GitLab sends commit events to metrics collector API | Use repository webhooks to capture commit timestamps and PR merge events automatically. |
| PagerDuty API queries incidents linked to deployments | Pull incident data from ticketing systems (Jira, PagerDuty, Opsgenie) to calculate CFR and FDRT. |
| OTEL spans track code from commit through production | • Use distributed tracing to measure lead time across stages • captures timing automatically. |
| DevLake connects GitHub + Jira + Jenkins → DORA dashboard | • Open-source DORA platform aggregating data from Git, CI/CD, and incident tools • strongest self-hosted option; ships with a built-in Grafana dashboard. |
| Datadog DORA metrics dashboard queries deployment logs | • Leverage observability platforms with built-in DORA tracking • Datadog, New Relic, Splunk offer native support. |
| Python script queries Git + Jenkins + ServiceNow hourly | • Build custom integration pulling data from multiple sources • useful when tools lack native DORA support. |
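As one possible shape for a structured deployment event emitted from a pipeline step, consider the sketch below. The field names, the JSON Lines file sink, and the function names are assumptions for illustration; they do not correspond to the schema of any tool named above.

```python
import json
from datetime import datetime, timezone


def deployment_event(service, commit_sha, environment, status, now=None):
    """Build a deployment event as a plain dict (hypothetical schema)."""
    now = now or datetime.now(timezone.utc)
    return {
        "type": "deployment",
        "service": service,
        "commit_sha": commit_sha,
        "environment": environment,  # downstream code counts only "production"
        "status": status,            # e.g. "succeeded" or "rolled_back"
        "timestamp": now.isoformat(),
    }


def append_event(path, event):
    """Append one event as a JSON line; a stand-in for a real metrics sink."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")


event = deployment_event(
    service="user-service",
    commit_sha="abc1234",
    environment="production",
    status="succeeded",
    now=datetime(2024, 3, 4, 11, 0, tzinfo=timezone.utc),
)
print(json.dumps(event, indent=2))
```

Recording the commit SHA alongside the deployment timestamp is what later allows lead time to be computed by joining against commit events.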
Table 8: Dashboard Design Best Practices
A good DORA dashboard surfaces actionable signals, not just metrics — the design choices around context, comparison, and drill-down determine whether a dashboard drives improvement or just decorates a wall.
| Example | Description |
|---|---|
| Single view showing DF, LT, CFR, FDRT, DRR side-by-side | • Display all metrics simultaneously to show throughput/instability balance • avoids tunnel vision on one dimension. |
| 30-day rolling average with week-over-week comparison | • Show historical trends not just current values • reveals improvement/regression patterns over time. |
| Color-coded badges: Elite (green), High (blue), Medium (yellow), Low (red) | • Visually indicate benchmark tier using DORA research categories • makes performance level immediately clear. |
| Click CFR → View list of failed deployments with details | • Enable detailed exploration from summary metrics • users can investigate specific incidents or slow deployments. |
| Toggle between team-level and company-wide rollups | • Support multiple aggregation levels • individual teams and organizational leaders have different needs. |
| Progress bar: Current 2/day → Target 5/day deployment frequency | • Display improvement targets alongside current values • keeps teams aligned on objectives. |
Table 9: Team-Level vs Organization-Level Metrics
Aggregation strategy is one of the most important — and most overlooked — DORA decisions. Rolling up metrics too broadly hides the teams that need help and the teams worth learning from.
| Example | Description |
|---|---|
| Mobile team: 8/day DF; Backend team: 3/day DF | • Measure individual team performance • avoids averaging out high/low performers and enables targeted improvement. |
| User-service: <1h LT; Payment-service: 4h LT | • Track metrics per service/microservice • reveals which components have efficient vs slow delivery. |
| Company-wide: 500 deploys/week across 20 teams | • Roll up to company-wide totals • useful for executive reporting and cross-team benchmarking. |
| Don't average 1 elite + 1 low team = medium performance | • Distribution matters more than mean • showing performance spread across teams is more valuable than single number. |
| Compare similar teams, not frontend vs infrastructure | • Compare teams with similar contexts • web app teams vs infrastructure teams face different constraints. |
Table 10: Common Implementation Pitfalls
Most DORA implementations fail not from technical complexity but from definitional inconsistency, missing baselines, or misuse as competitive rankings rather than improvement guides.
| Example | Description |
|---|---|
| Start tracking CFR with no historical reference point | • Establishing initial baseline is critical • without it, impossible to determine if improvements are working. |
| Team A counts staging deploys, Team B only counts production | • Standardize metric definitions across teams • inconsistent counting makes comparisons meaningless. |
| Split one deploy into five to inflate deployment frequency | • Goodhart's Law: when a measure becomes a target, it ceases to be a good measure • focus on outcomes not numbers. |
| Blame team for low DF when they're maintaining legacy system | • Consider organizational constraints • regulated industries, legacy tech, and compliance have legitimate friction. |
| Only report deployment frequency, hide change failure rate | • Track all five metrics as a set • optimizing one while ignoring others creates false performance picture. |
| Assume Azure DevOps tracks DORA automatically without configuration | • Most tools require custom configuration • default dashboards rarely track DORA without pipeline instrumentation. |
| Dev team owns DF, ops team owns FDRT with no shared discussion | • Metrics must be shared across dev, ops, and release teams • isolated ownership creates finger-pointing rather than collaboration. |
Table 11: Avoiding Metrics Gaming
Gaming emerges when metrics become targets rather than signals. The antidote is a combination of multiple counter-balanced metrics, qualitative checks, and a culture where improvement matters more than hitting a number.
| Example | Description |
|---|---|
| Deploy to deliver value, not to hit a deployment count | • Emphasize customer impact and business outcomes as primary goals • metrics guide but don't define success. |
| Elite DF but high CFR reveals quality problems | • View metrics holistically • speed without stability or stability without speed indicates imbalance. |
| Survey: "Do deployments feel risky?" despite low CFR | • Supplement quantitative DORA metrics with qualitative feedback • numbers don't capture fear, toil, or satisfaction. |
| Never use DORA for performance reviews or ranking devs | • DORA measures team/system performance not individuals • using for personal evaluation destroys psychological safety. |
| Highlight improvement, not blame for low performers | • Use metrics for learning and improvement • punishment creates metric manipulation and fear. |
| Low DF reveals slow review process, not lazy developers | • Investigate systemic bottlenecks when metrics lag • usually indicates process/tool issues not people problems. |
Table 12: Metrics-Driven Improvement Strategies
The highest-leverage improvements typically address multiple DORA metrics simultaneously — automated testing, small batch sizes, and trunk-based development each improve throughput and stability at once, compounding returns.
| Example | Description |
|---|---|
| Deploy single feature vs months of accumulated changes | • Smaller changes are easier to test, review, and roll back • improves all five DORA metrics; especially critical in AI-assisted environments where large AI-generated PRs slow reviews. |
| Add comprehensive unit/integration test suite | • Automated test coverage enables confident frequent deployments • reduces CFR and improves DF simultaneously. |
| All devs commit to main, use feature flags for incomplete work | • Short-lived branches and frequent integration reduce lead time • decreases merge conflicts and accelerates feedback. |
| Canary deploy to 5% → 25% → 100% with automated rollback | • Gradual rollouts with monitoring catch issues before full impact • reduces FDRT and improves CFR. |
| One-click production deploys via CI/CD pipeline | • Remove manual deployment steps • reduces lead time, increases DF, and minimizes human error causing failures. |
| Add structured logging, metrics, tracing to all services | • Real-time monitoring enables fast failure detection and diagnosis • directly improves FDRT. |
| After incident, focus on process improvements not blame | • Learning from failures without punishment • improves both CFR and FDRT through systemic fixes. |
Table 13: Correlation with Business Outcomes
DORA research has repeatedly shown that high software delivery performance predicts better organizational outcomes — not just faster features, but higher revenue growth, stronger market position, and better employee retention.
| Example | Description |
|---|---|
| Low LT enables quick response to competitor features | Rapid delivery lets organizations capitalize on market opportunities before competitors. |
| Frequent small releases reduce disruptive big-bang updates | • Continuous improvement delivers value faster and with less disruption • increases user retention. |
| Low FDRT minimizes revenue loss during outages | • Fast recovery limits business impact of incidents • every hour of downtime has direct financial cost. |
| Elite teams spend less time firefighting, more on new features | • Lower failure rates free engineering capacity • teams can focus on innovation vs operational toil. |
| 2x faster deployment enables A/B testing and experimentation | • High throughput with stability allows rapid learning • organizations can test hypotheses and adapt quickly. |
| Elite-performing teams have higher job satisfaction | • Better metrics correlate with better work experience • less toil and firefighting improves morale. |
Table 14: Throughput vs Instability Balance
The DORA framework deliberately splits metrics into throughput and instability rather than speed and quality — the framing matters because instability is an outcome of delivery practices, not a separate quality axis. Teams that improve throughput without controlling instability are not actually improving.
| Example | Description |
|---|---|
| Elite teams: High DF + Low CFR simultaneously | • DORA research consistently shows speed and stability improve together • practices enabling one benefit both. |
| Deployment Frequency + Change Lead Time + Failed Deployment Recovery Time | • Measure how fast and how much teams deliver • velocity, flow efficiency, and recovery speed through the delivery pipeline. |
| Change Fail Rate + Deployment Rework Rate | • Measure how reliably teams deliver • production quality and proportion of reactive vs planned work. |
| Teams improving DF typically also improve CFR | • Metrics are positively correlated not inversely • automation and testing improve all dimensions. |
| High DF with 50% CFR = thrashing, not true velocity | • Deployment frequency alone misleads • frequent failed releases waste more time than slower quality releases. |
| Low CFR with monthly deploys = avoiding risk, not quality | Low failure rate with infrequent deploys may indicate fear of change not robust processes. |
| High DF + High DRR = shipping fast but creating debt | • Deployment rework rate rising alongside frequency signals reactive capacity drain • the team is moving fast but generating more unplanned work than value. |
Table 15: Research Methodology and Evidence
DORA's credibility rests on its empirical foundation: statistical analysis of survey data from tens of thousands of practitioners, summarized in the book Accelerate, and continuously refined across more than a decade of annual reports.
| Example | Description |
|---|---|
| Annual State of DevOps Report surveys 30,000+ practitioners | • DORA research uses large-scale surveys across industries • statistically rigorous analysis identifies patterns. |
| Statistical clustering identifies performance groups, not fixed thresholds | • Performance tiers emerge from data-driven clustering not arbitrary cutoffs • benchmarks evolve as industry improves. |
| Nicole Forsgren, Jez Humble, Gene Kim (2018) | • Foundational research publication establishing DORA framework • demonstrates statistical relationships between practices and outcomes. |
| 2025 report: Foundational Challenges → Harmonious High-Achievers | • 2025 DORA replaced simple 4-tier ranking with 7 team archetypes combining delivery, stability, and well-being • profiles include Foundational Challenges, Legacy Bottleneck, Process-Constrained, Pragmatic Performers, and Harmonious High-Achievers. |
| Technical practices + culture predict high DORA performance | • Research identifies capabilities driving outcomes • not a prescriptive blueprint but evidence-based practices. |
| 2023: MTTR renamed FDRT; 2024: Deployment Rework Rate added; 2025: AI focus | • Framework adapts over time based on new research • the 2024 addition of Deployment Rework Rate was the most significant structural change since 2014. |
| Patterns hold across finance, tech, healthcare, government | • Research findings generalize broadly • same metrics predict success regardless of industry or organization size. |
| Five questions at dora.dev/quickcheck; results in under one minute | • Free benchmark tool comparing team to industry performance tier • no data stored; instant result; useful for starting the DORA conversation with leadership. |
Table 16: Trend Analysis Over Time
Single metric snapshots are almost meaningless — DORA metrics only become actionable when tracked as trends over weeks and quarters, with clear before/after comparisons tied to specific changes in process or tooling.
| Example | Description |
|---|---|
| 30-day moving average smooths weekly spikes | • Use time windows to reduce noise • daily volatility obscures meaningful trends. |
| This week DF: 12 deploys; last week: 9 = +33% improvement | • Short-period comparisons reveal recent changes • useful for evaluating specific interventions. |
| Q1 median LT: 3 days; Q4: 1 day = 67% improvement | • Longer period trends show sustained progress • smooths seasonal effects like holidays. |
| December shows lower DF due to code freeze | • Account for predictable variations • year-end freezes, major releases, and on-call rotations affect metrics. |
| Pre-automation: 5 day LT; post-automation: 1 day LT | • Measure impact of changes • compare periods before and after process improvements or tool adoption. |
| Alert when 7-day CFR exceeds 20% threshold | • Automated alerts for metric degradation • catch performance declines before they become systemic. |
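A rolling-window change failure rate with a threshold check can be sketched in a few lines. The input shape (one `(deploys, failures)` pair per day, oldest first) and the function names are assumptions; the 7-day window and 20% threshold come from the alerting example above.

```python
def rolling_cfr(daily, window=7):
    """Change failure rate (%) for each full trailing window.

    `daily` is a list of (deploys, failures) tuples, oldest day first.
    A window with no deployments yields None instead of dividing by zero.
    """
    series = []
    for end in range(window, len(daily) + 1):
        chunk = daily[end - window:end]
        deploys = sum(d for d, _ in chunk)
        failures = sum(f for _, f in chunk)
        series.append(100.0 * failures / deploys if deploys else None)
    return series


def breaches(series, threshold=20.0):
    """Indexes of windows whose rate exceeds the alert threshold."""
    return [i for i, value in enumerate(series) if value is not None and value > threshold]


# Six quiet days, then one failure, then a bad day with three failures.
daily = [(2, 0)] * 6 + [(2, 1), (2, 3)]

series = rolling_cfr(daily)
print([round(v, 1) for v in series])  # [7.1, 28.6]
print(breaches(series))               # [1]
```

Summing deploys and failures across the window, rather than averaging daily percentages, keeps low-volume days from dominating the rate.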
Table 17: Qualitative vs Quantitative Indicators
DORA numbers show what is happening in a delivery system; qualitative data explains why and whether the people inside that system are experiencing it sustainably. Both are needed for a complete picture.
| Example | Description |
|---|---|
| Deployment Frequency: 3.2 per day | • Numerical measurements of delivery performance • objective, comparable, and trackable over time. |
| "Deployments feel risky" survey response | • Subjective experience of development process • captures fear, toil, and satisfaction that numbers miss. |
| "Build pipeline is slow and frustrating" | • Team member perceptions and feelings • low satisfaction despite good metrics suggests hidden problems. |
| Team discusses failures openly without blame | • Cultural health indicator • Westrum generative culture correlates with high DORA performance. |
| Combine CFR (quant) with incident retro quality (qual) | • Both types together provide complete picture • numbers show what happened, feedback explains why. |
| Satisfaction, Performance, Activity, Communication, Efficiency | • Broader context for DORA • Microsoft SPACE model supplements delivery metrics with wellbeing and collaboration. |
| Speed + Effectiveness + Quality + Business Impact | • Unified framework encapsulating DORA, SPACE, and DevEx into four counterbalanced dimensions • optimizing one dimension cannot improve the full score without also improving the others. |
Table 18: Tools and Platforms
The DORA metrics tooling market has matured considerably, ranging from open-source self-hosted options like Apache DevLake to enterprise intelligence platforms; the right choice depends on your toolchain, team size, and whether you need DORA alone or a broader engineering intelligence stack.
| Example | Description |
|---|---|
| Native dashboards with automatic deployment detection | • Observability platform with built-in DORA tracking • integrates CI/CD, APM, and incident data. |
| Native metrics for GitLab CI/CD pipelines | • Platform-native tracking for GitLab teams • automatic calculation from pipeline and incident data. |
| Open-source self-hosted: connects GitHub + Jira + Jenkins → Grafana dashboard | • Strongest open-source option; ingests 40+ DevOps tools • full control, no vendor lock-in; requires self-hosting and setup investment. |
| Engineering intelligence platform with DORA benchmarks | • Specialized DORA tool connecting Git, Jira, and CI/CD • automatic metric calculation and team comparisons. |
| Data integration platform normalizing metrics across tools | • Engineering intelligence aggregating data from multiple sources • supports custom metric definitions. |
| Deployment tracking with automatic failure detection | • Lightweight DORA tracker focused on deployment frequency and CFR • simple GitHub integration. |
| Open-source action calculating lead time from workflow logs | • Custom GitHub Action for teams building own DORA tracking • transparent calculation logic. |
| Open-source reference implementation on GCP | • Google's DORA reference tool demonstrating metric collection patterns • educational starting point for custom builds. |
Table 19: Platform Engineering Impact
Platform engineering directly improves DORA metrics by eliminating the manual handoffs, waiting queues, and cognitive overhead that inflate lead time and reduce deployment frequency.
| Example | Description |
|---|---|
| Self-service deployment portal with golden paths | • Platform teams enable developers to ship faster • standardized workflows reduce cognitive load and lead time. |
| Template repos with CI/CD, monitoring pre-configured | • Paved roads making the correct approach easiest • new services inherit high-quality patterns improving DORA metrics. |
| Developers provision databases via UI without tickets | • Eliminate waiting for IT operations • reduces lead time by removing manual handoffs. |
| Track platform adoption rate alongside DORA metrics | Platform team measures developer satisfaction and adoption plus downstream impact on delivery performance. |
| Survey: "How easy to deploy?" + measure actual DF/LT | • Combine perception and reality metrics • great platform should improve both satisfaction and delivery performance. |
| High-quality internal platforms amplify AI benefit at organizational scale | • 2025 DORA research: quality internal platforms are prerequisite for AI ROI • AI adoption without a robust platform produces disconnected local gains, not organizational improvement. |
Table 20: Organizational Culture and Westrum Model
Culture is not a soft factor in DORA research — it is one of the strongest predictors of delivery performance. Westrum's three-category model provides a practical language for diagnosing organizational information flow and its impact on DORA outcomes.
| Example | Description |
|---|---|
| Information hoarded, messengers shot, failure hidden | • Power-oriented culture where information is political currency • consistently correlates with low DORA performance. |
| Rule-oriented, narrow focus, failure leads to blame | • Process-focused culture where rules matter more than mission • typical of medium DORA performance organizations. |
| Mission-oriented, cooperation, failure leads to inquiry | • Performance-oriented culture with high trust and information flow • strong predictor of elite DORA metrics. |
| Team members speak up without fear of punishment | • Foundation for high performance • enables honest incident review and learning from failures, improving FDRT and CFR. |
| Postmortems focus on system improvements not individual blame | • Learning-focused incident response • openness about failures improves both FDRT (faster recovery) and CFR (fewer repeat failures). |
| Generative orgs share safety signals broadly and actively | Westrum's research shows how organizations process information predicts safety and performance outcomes. |
Table 21: Advanced Topics and Extensions
As DORA matures and AI reshapes software development, the framework is being extended with metrics for AI-era concerns that the original four metrics cannot surface: code durability, AI vs. human work attribution, and developer well-being.
| Example | Description |
|---|---|
| 3 unplanned incident-fix deploys ÷ 20 total deploys = 15% rework rate | • Official 5th metric since 2024: count deployments triggered by production incidents as a share of all deployments • no official benchmarks yet; track directionally and set internal thresholds. |
| Visualize flow from idea → production including all wait states | • Map entire delivery process to identify bottlenecks • DORA metrics measure outcomes; VSM reveals causes. |
| Combine DORA with Satisfaction, Activity, Communication | • Broader performance model adding wellbeing and collaboration • DORA alone doesn't capture developer experience or communication quality. |
| Speed + Effectiveness + Quality + Business Impact | • Unifies DORA, SPACE, and DevEx into four counterbalanced dimensions • adopted at 300+ companies including Meta, Microsoft, and Uber; prevents optimizing one dimension by hiding another. |
| 90% of developers use AI; throughput up, stability down | • 2025 DORA finding: AI adoption correlates positively with throughput but negatively with delivery stability • AI accelerates code generation → larger batches → slower review → more instability. |
| 6% of committed code rewritten within 14 days | • AI-era quality signal: measures code durability that CFR cannot see • AI code passing tests at deploy time may still be silently replaced within weeks; pre-AI baseline ~3.3%, rising with heavy AI use. |
| 99.9% uptime SLO with error budget | • SRE practices complement DORA • error budgets and reliability budgets provide operational context alongside delivery metrics. |
| Healthcare orgs face compliance overhead affecting lead time | • Industry-specific factors influence achievable benchmarks • compare against organizations with similar regulatory environments, not global averages. |
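The error-budget idea mentioned above reduces to simple arithmetic: the budget is the share of the measurement window that the SLO allows to be unavailable. A minimal sketch, assuming a 30-day window and hypothetical function names:

```python
def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining(slo_percent: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Minutes of budget left after the downtime already incurred."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


# A 99.9% SLO over 30 days allows 0.1% of 43,200 minutes.
print(round(error_budget_minutes(99.9), 1))    # 43.2
# After a 30-minute outage, about 13 minutes remain.
print(round(budget_remaining(99.9, 30.0), 1))  # 13.2
```

Read alongside Failed Deployment Recovery Time, the remaining budget gives a sense of how many more failed deployments of a typical length the service can absorb in the window.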
Table 22: DORA AI Capabilities Model
The 2025 DORA AI Capabilities Model identifies seven practices that amplify the benefits of AI adoption in engineering organizations. These are established engineering best practices that become more critical — not less — when AI accelerates code generation and expands batch sizes.
| Example | Description |
|---|---|
| All AI coding effort tied to a specific user problem or job-to-be-done | • Most critical capability: without user focus, AI adoption can have a net-negative impact on team performance • AI makes building the wrong thing faster than ever before. |
| Self-service deployment, automated testing, standardized pipelines | • Platform is the distribution layer for scaling AI benefits from individual to organizational • without a quality platform, AI gains remain isolated and disconnected from delivery outcomes. |
| Published policy: permitted tools, use cases, and data privacy rules | • Ambiguity about acceptable AI use stifles adoption and creates risk • organizations with clear AI policies show 451% higher team AI adoption than those without. |
| Break AI-generated work into reviewable increments, not giant PRs | • Amplifies AI's positive influence on product performance • AI can generate large amounts of code rapidly; small batches keep review and testing manageable. |
| Internal data is high-quality, accessible, and unified | • AI is only as good as its data • high-quality unified internal data substantially amplifies AI's positive influence on organizational performance. |
| AI tools connected to internal codebase, docs, and architecture diagrams | • Connecting AI to company-specific context transforms it from generic assistant to specialized tool • amplifies individual effectiveness and code quality. |
| Frequent commits, disciplined rollback use, short-lived feature branches | • AI increases code velocity, making the safety net of version control more critical • frequent commits and rollback proficiency amplify AI's positive impact on team effectiveness. |
Table 23: AI-Era DORA Extensions
When AI tools generate 30–70% of committed code, standard DORA metrics become ambiguous — deployment frequency can inflate without meaningful productivity gain, and lead time can drop simply because coding accelerated. These extensions help teams interpret DORA accurately in AI-assisted environments.
| Example | Description |
|---|---|
| 35% of merged code flagged as AI-assisted this sprint | • Segmentation layer for all other DORA metrics: what share of committed code was AI-generated? • without this, rising DF or falling LT is ambiguous: it could reflect AI inflation rather than pipeline improvement. |
| AI-assisted PRs: avg 3.2 day review vs human PRs: 1.8 day review | • If AI-assisted PRs take longer to review, the bottleneck has shifted from code creation to code review • total lead time may still fall while review quality and cycle time worsen. |
| 8% of AI-generated code rewritten within 14 days | • Measures code durability that CFR cannot capture: code that deploys successfully but is silently replaced within weeks • pre-AI baseline ~3.3%; values above 7% indicate significant engineering waste. |
| Copilot acceptance rate: 32% today, was 44% three months ago | • Declining trend signals trust or relevance problems with AI output • 39% of developers trust AI outputs "a little" or "not at all" per 2025 DORA research. |
| Senior engineers reviewing 40 PRs/week, up from 18 pre-AI | • Leading indicator of burnout and review degradation in AI-assisted teams • AI creates more code than review capacity can absorb, concentrating burden on senior staff. |
| New feature work: 42% of time; bug/maintenance: 58% | • Ratio of time on new features vs reactive maintenance • if AI increases velocity but innovation rate declines, the team is faster on a treadmill, not delivering more value. |
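Segmenting by code origin is mostly a matter of tagging each pull request and then grouping. The sketch below assumes every merged PR already carries an `ai_assisted` flag; how that flag gets set (tool telemetry, commit trailers, self-reporting) is outside the sketch, and the `PullRequest` record and sample numbers are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PullRequest:
    """Hypothetical merged pull request."""
    ai_assisted: bool
    lines_changed: int
    review_days: float


def ai_share_percent(prs: list) -> float:
    """Share of merged lines that came from AI-assisted pull requests."""
    total = sum(pr.lines_changed for pr in prs)
    if total == 0:
        raise ValueError("no merged lines in the period")
    ai_lines = sum(pr.lines_changed for pr in prs if pr.ai_assisted)
    return 100.0 * ai_lines / total


def mean_review_days(prs: list, ai_assisted: bool) -> float:
    """Average review duration for one segment of pull requests."""
    segment = [pr.review_days for pr in prs if pr.ai_assisted == ai_assisted]
    if not segment:
        raise ValueError("no pull requests in this segment")
    return mean(segment)


sprint = [
    PullRequest(ai_assisted=True, lines_changed=200, review_days=3.0),
    PullRequest(ai_assisted=True, lines_changed=150, review_days=3.4),
    PullRequest(ai_assisted=False, lines_changed=400, review_days=1.6),
    PullRequest(ai_assisted=False, lines_changed=250, review_days=2.0),
]

print(ai_share_percent(sprint))                                   # 35.0
print(round(mean_review_days(sprint, ai_assisted=True), 1))       # 3.2
print(round(mean_review_days(sprint, ai_assisted=False), 1))      # 1.8
```

The same grouping can be applied to change failure rate or lead time once deployments are linked back to the pull requests they contain.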