Learn what makes a good DevOps metric, and discover six metrics most organizations can use to evaluate the performance of DevOps teams.
What are DevOps Metrics?
DevOps transformation requires organizations to invest a lot of time, money, and resources, revisiting everything from communication and training to tools. The ability to clearly and accurately assess DevOps metrics and performance benchmarks is critical to defining goals, improving efficiency, and tracking success.
The choice of key performance indicators for a DevOps initiative depends on the specific challenges and requirements of the company. DevOps KPIs should provide a comprehensive view of business value and impact of the transformation. The right performance metrics can evaluate the value of existing work done, and guide future process and technology decisions.
Characteristics of Useful DevOps Metrics
Here are five characteristics of a good DevOps metric. Metrics with these characteristics can provide insight into the progress of a DevOps initiative or the performance of DevOps teams:
- Measurable—metrics must have standardized values that are consistent over time.
- Relevant—metrics should measure aspects that are important to the business.
- Reliable—team members cannot affect or “game” the measurement.
- Actionable—long-term analysis of the metric should provide insights into possible improvements in systems, workflows, strategies, etc.
- Traceable—the metrics should point directly to a root cause, not just allude to a general problem.
Don’t track DevOps metrics that are:
- Based on non-DevOps values—for example, metrics that measure adherence to requirements are more suited for a waterfall development environment.
- Based on competition—if the best performers are the “winners” and everyone else “loses”, it is difficult to expect communication and collaboration within and between teams. Don’t build metrics based on competition between team members or between teams (e.g. number of failed builds or fatal errors). Teams will become obsessed with improving the metric, rather than discovering real problems and working together to resolve them.
- Vanity metrics—metrics must support teamwork. Vanity metrics indicate some capability, but are not really indicative of business effectiveness. For example, the number of lines of code written each week is irrelevant because code can disappear completely during refactoring, and sometimes less code is better for the organization. The number of builds per day doesn’t matter, unless each build really adds value to the end user experience.
6 Key DevOps Metrics
The following six metrics can be important for measuring DevOps performance and progress in most organizations.
1. Lead Time
The time it takes to implement, test, and deliver code. To measure lead time, the team must clearly define the start and end of the work (e.g. the time from code commit to production deployment). The goal is to reduce overall lead time through automation, for example by optimizing test integration and automation.
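As a minimal sketch of the measurement described above, the following Python snippet computes lead time as the interval from code commit to production deployment. The sample records and the function names are hypothetical, and using the median as the summary value is an assumption, not something this article prescribes.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (commit time, production deployment time) per change.
changes = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 15, 0)),   # 6 hours
    (datetime(2024, 3, 2, 10, 0), datetime(2024, 3, 3, 10, 0)),  # 24 hours
    (datetime(2024, 3, 4, 8, 0), datetime(2024, 3, 4, 20, 0)),   # 12 hours
]


def lead_times(records):
    """Return the commit-to-deployment duration for each change."""
    return [deployed - committed for committed, deployed in records]


def median_lead_time(records):
    """Summarize lead time with the median (an assumed choice of summary)."""
    return median(lead_times(records))


print(median_lead_time(changes))  # 12:00:00
```

The median is used here because a single unusually slow change distorts it less than it would distort the mean. Whichever summary you choose, apply the same start and end definitions to every change.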
2. Deployment Frequency
The number of software deployments over a period of time. It can be measured in a variety of ways, including automated deployment pipelines, API calls, and manual scripts.
This metric reflects the technical performance of the deployment pipeline rather than the frequency of delivery to users, because not all deployments are pushed to production. However, more frequent deployments tend to carry smaller sets of changes, which can reduce the errors associated with failed deployments that affect overall customer satisfaction.
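One way to count deployments over a period is sketched below. It groups a hypothetical deployment log by ISO week and can optionally filter by target environment, since not every deployment reaches production. The log format, the environment names, and the function name are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical deployment log: (deployment date, target environment).
deployments = [
    (date(2024, 3, 4), "production"),
    (date(2024, 3, 5), "staging"),
    (date(2024, 3, 6), "production"),
    (date(2024, 3, 12), "production"),
]


def deployments_per_week(log, environment=None):
    """Count deployments per ISO (year, week), optionally for one environment."""
    counts = Counter()
    for day, env in log:
        if environment is None or env == environment:
            iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
            counts[(iso[0], iso[1])] += 1
    return dict(counts)


print(deployments_per_week(deployments))
print(deployments_per_week(deployments, environment="production"))
```

Reporting the production-only count next to the all-environments count shows how much pipeline activity actually reaches end users.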
3. Change Failure Rate
Improving velocity is often seen as one of the ultimate goals of a DevOps initiative, but velocity should be assessed together with the change failure rate, the share of changes that result in defects. Frequent failures of changes that are deployed to production can ultimately lead to unsatisfied customers.
If KPIs show a higher rate of failure as deployments increase, it’s time to slow down and investigate issues in the development and deployment pipeline.
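The check described above can be sketched as a comparison between two reporting periods. The `Period` class and the `needs_investigation` function are hypothetical names. Treating change failure rate as failed deployments divided by total deployments is an assumption about how the rate is computed.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Deployment counts for one reporting period (hypothetical structure)."""
    deployments: int
    failed: int

    @property
    def failure_rate(self) -> float:
        # Assumed definition: failed deployments / total deployments.
        return self.failed / self.deployments if self.deployments else 0.0


def needs_investigation(previous: Period, current: Period) -> bool:
    """Flag the case where deployments went up and the failure rate rose too."""
    return (
        current.deployments > previous.deployments
        and current.failure_rate > previous.failure_rate
    )


last_month = Period(deployments=20, failed=2)   # 10% failure rate
this_month = Period(deployments=40, failed=8)   # 20% failure rate
print(needs_investigation(last_month, this_month))  # True
```

A flag like this is only a prompt to look more closely at the pipeline. It does not say where the failures originate.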
4. Mean Time to Recovery (MTTR)
This indicator tracks how long it takes the organization to recover from a failure. It is a key business indicator because it reflects the ability to minimize disruption and restore normal operations quickly. It is usually measured in minutes or hours, and is sometimes counted in business hours rather than clock time.
To reduce MTTR, it is important to have the right application monitoring tools, as well as effective collaboration between operations and developers, which can help you find root causes and deploy solutions quickly.
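As a rough illustration, MTTR can be computed as the average interval between the start of a failure and its resolution. The incident records below are hypothetical, and this sketch uses clock time. Counting only business hours would need extra logic that is not shown here.

```python
from datetime import datetime, timedelta

# Hypothetical incidents: (failure start, service restored).
incidents = [
    (datetime(2024, 3, 1, 10, 0), datetime(2024, 3, 1, 10, 30)),  # 30 minutes
    (datetime(2024, 3, 5, 14, 0), datetime(2024, 3, 5, 15, 30)),  # 90 minutes
    (datetime(2024, 3, 9, 22, 0), datetime(2024, 3, 9, 23, 0)),   # 60 minutes
]


def mean_time_to_recovery(records):
    """Average clock-time duration from failure start to recovery."""
    if not records:
        return timedelta(0)
    total = sum((restored - started for started, restored in records), timedelta(0))
    return total / len(records)


print(mean_time_to_recovery(incidents))  # 1:00:00
```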
5. Customer Ticket Volume
This metric is a measure of end user satisfaction. Bugs and errors can often bypass the testing phase and be detected by the end user. Customers will then contact support and share their feedback.
Therefore, the number of customer tickets marked as problems or bugs is an important indicator of application reliability. A large number of tickets indicates quality issues, while a small number indicates robustness of the application.
6. Defect Escape Rate
Even with a great DevOps pipeline, defects will occur. In some cases, these defects may be detected during development or testing phases of the pipeline. But in the worst case, they will pass tests and be detected by end users.
The defect escape rate reflects the number of defects found in production during and after deployment. It identifies cracks in the software development process. When defects slide through these cracks, it indicates that the quality process should be optimized and tightened.
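The article describes this metric as a count of defects found in production. A common way to turn that count into a rate, assumed here and not taken from this article, is to divide it by all defects found, both before and after release. The function name and sample figures below are hypothetical.

```python
def defect_escape_rate(found_in_production, found_before_release):
    """Share of all known defects that were first detected in production.

    Assumed formula: escaped / (escaped + caught before release).
    """
    total = found_in_production + found_before_release
    if total == 0:
        return 0.0
    return found_in_production / total


# Example: 5 defects reached production, 45 were caught in development or testing.
rate = defect_escape_rate(found_in_production=5, found_before_release=45)
print(f"{rate:.0%}")  # 10%
```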
DevOps programs can deliver huge benefits to organizations, but they are complex and expensive to implement. DevOps metrics are needed to understand how DevOps teams are performing, and whether the effort to implement DevOps is really paying off. I explained how to select good metrics for a DevOps initiative, and covered six metrics that can be useful for most organizations:
- Lead time—the time needed to push new changes to production
- Deployment frequency—how often builds are deployed to an environment
- Change failure rate—how many changes result in defects
- MTTR—time required to recover from failure
- Customer tickets—how many problems are filed by customers as support tickets
- Defect escape rate—how many quality issues make their way to production