February 17, 2023

Reading Time:

Table of Contents

Measure MTTD and MTTR for Your Application

What Are MTTD and MTTR?

MTTD (Mean Time To Detect) and MTTR (Mean Time To Resolve) are two important metrics used to measure the performance of an organization’s incident management process.

MTTD is the mean time an organization takes to detect the existence of issues from their first occurrences. If an organization runs into 12 system failures in a week, the MTTD is calculated as the average time taken to detect the issue across those 12 failures.

MTTR is the mean time an organization takes to permanently fix a failure in the system since the problem was reported. This includes the time taken to detect, diagnose, repair, and permanently fix the failure so that it doesn’t recur.

Both MTTR and MTTD can be impacted by a number of factors, including the complexity of the issue, the skills of the incident response team, and the availability of resources in terms of both infrastructure and manpower.

Why Do They Matter?

Before we dive straight into determining MTTD and MTTR, let’s first take a look at why they matter to an organization. First and foremost, the downtime of an SaaS application can impact multiple key factors such as user experience, customer trust, an organization’s credibility, reliability, and cost.

According to a recent survey by ITIC, the average cost of just one hour of downtime for a large enterprise can amount to over $300,000. This includes the direct costs associated with lost productivity and revenue as well as indirect costs such as damage to reputation and brand value.

According to Gartner, the average cost of IT downtime is $5,600 per minute. Based on the type of business and operations the average cost of downtime can vary between $140,000 per hour and $540,000 per hour at the higher end.

To minimize the cost of downtime, enterprises need to invest in reliable hardware and software. Moreover, they should have a good plan in place for how to recover from any disruptions. This is where MTTD and MTTR play an important role. The ability to detect and permanently resolve issues early is a KPI that all enterprise software organizations should keenly monitor.

How To Measure MTTD and MTTR

Formula for measuring MTTD:

sum(time when the issue was detected - time when the issue occurred) / total number of issues during the interval (week, month, quarter)

Formula for measuring MTTR:

sum(time the issue was fixed permanently - time when the issue occurred) / total number of issues during the interval (week, month, quarter)

Organizations can also use a number of tools and techniques to measure MTTR and MTTD, such as logging system data, tracking incident tickets, and conducting post-incident reviews. Using robust observability stacks also allows teams to gain more visibility into the environment and context within which the system is running.

For example, as part of an investigation, looking into the logs, metrics, and traces will accurately provide the time at which the issue originally occurred. If there is a ticketing, on-call alerting system in place, it can provide the first time the issue was reported. Additionally, adopting a black box testing approach to constantly monitor user-facing systems can also help to accurately measure your MTTD and MTTR.

How To Improve MTTR and MTTD

As mentioned before, organizations use these metrics to track the efficiency of their incident management processes and compare them to industry benchmarks.

If you’re looking to improve your organization’s MTTR and MTTD, you should start by looking at the DORA metrics. The DORA, or DevOps Research and Assessment, metrics are a set of metrics that measure an organization’s performance in four key areas:

Delivery: How quickly can we get features into production?
Innovation: How quickly can we experiment and learn?
Optimization: How well are we able to keep our systems running?
Security: How well are we protecting our systems and data?

Organizations that perform well in all four areas are considered “elite performers“. In fact, the DORA metrics are often used to assess an organization’s progress on their journey to “elite” status.

So, how can you improve your organization’s MTTR and MTTD? Here are a few ideas:

Make sure you have a clear understanding of your organization’s DORA metric scores. This will help you identify areas of improvement.
Focus on improving your delivery process. The faster you can get features into production, the faster you can identify and fix issues.
Make sure you foster a culture of experimentation and learning. Encourage your team to experiment and to learn from its mistakes.
Improve your monitoring and logging protocols. This will help you identify issues faster and minimize the impact of outages.

If you found this article helpful, stay tuned for the next one in this series: Tools You Can Use To Help Improve Your MTTD and MTTR.

OpsVerse ObserveNow is a managed, battle-tested, scalable observability platform built on top of open source software and open standards. ObserveNow can be deployed as a cloud service or as a private SaaS installation within your own cloud (AWS/GCS/Azure). If you’d like to explore OpsVerse ObserveNow, click here to start a free trial today!

Written by Nikhil Ravindran

Table of Contents

Categories

Agentic AI in DevOps: Why Aiden Beats ChatGPT for DevOps Cost Analysis

Here are the 5 critical considerations for selecting a copilot that delivers accurate, secure, and cost-effective automation.

Nikhil Ravindran | Apr 29, 2025 | 4 min read

Top 5 Things to Keep in Mind When Choosing an Agentic AI Based DevOps Copilot

Here are the 5 critical considerations for selecting a copilot that delivers accurate, secure, and cost-effective automation.

Nikhil Ravindran | Mar 14, 2025 | 5 min read

NoOps and the Future of DevOps: How GenAI is Eliminating the Tools Tax

This blog explores how generative AI can eliminate "operational toil" and free up developers to focus on innovation.

Arul Jegadish Francis | Dec 19, 2024 | 4 min read

The Rise of DevOps Copilots: A New Era of Automation and Intelligence

As the demands of modern software development continue to grow, so too does the complexity of DevOps. To keep pace,...

Nikhil Ravindran | Dec 3, 2024 | 5 min read

The Hidden DevOps Problem: Why Tools Alone Aren’t Enough

This blog uncovers the core issues that tools can’t solve, explores how innovative approaches like Generative AI, can bridge these gaps.

Nikhil Ravindran | Nov 21, 2024 | 4 min read

Overview of Aiden’s Tasks Feature

Discover how Aiden, the world's first DevOps Copilot, can revolutionize your DevOps workflows. This session provides an in-depth look at Aiden's capabilities and demonstrates how it simplifies complex DevOps processes.

Udaya Kumar | Jul 9, 2025 | 1 min read

Self Service DevOps Provision DigitalOcean Environments in Seconds with Aiden

Udaya Kumar | May 20, 2025 | 1 min read

Aiden Vs ChatGPT K8s Resource Usage

Udaya Kumar | May 16, 2025 | 1 min read

Webinar – MCP x DevOps

MCP x DevOps Ready to experience the future of DevOps? Start For...

Udaya Kumar | May 8, 2025 | 1 min read

Aiden Vs Chat GPT AWS Cost Analysis Use Case

In this video, watch how Aiden

Detects the vulnerability

Tells you if you're affected

And even raises a pull request to fix it — automatically.

Udaya Kumar | Apr 21, 2025 | 1 min read

Aiden by OpsVerse

ObserveNow

Download Product Brief

OpsVerse Blogs

Videos

Community

Webinars

Release Notes

Latest Blog Posts

The Company

About Us

Investors

Follow Us

In The News

Press Releases

Connect With Us

Contact Us

Subscribe To Our Newsletter >>