ATIS 0100030
Mean Time Between Outages – A Generalized Metric for Assessing Production Failure Rates in Telecommunications Network Elements
| Organization: | ATIS |
| Publication Date: | 1 August 2012 |
| Status: | active |
| Page Count: | 13 |
scope:
Telecommunications Service Providers (SPs) must continuously upgrade the network and grow its capacity while providing a service that meets stringent customer reliability expectations. Although telecommunications companies have significant experience providing reliable telephone service, the challenge is harder for an SP today because changes in Internet technology, particularly router software, are significantly more frequent and less rigorously tested than was the case in circuit-switched telephone networks. SPs cannot wait until the technology matures: a large SP has to meet high reliability requirements for critical applications such as financial transactions, Voice over IP (VoIP), Internet Protocol Television (IPTV), streaming video, telepresence, and online gaming using commercially available technology. The most critical driver for high reliability requirements is that these applications are very sensitive to short interruptions (~1 second) that arise from component glitches followed by self-restoration. Such outages differ from hardware failures that require component replacement, whose frequency is captured by the traditional Mean Time Between Failures (MTBF) reliability metric.
An initial examination of the inability of the MTBF metric to adequately address short-duration outages was undertaken in [ATIS-0100025] and in the first publication of ATIS-0100030 (see footnote 1). The focus of that effort was the SP edge router, which is recognized as the key element of modern-day Internet-based SP networks. Routers comprise a wide range of components, such as line, control, and switching cards as well as power supplies and cooling units, typically from multiple equipment suppliers, which leads to the possibility of several types of failures with different customer impacts. The Mean Time Between Outages (MTBO) metric was introduced as a practical method to characterize the impact of all outages, including short-duration outages, by defining MTBO in terms of the failure frequency of Customer Facing Line Cards. This document generalizes the MTBO metric definition as an industry standard applicable to any type of network element and provides additional illustrative examples of metric development and assessment for the following:
- Set of Software Controlled Devices (power amplifiers in the UMTS Node B)
- Ethernet Virtual Connections (eVC)
- Radio Network Controller (RNC)
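To illustrate the contrast drawn above between MTBO and MTBF, the following is a minimal sketch only. It assumes the generic "mean time between" form (accumulated in-service time of a population of elements divided by the number of outage events observed), which may differ from the normative definition in the standard. The names `OutageRecord`, `mean_time_between`, and the sample figures are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OutageRecord:
    """One observed outage on a network element (hypothetical record layout)."""
    element_id: str
    duration_s: float           # how long service was interrupted, in seconds
    required_replacement: bool  # True if hardware had to be replaced


def mean_time_between(in_service_hours: float,
                      events: List[OutageRecord]) -> Optional[float]:
    """Assumed generic form: accumulated in-service hours / number of events.

    Returns None when no events were observed, since the ratio is undefined.
    """
    if in_service_hours <= 0:
        raise ValueError("in_service_hours must be positive")
    if not events:
        return None
    return in_service_hours / len(events)


# Illustrative data: 100 line cards observed for 720 hours each.
element_hours = 100 * 720.0
outages = [
    OutageRecord("card-07", duration_s=1.0, required_replacement=False),
    OutageRecord("card-23", duration_s=45.0, required_replacement=False),
    OutageRecord("card-58", duration_s=14400.0, required_replacement=True),
]

# MTBO-style count: every outage, including short self-restoring ones.
mtbo_hours = mean_time_between(element_hours, outages)

# MTBF-style count: only failures that required component replacement.
replacements = [o for o in outages if o.required_replacement]
mtbf_hours = mean_time_between(element_hours, replacements)

print(f"MTBO-style estimate: {mtbo_hours:.0f} h")
print(f"MTBF-style estimate: {mtbf_hours:.0f} h")
```

In this made-up data set, the two short self-restoring interruptions do not appear in the replacement-only count, so the MTBF-style figure is three times the MTBO-style figure even though customers experienced all three outages.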
1 This version of ATIS-0100030 replaces ATIS-0100030.2010, Mean Time Between Outages - A Metric for Assessing Production Failure Rates in IP Routers. The historical version of ATIS-0100030.2010 can be accessed by sending a completed ATIS Research Request Form, which is located at < http://www.atis.org/
Document History