SAE - JA1003
Software Reliability Program Implementation Guide
Publication Date: 1 January 2004
This document provides methods and techniques for implementing a reliability program throughout the full life cycle of a software product, whether the product is considered as standalone or part of a system. This document is the companion to the Software Reliability Program Standard [JA1002]. The Standard describes the requirements of a software reliability program to define, meet, and demonstrate assurance of software product reliability using a Plan-Case framework and implemented within the context of a system application.
This document has general applicability to all sectors of industry and commerce and to all types of equipment whose functionality is to some degree implemented by software components. It is intended as guidance for business purposes and should be applied when it provides a value-added basis for the business aspects of development, use, and sustainment of software whose reliability is an important performance parameter. Applicability of specific practices will depend on the nature of the software application and its context.
Following the guidelines in this document does not guarantee that the required reliability will be achieved, or that any certification authority will accept the results as sufficient evidence that the requisite reliability has been achieved. Following these guidelines will, however, provide insight into the level of reliability that has been achieved. With proper negotiation and interaction among customer, certification authority, and supplier in accordance with these guidelines, it is more likely that the achieved reliability will be acceptable.
The target audience for this document includes customer organizations, certification authorities, specialty reliability engineers, and software developers that acquire, develop, use, or provide post-delivery operation of or support for software.
The guidance in this document can be applied to all software-intensive projects, and in particular to projects where the reliability of the software is critical to the performance of the system mission. System applications include military, aerospace, transportation, medical, nuclear industries, ground vehicles, and other consumer applications. Such systems may include the integration of custom software as well as Off-The-Shelf (OTS) software. Custom software is generally newly developed software or a significant rework/upgrade of existing software that is for use with a specific application. OTS software sources include commercial vendors, government, and industry. The guidance in this document is generally applicable throughout the complete life cycle, although specific approaches may be more effectively applied at specific life cycle points depending on the software source, application, and pedigree.
Software is a major component of most important system applications. Because the software component typically provides critical functions, faults in the software may cause the system to fail in a significant way. System failures caused directly by software faults are classified as "software failures". Thus, it is important to use methods and techniques that provide evidence that the software component has been designed, implemented, tested, installed, and, as necessary, updated without faults that might result in undesirable system failures.
The topic of software reliability is concerned with all life cycle activities that prevent, detect, remove, and/or mitigate software faults, and that verify/validate the degree to which software faults do not exist and will not cause system failures. Software reliability is (quantitatively) defined as the probability of failure-free operation of a software program for a specified time under specified conditions. However, having a "number", even with the appropriate accompanying evidence, is not generally sufficient to convince customers, regulatory authorities, or even the system/software suppliers that the software satisfies its requirements. Thus, software reliability is also (qualitatively) defined as a set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time. Attributes that relate to implementation of fault-tolerant design, use of best-practice engineering methods, application of specialized methods and techniques for ensuring safety- and/or security-critical requirements, and procedural methods to ensure mistake-proof loading and/or operation also provide evidence that improves confidence that the software will not cause a system failure.
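As an illustration of the quantitative definition (not part of the standard itself), a constant-failure-rate (exponential) model is often assumed, under which reliability over a mission time t is R(t) = e^(-λt). The sketch below is illustrative only; the exponential model and the `reliability` helper are assumptions introduced here, not definitions from this guide.

```python
import math

def reliability(failure_rate: float, mission_time: float) -> float:
    """Probability of failure-free operation over mission_time,
    assuming a constant failure rate (exponential model).

    failure_rate: failures per unit time (e.g., failures/hour)
    mission_time: duration of interest, in the same time units
    """
    return math.exp(-failure_rate * mission_time)

# Example: an observed failure rate of 0.001 failures/hour over a
# 100-hour mission gives roughly a 90% chance of failure-free operation.
print(reliability(0.001, 100.0))  # ≈ 0.9048
```

Note that this "number" captures only the quantitative side of the definition above; the qualitative attributes (fault tolerance, engineering practice, safety/security methods) supply the complementary evidence.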
There are similarities between hardware and software failures, and also differences. Software failures are primarily the result of design defects introduced during development or maintenance. Other failure sources include use-induced degradation as well as inadequate operational procedures and logistics operations documentation that is considered part of the "software data package". Hardware failures are primarily the result of physical wear-out. Other failure sources include design defects, manufacturing quality deficiencies, or maintenance or operating errors. Some system failures are the result of a combination of hardware and software faults. It is generally easier to implement changes to software than to hardware, although any component change must be part of a system support concept that includes continued reliability analysis. Hardware is generally repaired to an original state, unless there is a reason to modify it. Software can frequently be returned to its original state by re-initializing, and it is often corrected, enhanced, and adapted so as to become a new version, that is, a new product.
Both hardware and software must be managed as an integrated system. The reliability of the system will depend on the reliability of the hardware and software as an integrated whole. Some techniques to manage the system reliability will be similarly applied to hardware and software components whereas other techniques will be unique to hardware or to software. In addition, the application of a given technique may be different for software than for hardware.
There are no existing methods that guarantee delivered software has no faults. That is, there is always some likelihood that under certain environmental conditions and system operational use, faults in software will be encountered that result in failures of the system. In short, software reliability is not "1.0". There are existing methods and techniques that correlate with delivery of software with reduced faults/failures. It is desirable to provide sufficient quantitative and qualitative evidence that appropriate development and support activities have been conducted to prevent, detect, remove, and/or mitigate possible software faults, particularly those faults that might result in critical system failures.
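The difficulty of providing quantitative evidence for high reliability can be illustrated with a standard zero-failure test-time calculation, again assuming a constant-failure-rate model; the function name and figures below are ours, offered as a sketch rather than a method prescribed by this guide.

```python
import math

def zero_failure_test_hours(target_r: float, mission_time: float,
                            confidence: float) -> float:
    """Failure-free test time needed to demonstrate reliability
    target_r over mission_time at a one-sided confidence level,
    assuming a constant failure rate and zero observed failures.

    The largest failure rate consistent with target_r over the
    mission is lambda_max = -ln(target_r) / mission_time; with zero
    failures in T hours of test, lambda <= -ln(1 - confidence) / T
    at the stated confidence, so solve for T.
    """
    max_failure_rate = -math.log(target_r) / mission_time
    return -math.log(1.0 - confidence) / max_failure_rate

# Demonstrating R = 0.999 for a 1-hour mission at 95% confidence
# requires roughly 3000 hours of failure-free testing.
print(round(zero_failure_test_hours(0.999, 1.0, 0.95)))  # ≈ 2994
```

The steep growth of required test time as the reliability target rises is one reason, discussed below, why testing alone rarely suffices and several sources of evidence must usually be combined.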
How might faults be prevented, detected, removed, and/or mitigated in the software development and/or support activities? What techniques might be used to provide quantitative or qualitative evidence that faults capable of causing a system failure do not exist in the software component? Given limited resources and time, which combination of techniques provides the "optimum" cost/benefit results? How are decisions made to select such techniques, and how is the evidence from the use of such techniques collected and presented? It is these concerns for which this document provides guidelines, both for management of a software reliability program and for the conduct of life cycle activities using appropriate software engineering and reliability engineering methods and techniques.
The reference by Littlewood [LITTLEWD00] describes some of the challenges of providing evidence that can support pre-operational claims for reliability. "Particularly when high levels of reliability need to be assured, it will be necessary to use several sources of evidence to support reliability claims. Combining such disparate evidence to aid decision making is itself a difficult task and a topic of current research." Four areas of evidence are discussed in terms of benefits and limitations:
1. evidence from software components and structure;
2. evidence from static analysis of the software product;
3. evidence from testing of software under operational conditions; and
4. evidence of process quality.
Among the challenges to provide software reliability assurance, there are cultural issues in addition to hard technical research questions to be investigated.
The guidelines in this document recommend determining, meeting, and demonstrating the assurance of customer requirements with an integrated and agreed-upon set of activities and measures within a system context, using a plan-case management framework. It is hoped that these guidelines will provide a basis for promoting a systematic approach to the assurance of software reliability through direct attention to the cultural issues of negotiation, implementation agreement, and human interface, as well as to the hard technical research necessary to demonstrate progress in understanding this complex area.
Roadmap to Document Guidance
Each reader of this guide may have different interests. A quick roadmap summary of sections of this guideline that might support reader interests is contained in Table 1.
This guide is not intended to be read from front to back like a novel. The life cycle management information described in Section 4 provides an overview of a software reliability program across the various life cycle phases, including example methods/techniques that might support the program in each phase. The task activities described in Section 5 directly support implementation of the software reliability program standard [JA1002] requirements. If the reader is interested in considerations for tailoring a program, addressing safety/security, integrating Off-The-Shelf software, or data collection, then Section 6 is the place to find such information.
The relationship of this software reliability guideline document to many existing standards and guidelines documents is presented as a matrix in Appendix A. Software reliability plan and case outlines are illustrated in Appendix B. If there is interest in a wide variety of potential methods and techniques, then Appendix C is the place to look. It is emphasized that there are numerous ways to combine and integrate methods and techniques beyond those described in Appendix C. There are undoubtedly excellent methods and techniques that are not included in this guide or that will be developed after its publication. Numerous tools exist to support the methods and techniques, but this guide does not specifically discuss any of those tools, since their capabilities change so rapidly; such tools can be identified through the many references. Two case studies (at least in fragmentary form) are covered in Appendix D and Appendix E. A fairly comprehensive glossary of acronyms and definitions is contained in Section 3, and primary references from SAE, related standards and guidelines, and publications of interest are contained in Section 2. It is also noted that many other references are contained in Appendix C as part of each specific method/technique.