What is Reliability?

Reliability in the broad sense is the science of predicting, analyzing, preventing and mitigating failures over time. Reliability is quality over time: a reliable, trouble-free product continues to satisfy customers for a long time. Reliability in the narrow sense is the probability that a device will operate successfully for a specified period of time, under specified conditions, when used in the manner and for the purpose intended. The probability of survival, R(t), plus the probability of failure, F(t), is always unity. Expressed as a formula: F(t) + R(t) = 1, or equivalently F(t) = 1 - R(t).
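The relation F(t) = 1 - R(t) can be seen directly in test data. The minimal sketch below estimates both quantities from a hypothetical test in which a fixed number of units start together at time 0 and each failure time is recorded. The function name and the data are illustrative assumptions.

```python
def empirical_reliability(failure_times, n_units, t):
    """Estimate R(t) and F(t) from a test in which n_units started at time 0.

    failure_times: hours at which individual units failed (hypothetical data).
    Units that never failed during the test are simply not listed.
    """
    failed_by_t = sum(1 for ft in failure_times if ft <= t)
    f = failed_by_t / n_units   # F(t): fraction of units failed by time t
    r = 1.0 - f                 # R(t): fraction of units still operating at t
    return r, f

# Hypothetical test: 10 units, 3 of which failed at the listed hours.
r, f = empirical_reliability([120.0, 800.0, 4500.0], n_units=10, t=1000.0)
print(f"R(1000 h) = {r:.2f}, F(1000 h) = {f:.2f}, sum = {r + f:.2f}")
```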

Reliability is not tangible: we cannot purchase it and add it to a device separately after design. Reliability must be planned during the design of a product.

Why Is Reliability So Important?

It’s no surprise that customers keep raising their expectations for the service and reliability of electronic products. Customers expect consistent operation from a purchased product. Reliability is always a top customer concern and is increasingly cited by customers as a major factor in purchasing decisions.

Good reliability also benefits the manufacturer, through improved customer satisfaction and reduced warranty costs.

What is Availability?

Availability is the probability that a system is in its useful working condition. Availability deals with the duration of up-time for operations and is a measure of how often the system is alive and well. It is often expressed as (up-time)/(up-time + downtime), with many different variants. Up-time refers to the capability to perform the task, and downtime refers to not being able to perform the task.
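As a minimal sketch, the basic ratio and one common variant (inherent availability, which uses the mean time between failures and the mean time to repair instead of measured up-time and downtime) can be computed as follows. The function names and the hours used are hypothetical.

```python
def availability(uptime_hours, downtime_hours):
    """Availability = up-time / (up-time + downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)

def inherent_availability(mtbf_hours, mttr_hours):
    """Common variant using MTBF and MTTR only; it ignores logistics and
    administrative delays that would add to real-world downtime."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical month of operation: 715 h up, 5 h down.
print(f"{availability(715.0, 5.0):.4f}")            # 0.9931
# Hypothetical design figures: MTBF 2,000 h, MTTR 4 h.
print(f"{inherent_availability(2000.0, 4.0):.4f}")  # 0.9980
```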

What is Failure?

Failure is any event that adversely affects a product’s working condition. If the product no longer works as it did in its as-sold condition, it can be considered to have failed.

Field failures do not generally occur at a uniform rate but follow a distribution in time commonly described as a “bathtub curve.” The life of a device can be divided into three regions: the Infant Mortality Period, where the failure rate progressively decreases; the Useful Life Period, where the failure rate remains constant; and the Wearout Period, where the failure rate begins to increase.
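One common way to sketch a bathtub curve is to add together three hazard (failure-rate) terms: a Weibull term with shape parameter below 1 for infant mortality, a constant term for the useful life, and a Weibull term with shape parameter above 1 for wearout. All parameter values below are hypothetical, chosen only to reproduce the shape, and this additive model is an illustration rather than a method given in the text.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta/eta) * (t/eta)**(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def bathtub_hazard(t):
    """Illustrative bathtub curve: falling + constant + rising hazard terms.

    All parameters are hypothetical, chosen only to produce the shape.
    """
    infant = weibull_hazard(t, beta=0.5, eta=1e8)   # beta < 1: falling rate
    random_failures = 1e-6                          # constant rate, per hour
    wearout = weibull_hazard(t, beta=5.0, eta=1e5)  # beta > 1: rising rate
    return infant + random_failures + wearout

for hours in (10, 100, 1_000, 10_000, 50_000, 100_000):
    print(f"{hours:>7} h: {bathtub_hazard(hours):.2e} failures/hour")
```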

Infant mortality refers to the earliest phase of a product’s life. Manufacturers often address it by stress-testing products before selling them to customers: during this testing, units with latent defects fail when exposed to stress. With the weak units removed, the remaining population is more reliable, and the failure rate decreases.

Units that survive the Infant Mortality Period have a high probability of continuing to survive. Failures that occur during the Useful Life Period are residual failures caused by unpredictable system or environmental conditions, or by premature wearout.

Wearout failures are generally associated with the aging of electronic components. Typically, semiconductor wearout occurs only after many years or even decades, so the component usually outlives the system in which it is used.

What is Maintainability?

A measure of the ease and rapidity with which a system can be restored to operational status after a failure.

Maintainability deals with the duration of maintenance outages, that is, how easily and quickly maintenance actions can be completed compared to a datum. The datum assumes that maintenance (all actions necessary for retaining an item in, or restoring an item to, a specified good condition) is performed by personnel having specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance. Maintainability characteristics are usually determined by equipment design, which sets maintenance procedures and determines the length of repair times.
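Maintainability is often quantified as the probability that a repair is completed within a given time. The sketch below assumes that repair times are exponentially distributed with a known mean (MTTR). That distribution and the 2-hour mean are assumptions made for illustration; real repair-time data often fit other distributions, such as the lognormal.

```python
import math

def maintainability(t_hours, mttr_hours):
    """Probability that a repair is completed within t_hours,
    assuming exponentially distributed repair times with mean mttr_hours."""
    return 1.0 - math.exp(-t_hours / mttr_hours)

# Hypothetical design with a 2-hour mean repair time.
for t in (1, 2, 4, 8):
    print(f"P(restored within {t} h) = {maintainability(t, mttr_hours=2.0):.3f}")
```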

What is a Failure Mode?

A particular way in which failures occur, independent of the reason for failure.

For example, an electrical short circuit is a failure mode, regardless of what caused the short.

What is the Early Life Period?

The early life period of device operation is characterized by a rapidly declining failure rate. It spans roughly the first 10,000 hours (~1 year) of device operation. The failure rate in this period is usually expressed in percent failures per 1,000 hours.
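To show how a declining early-life failure rate looks in percent per 1,000 hours, the sketch below divides the failures in each 1,000-hour interval by the number of units that entered that interval. The population size and failure counts are hypothetical, as is the function name.

```python
def percent_per_1000_hours(units_at_start, failures_per_interval, interval_hours=1_000):
    """Failure rate in each interval, as percent of surviving units per 1,000 hours."""
    rates = []
    survivors = units_at_start
    for failures in failures_per_interval:
        rates.append(failures / survivors * 100 * (1_000 / interval_hours))
        survivors -= failures  # only survivors enter the next interval
    return rates

# Hypothetical early-life data: failures in four successive 1,000-hour intervals.
rates = percent_per_1000_hours(10_000, [40, 15, 8, 5])
for i, pct in enumerate(rates):
    print(f"{i * 1_000:>5}-{(i + 1) * 1_000} h: {pct:.3f} % per 1,000 h")
```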

What is the Useful Life Period?

Beyond the infant mortality period, in the useful life period, the time to failure is assumed to follow an exponential distribution, which corresponds to a constant failure rate. The failure rate is at its lowest and relatively constant during this period, which begins after about 10,000 hours (~1 year) of device operation. Reliability during this period must therefore be specified as a single, essentially constant failure rate. An operating temperature of 55 °C, an activation energy of 0.62 eV and normal operating voltage are used for lifetime and reliability calculations.
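An activation energy is typically used with the Arrhenius model to translate a failure rate measured at an elevated stress temperature to the 55 °C use condition. The constant rate then gives R(t) = exp(-λt). The sketch below assumes a 125 °C stress temperature and a hypothetical failure rate measured at that temperature. Neither value comes from the text, and the function names are illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K

def arrhenius_af(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between a stress and a use temperature."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))

def reliability_exponential(failure_rate_per_hour, hours):
    """R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)

# 55 C and 0.62 eV come from the text; the 125 C stress temperature is assumed.
af = arrhenius_af(ea_ev=0.62, t_use_c=55.0, t_stress_c=125.0)

lambda_stress = 2e-6             # hypothetical failures/hour observed at 125 C
lambda_use = lambda_stress / af  # scaled to 55 C use conditions
print(f"AF = {af:.1f}, lambda_use = {lambda_use:.2e}/h")
print(f"R(10 years) = {reliability_exponential(lambda_use, 10 * 8760):.4f}")
```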

What is Failure Rate?

The number of failures of an item per unit measurement of life. Failure rate is considered constant over the useful life period.
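Under the constant-rate assumption, a point estimate of the failure rate is the number of failures divided by the cumulative operating time of the whole population. The sketch below uses hypothetical field data and an illustrative function name.

```python
def failure_rate(failures, operating_hours):
    """Point estimate of a constant failure rate: failures per operating hour."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours

# Hypothetical field data: 500 units in service for 4,000 h each, 3 failures.
lam = failure_rate(failures=3, operating_hours=500 * 4_000)
print(f"lambda = {lam:.2e} failures/hour")  # lambda = 1.50e-06 failures/hour
```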

What is Failure Modes and Effects Analysis (FMEA)?

A methodology for identifying the modes of failure events, assigning values to them based on unit cost and frequency, and then prioritizing the results in order to focus the organization on the significant few failures.
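The prioritization step described above can be sketched as follows. Each failure mode is scored as unit cost times frequency, and the highest scorers are kept until they account for a chosen share of the total cost. The 80% cutoff is an assumed Pareto-style threshold for "the significant few". The modes, costs and frequencies are hypothetical. Many FMEA practices score severity, occurrence and detection instead of cost.

```python
def prioritize_failure_modes(modes, cumulative_share=0.8):
    """Rank failure modes by cost impact (unit cost x frequency) and return the
    'significant few' that together reach cumulative_share of the total."""
    scored = sorted(
        ((name, cost * freq) for name, cost, freq in modes),
        key=lambda item: item[1],
        reverse=True,
    )
    total = sum(score for _, score in scored)
    selected, running = [], 0.0
    for name, score in scored:
        selected.append((name, score))
        running += score
        if running >= cumulative_share * total:
            break
    return selected

# Hypothetical failure modes: (name, cost per event in $, events per year)
modes = [
    ("connector corrosion", 120.0, 40),
    ("capacitor short", 300.0, 25),
    ("firmware hang", 50.0, 30),
    ("fan bearing wear", 80.0, 10),
]
for name, score in prioritize_failure_modes(modes):
    print(f"{name}: ${score:,.0f} per year")
```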

What is Failure Modes, Effects and Criticality Analysis (FMECA)?

This is the detailed version of FMEA. Instead of examining the system as larger units, you assign criticality values to each failure for the smallest units of the system that are observed.
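The text does not say how criticality values are computed. One widely used approach, from MIL-STD-1629A, defines a failure-mode criticality Cm = β·α·λp·t, where β is the conditional probability that the failure effect occurs, α is the fraction of the part's failures that occur in this mode, λp is the part failure rate and t is the operating time. The item criticality is the sum of Cm over modes in the same severity class. The sketch below assumes both hypothetical modes fall into one severity class. The part data are also hypothetical.

```python
def mode_criticality(beta, alpha, part_failure_rate, hours):
    """Failure-mode criticality Cm = beta * alpha * lambda_p * t."""
    return beta * alpha * part_failure_rate * hours

# Hypothetical part: failure rate 5e-6/h, 1,000 h of operation, two failure modes.
part_rate = 5e-6
modes = {
    # name: (beta = probability the effect occurs, alpha = share of failures in this mode)
    "short": (1.0, 0.7),
    "open": (0.5, 0.3),
}
cm = {name: mode_criticality(b, a, part_rate, 1_000) for name, (b, a) in modes.items()}
item_criticality = sum(cm.values())  # assumes both modes share one severity class
for name, value in sorted(cm.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: Cm = {value:.2e}")
print(f"item criticality Cr = {item_criticality:.2e}")
```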

What is Mean Time Between Failures (MTBF)?

Total operating time divided by the number of failures. MTBF is the inverse of the failure rate, assuming the failure rate is constant.
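A minimal sketch of the definition for a repairable item, using hypothetical totals, is shown below. It also confirms that MTBF equals 1/λ when λ is estimated from the same data.

```python
def mtbf(total_operating_hours, failures):
    """MTBF = total operating time / number of failures (repairable items)."""
    if failures <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_hours / failures

# Hypothetical repairable system: 5,000 h of operation, 4 failures.
m = mtbf(5_000.0, 4)
failure_rate = 4 / 5_000.0  # failures per hour from the same data
print(f"MTBF = {m:.0f} h, 1/lambda = {1 / failure_rate:.0f} h")  # both 1250 h
```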

What is Mean Time To Restore (MTTR)?

The average elapsed time from the initial failure to the restoration of normal system status. Mean Time To Restore includes Mean Time To Repair. Together, MTBF and MTTR determine availability: A = MTBF / (MTBF + MTTR).
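Given a log of when each failure occurred and when the system was restored, MTTR is the average of those durations. The timestamps below are hypothetical, and the function name is illustrative.

```python
from datetime import datetime

def mean_time_to_restore(incidents):
    """Average hours from failure to restoration of normal system status.

    incidents: list of (failed_at, restored_at) datetime pairs.
    """
    durations = [(restored - failed).total_seconds() / 3600 for failed, restored in incidents]
    return sum(durations) / len(durations)

# Hypothetical incident log.
incidents = [
    (datetime(2024, 3, 1, 8, 0), datetime(2024, 3, 1, 11, 0)),     # 3.0 h
    (datetime(2024, 3, 9, 14, 30), datetime(2024, 3, 9, 15, 30)),  # 1.0 h
    (datetime(2024, 3, 20, 22, 0), datetime(2024, 3, 21, 0, 0)),   # 2.0 h
]
print(f"MTTR = {mean_time_to_restore(incidents):.1f} h")  # MTTR = 2.0 h
```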

What is Root Cause Failure Analysis (RCFA)?

A technique for uncovering the cause of a failure by deductive reasoning down to the physical and human root(s), and then using inductive reasoning to uncover the much broader latent or organizational root(s).

What is Mean Time To Failure (MTTF)?

The average operating time before failure of non-repairable components. MTBF is used to measure the reliability of repairable products, whereas MTTF is used to measure the reliability of non-repairable products.
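For a sample of non-repairable components that were all run until they failed, MTTF is simply the mean of their times to failure. The sketch below assumes no units were removed from test before failing (no censoring). The data are hypothetical.

```python
def mttf(times_to_failure_hours):
    """MTTF of non-repairable units: mean operating time until each unit fails.

    Assumes every unit in the sample was run until it failed (no censoring).
    """
    return sum(times_to_failure_hours) / len(times_to_failure_hours)

# Hypothetical life test of five non-repairable components run to failure.
print(f"MTTF = {mttf([8_200, 9_500, 10_100, 11_000, 11_200]):.0f} h")  # MTTF = 10000 h
```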