When the software controlling a dangerous system suffers a glitch, you’ll need the right type of processor to avoid a potentially fatal failure.

Editor’s Note: Welcome to AspenCore’s Special
Project on the safety of autonomous vehicles. This article, along with the
articles listed on the last page, forms an in-depth look from a variety of
angles at the business and technology of autonomous vehicle safety. 

 

 

By Richard
Quinnell, Special Projects editor

Many system
designs, including industrial machinery, medical devices, and automobiles, are
safety-critical and must be able to detect their own operational
failures in real time and react in a way that avoids harming the people using
them. Creating a processor-based system that provides this functional safety requires
a combination of hardware error-checking, hardware self-test, and system
redundancy to deliver the software-independent fault detection and safe
resolution these systems need. Fortunately, there are processors available that
handle much of the hardware heavy lifting needed for safety-critical systems.

The need for
functional safety in processor-based systems is rising, especially in
automotive applications. Even setting aside the whole movement toward
autonomous vehicles, automobiles are increasingly reliant on microprocessors in
implementing critical functions. Anti-lock braking systems, engine control, and
steering are just a few of the vehicle functions now under processor control
that have major safety implications. Should any of these processors make even a
single misstep without being caught, the results could be fatal.

Unfortunately,
the opportunities for something to go wrong in a processor-based design are
legion. As the diagram below shows, proper code execution requires many system
elements to work correctly. The processor and all its internal registers, the
program and cache memories, the RAM, and the bus interfaces among them, along
with the system power and clocks, all must operate flawlessly with precision
timing. But as anyone who has had their computer lock up for no apparent reason
knows, a single bit change anywhere in this system can derail the entire
operation. A noise glitch on any line of the bus, a stray alpha particle or
cosmic ray strike (yes, they do happen, and more often than one might think) that
alters a bit in memory or a register, low voltage, clock drift, and a host of
other sources can cause the system to stumble.

Figure 1: Basic processing unit

The core of processor-based systems offers
many opportunities for noise glitches and other single event upsets to
completely derail proper software execution.

Such errors
can be made unlikely through careful design, but not eliminated. For a system
to be deemed safe, then, it must be able to detect such an error in real time
and respond appropriately to mitigate its effects. What constitutes proper
mitigation is highly application dependent, but the methods for detecting an
error are well established and common to safety-critical designs. Transactions
on the system bus, for instance, can be monitored by including error correction
coding (ECC) or cyclic redundancy check (CRC) data
with each transaction. Voltage monitors can keep tabs on power sources, and
watchdog timers can help monitor clock signals.
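
As a rough illustration of the CRC approach, the sketch below appends a CRC-32 to each simulated bus transaction and verifies it on receipt. The function names and the four-byte frame layout are hypothetical, chosen only for this example; real bus protocols define their own check codes and implement them in hardware.

```python
import zlib


def frame_with_crc(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload to form a frame."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")


def check_frame(frame: bytes) -> bool:
    """Return True if the trailing CRC matches the payload it follows."""
    payload = frame[:-4]
    received = int.from_bytes(frame[-4:], "big")
    return (zlib.crc32(payload) & 0xFFFFFFFF) == received


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Simulate a noise glitch by inverting a single bit of the frame."""
    corrupted = bytearray(frame)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    frame = frame_with_crc(b"\x12\x34\x56\x78")
    print("clean frame accepted:", check_frame(frame))
    print("glitched frame accepted:", check_frame(flip_bit(frame, 5)))
```

A CRC of this kind only detects the corruption. Correcting it requires an error-correcting code or a retry of the transaction.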

A watchdog
timer can also provide a gross indication of proper processor operation by
having the processor reset the timer on a regular basis. If the processor fails
in that duty, the watchdog sends a signal to alert the system to the failure
once the timer has run out. Choosing the timeout involves a tradeoff, however,
between the software overhead of frequent timer resets and the delay in
signaling a processor failure.
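
The tradeoff can be seen in a small simulation. This is a minimal sketch that assumes an idealized countdown watchdog driven by integer clock ticks; the class, the `run` helper, and all the numbers are invented for illustration and do not model any particular device.

```python
class WatchdogTimer:
    """Simulated countdown watchdog driven by explicit clock ticks."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.tripped = False

    def kick(self) -> None:
        """Called by healthy software to restart the countdown."""
        if not self.tripped:
            self.remaining = self.timeout_ticks

    def tick(self) -> None:
        """Advance one clock tick; latch the failure signal at zero."""
        if self.tripped:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.tripped = True


def run(timeout_ticks: int, kick_period: int, hang_at: int, total_ticks: int):
    """Return the tick at which the watchdog trips, or None if it never does.

    The simulated processor kicks the watchdog every `kick_period` ticks
    until it hangs at tick `hang_at`, after which it stops kicking.
    """
    wd = WatchdogTimer(timeout_ticks)
    for t in range(total_ticks):
        if t < hang_at and t % kick_period == 0:
            wd.kick()
        wd.tick()
        if wd.tripped:
            return t
    return None


if __name__ == "__main__":
    # Short timeout: frequent kicks, quick detection of the hang at tick 20.
    print(run(timeout_ticks=10, kick_period=5, hang_at=20, total_ticks=300))
    # Long timeout: few kicks, but the same hang goes unreported far longer.
    print(run(timeout_ticks=100, kick_period=50, hang_at=20, total_ticks=300))
```

With these assumed numbers, the 10-tick watchdog reports the hang at tick 24, while the 100-tick watchdog, which needs only a tenth as many resets, does not report it until tick 99.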

Yet, detecting
a failure is only one part of functional safety. The other part is responding
to the failure in a way that maintains safe system operation. This response
cannot be entirely software based. You cannot count on a processor that has
failed to mitigate its own problems, or even to react to the
alerts. There must be an independent hardware mechanism in place.

A variety of architectures have
evolved over the years to provide such an independent mechanism in
processor-based systems. These architectures include a single
processor paired with a hardware checker, and two processors in which the second
is of the same type as the main unit or a different one. This second processor
can operate independently, running the same or independent software and serving as
a touchstone to validate the main processor’s behavior on a cycle-by-cycle
basis. The more popular alternative, though, is for the second processor to run
in lockstep with the main unit, using the same code and data. In that case, the
secondary processor typically runs on a slight delay from the primary, so that
a transient error on the system bus does not affect both processors in the same way.
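
The delayed-lockstep idea can be sketched in software, although real implementations do this comparison in hardware. The toy model below assumes a made-up `core_step` function standing in for instruction execution and a hypothetical two-cycle delay. It shows only how buffering lets a comparator match results from the same instruction and flag a divergence; it does not model the bus transients that motivate the delay.

```python
from collections import deque


def core_step(state: int, data: int) -> int:
    """Toy deterministic 'instruction'; both cores run this same logic."""
    return (state * 31 + data) & 0xFFFF


def run_lockstep(inputs, delay=2, fault_cycle=None):
    """Return the cycle at which the comparator sees a mismatch, else None.

    The secondary core receives each input `delay` cycles after the primary.
    Primary results are buffered for the same delay, so the comparator always
    checks two results that belong to the same instruction.
    """
    primary_state = 0
    secondary_state = 0
    pending_inputs = deque()   # inputs waiting for the secondary core
    pending_outputs = deque()  # primary results waiting for comparison
    for cycle in range(len(inputs) + delay):
        if cycle < len(inputs):
            data = inputs[cycle]
            primary_state = core_step(primary_state, data)
            if cycle == fault_cycle:
                primary_state ^= 0x0004  # simulated single-bit upset
            pending_inputs.append(data)
            pending_outputs.append(primary_state)
        if cycle >= delay:
            secondary_state = core_step(secondary_state, pending_inputs.popleft())
            if secondary_state != pending_outputs.popleft():
                return cycle  # comparator raises the error signal here
    return None


if __name__ == "__main__":
    program = list(range(10))
    print("fault-free run:", run_lockstep(program))
    print("upset at cycle 3 flagged at cycle:", run_lockstep(program, fault_cycle=3))
```

In this model an upset in the primary core at cycle 3 is flagged at cycle 5, once the delayed secondary core has executed the same instruction.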

 

Figure 2: Safety processor architectures

A variety of architectures have been
developed that support the detection and mitigation of random processing
errors. (Source: EE Times)

What these
architectures have in common is a need to make substantial additions to the
basic processor design, including comparison hardware and possibly a full
secondary processor. The advent of multi-core processors opened an opportunity for
silicon vendors to offload much of this hardware design burden from system
developers, and many have stepped up to the plate by introducing processors
specifically designed for safety-critical applications. Many of these safety
processors are marketed primarily to automotive designers working under the ISO 26262 standard for ASIL (automotive safety integrity level)
certification, but are equally applicable to other safety-critical applications
in industrial control, medical, military, and aerospace.

These
providers go further than simply providing hardware features. They also offer
designers assistance in implementing safe designs, diagnostic software
libraries, and the traceability and verification documentation and development
tools needed to obtain safety certification.

Here are some
representative safety processor families currently on the market:

  • ARM Cortex-R52: Part of ARM’s ARMv8-R architecture, the
    R52 core gives ARM licensees the foundation features needed to implement a
    safety processor. The dual-core device can operate in lockstep mode for fault
    detection and has the option of an additional split configuration that allows
    the two cores to operate independently when needed. The core design also
    includes ECC on all bus and memory interfaces, capable of single-bit error
    correction and double-bit error detection. The core further offers high-coverage
    built-in self-test (BIST) capability and a licensable safety package to
    simplify product safety implementation.

  • Intel Xeon D-1529: Instead of targeting automotive
    applications, Intel’s D-1529 aims to meet industrial needs under the IEC 61508
    safety integrity level (SIL) certification
    standards. The design includes redundant lockstep processor pairs, windowed
    watchdog timers, clock and power monitors, and processor temperature
    monitoring. The processors can support mixed safety-critical and non-critical
    task execution and offer diagnostic and error-detection logic on their PCI and
    SATA interfaces.

  • MIPS i6500-F core: This core design allows MIPS
    licensees to create safety processors based on configurable clusters of 64-bit
    CPUs. It includes parity checks on all buses, ECC on its RAM, and logic BIST
    support. It has been certified as a safety element out of context (SEooC) to
    ASIL B, supporting designs aiming for certification to ASIL D.

  • NXP S32S24: Targeting ASIL-D designs, the S32S247
    uses four ARM Cortex-R52 lockstep cores with a hardware hypervisor to keep
    application program execution separate. The large (up to 64 Mbytes) integrated
    Flash memory allows the processor to hold multiple sets of application code in
    support of over-the-air software updates, and all memory interfaces include
    ECC.

  • STMicro SPC5:
    The SPC5 product line includes several variations, including lockstep, delayed
    lockstep, and decoupled parallel processing options. Processors include BIST
    hardware, with the SPC57S line additionally offering ECC on memory.

  • Texas Instruments Hercules: The Hercules family of safety
    processors uses lockstep Cortex-R cores and has been certified compliant under
    IEC 61508 SIL 3 and ISO 26262 ASIL D. In addition, the processors
    offer ECC on system memory, ECC or parity on select peripheral and DMA
    interfaces, CRC or parity on serial and network communications peripherals,
    on-chip clock and voltage monitoring, IO loopback and ADC self-test, and memory
    BIST. The error signaling module offers an external signal pin to facilitate
    additional system response to errors detected within the processor.

  • Xilinx Zynq 7000: While it is not actually a processor,
    the Zynq FPGA can be configured to provide two independent safety channels in a
    single device using design packages, methodologies, and tools certified for use
    in functional safety applications. The tools support isolated
    design flows that physically separate the redundant elements so that they do
    not share FPGA resources, and soft error mitigation IP is also available.
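
Several of the devices above advertise ECC with single-bit error correction and double-bit error detection. As a toy illustration of how one code can do both, the sketch below uses an extended Hamming code on just four data bits. The bit layout and function names are assumptions for this example only; production memories protect much wider words and do the work in hardware.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # codeword indices that carry the data bits


def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds overall parity; indices 1, 2, and 4 hold Hamming parity
    bits; indices 3, 5, 6, and 7 hold the data bits.
    """
    code = [0] * 8
    for shift, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> shift) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every other index that has bit p set.
        code[p] = sum(code[i] for i in range(1, 8) if (i & p) and i != p) % 2
    code[0] = sum(code[1:]) % 2
    return code


def decode(code: list):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i  # XOR of set indices points at a single flipped bit
    overall_ok = sum(code) % 2 == 0
    if syndrome == 0 and overall_ok:
        status = "ok"
    elif not overall_ok:
        # Odd number of flips: assume one, at index `syndrome` (0 = parity bit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits were flipped.
        return "uncorrectable", None
    nibble = sum(code[pos] << shift for shift, pos in enumerate(DATA_POSITIONS))
    return status, nibble


if __name__ == "__main__":
    word = encode(0b1011)
    single = list(word)
    single[6] ^= 1
    double = list(single)
    double[2] ^= 1
    print(decode(word), decode(single), decode(double))
```

The double-bit case shows why detection matters as much as correction: the decoder cannot repair the word, but it can signal the fault so that the rest of the system has a chance to respond safely.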

Choosing a
safety processor is only the beginning, however. Developers of safety-critical
systems will still need to adopt a design and evaluation methodology for both
hardware and software that rigorously evaluates the potential for errors to
occur and validates the system design for resilience to such errors.
Safety-targeted processors and the support their vendors provide, though, go a
long way toward easing that developer burden.

Check
out these other stories in the safety of autonomous vehicles Special Project:

Autonomous vehicles: The electronics road to making
them safe

Explore tools and
technologies available to make AVs safe, including pedestrian path prediction,
functional safety, cameras/lidars/radars, and V2X.

How Are We Going to Monitor Drivers?
Euro NCAP wants Driver
Monitoring Systems (DMS) as a primary safety standard by 2020. Meanwhile,
recent Uber and Tesla crashes substantially heightened the importance of DMS.
Now car OEMs are scrambling.

Uber Fatality Sends AVs Back to Safety 101

An NTSB preliminary report exposes two issues. One is the immaturity of
Uber’s AV software stack. Another is the absence of an Uber safety strategy in
creating its AV testing platform.

 

Robocar Testing: It’s Simulation, Stupid!
Why do you need simulation?
It’s because you can still miss the ground truth even with millions of miles.