The saga of neutrons and alpha particles — and how to deal with them — is finally reaching the consumer chip’s doorstep.
The issue of alien subatomic particles, i.e., radiation from space, potentially affecting chips is again making the rounds in the trade media following a recent paper from Vanderbilt University’s Bharat Bhuva with a new focus on consumer devices.
The paper underscores the impact of radiation from high-energy neutrons and alpha particles on electronic circuitry.
“The high-energy particles can damage the structure of oxides and can create impurities in silicon, changing transistor characteristics such as threshold voltage,” acknowledged Kevin Krewell, principal analyst at Tirias Research.
But industry observers like Krewell are also quick to point out that the unintended changes caused by neutrons and alpha particles in chips have long been known. For instance, when a high-energy neutron strikes a silicon atom, it leads to the release of heavy ions that create momentary current pulses. Those pulses can flip the state of flip-flops or change the data held in memory cells.
Next, radioactive isotopes in molding compounds found in chip packages can generate alpha particles, which, in turn, lead to a malfunction commonly known as a single-event upset (SEU). There are two common types of problems that alien subatomic particles can cause in chips: configuration memory errors and soft errors.
First, neutrons and alpha particles can cause upsets in memory elements when they corrupt the interconnecting elements used for routing and for configuring logic elements. Second, when flip-flops or memory cells change state due to neutron-induced radiation effects, it leads to a malfunction commonly termed a soft error, or data error.
Fig. 1: Microsemi’s FPGAs employ data protection techniques like ECC to mitigate soft errors.
Kurt Shuler, VP of marketing at Arteris, affirms that the relationship between the problems caused by subatomic particle strikes and semiconductor process technologies is well understood. “Functional safety engineers can now calculate expected soft error rates.”
He said that the process of dealing with errors caused by the upsets in memory elements and flipping of a digital bit is well-described in the ISO 26262 specification for automotive chips and the IEC 61508 standard for all programmable electronics. “There are specifications for other vertical markets that recommend similar processes,” Shuler added.
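As a rough illustration of the kind of soft error rate estimate Shuler mentions, the sketch below uses the standard FIT unit (one failure per billion device-hours). The function name, the per-megabit rate, the memory size, and the fleet size are all hypothetical values chosen to keep the arithmetic easy to follow; they are not figures from the paper or from any vendor.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def expected_upsets(fit_per_mbit: float, mbits: float, hours: float) -> float:
    """Expected number of soft errors over `hours` of operation.

    1 FIT = 1 failure per 10**9 device-hours, so the expected count is
    rate (FIT/Mbit) x memory size (Mbit) x operating time (h) / 10**9.
    """
    return fit_per_mbit * mbits * hours / 1e9


# Hypothetical inputs, for illustration only.
fit_per_mbit = 1000.0  # assumed raw soft error rate per Mbit of memory
mbits = 64.0           # assumed amount of on-chip memory, in Mbit
fleet_size = 1_000_000  # assumed number of devices in the field

per_device_per_year = expected_upsets(fit_per_mbit, mbits, HOURS_PER_YEAR)
fleet_per_year = per_device_per_year * fleet_size
```

Under these assumed numbers a single device sees well under one upset a year, while a fleet of a million devices sees hundreds of thousands, which is one way to see why rates that look negligible per chip can still matter at consumer volumes.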
How to handle soft errors
Let’s see how design engineers usually deal with the inevitable SEUs. The hardware approach (Fig. 1) is based on protecting memories and internal data communications with techniques such as error-correcting code (ECC) and triple modular redundancy (TMR).
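To make the ECC idea concrete, here is a minimal sketch of a Hamming(7,4) code, a textbook single-error-correcting scheme. It is not the code used in any particular chip mentioned here; production memories typically use wider codes, and the function names below are illustrative.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (bit positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit; return (data bits, error position).

    The error position is 1-based, or 0 if no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate a single-event upset at bit position 5
recovered, position = hamming74_decode(stored)
```

The three parity bits form a syndrome that points directly at the flipped position, so a single upset is corrected on read. Two upsets in the same codeword would be miscorrected by this basic code, which is why practical designs usually add an extra parity bit to detect double errors.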
Another way is to duplicate critical processing elements and compare the results in a redundancy mode. “This is called Dual Core Lockstep (DCLS) and processors that can easily implement it include the ARM Cortex-R processors and the Synopsys ARC EM processors with the Safety Enhancement Package,” Shuler said.
Fig. 2: The DCLS technology implements two identical processors with a slight delay. Image source: ARM.
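The lockstep principle can be sketched in a few lines: two copies of the same deterministic computation run side by side, and a comparator flags the first cycle in which their outputs differ. This is a toy software model, not ARM’s or Synopsys’ implementation. The `step` function stands in for a processor, the fault injection is hypothetical, and the slight delay between the cores shown in Fig. 2 is omitted for simplicity.

```python
def step(state: int) -> int:
    """Toy 'processor' step: a deterministic 32-bit state update."""
    return (state * 1103515245 + 12345) & 0xFFFFFFFF


def run_lockstep(seed, cycles, fault_cycle=None, fault_bit=0):
    """Run two identical cores and compare them every cycle.

    Returns the first cycle at which the cores disagree, or None if they
    stay in agreement. If `fault_cycle` is given, a single bit is flipped
    in the main core only at that cycle to mimic a single-event upset.
    """
    main = shadow = seed
    for cycle in range(cycles):
        main = step(main)
        shadow = step(shadow)
        if cycle == fault_cycle:
            main ^= 1 << fault_bit  # injected upset, main core only
        if main != shadow:
            return cycle  # comparator trips; the system can now react
    return None


clean_run = run_lockstep(seed=1, cycles=100)
faulty_run = run_lockstep(seed=1, cycles=100, fault_cycle=42, fault_bit=7)
```

Note that a two-way comparison can only detect a mismatch; it cannot tell which core is wrong, so the usual reaction is to reset or enter a safe state rather than to correct the result.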
Shuler also cited the example of mission-critical safety systems like the National Highway Traffic Safety Administration (NHTSA) Level 5 recommendations for fully autonomous vehicles. “It’s going to be implemented in triplicate and connected to some kind of voting logic like a Kalman filter,” he added. “That is to determine the most probable result of what is ‘reality’ at any point in time.”
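The simplest form of voting over a triplicated result is a bitwise two-out-of-three majority, as used in classic TMR. The sketch below shows that simpler technique, not the Kalman-filter style estimation Shuler describes; the function name and sample values are illustrative.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


good = 0b10110010
upset = good ^ (1 << 5)  # one copy suffers a single-bit upset

voted = majority_vote(good, upset, good)
```

Each output bit takes the value held by at least two of the three copies, so an upset confined to one copy is masked without any retry. The scheme fails only if the same bit is corrupted in two copies at once.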
Shuler, a former Air Force pilot, also acknowledges that such duplication techniques implemented in automotive and military avionics are too costly for consumer electronics. And that inevitably leads to renewed efforts to create economical solutions for soft errors caused by high-energy particles.
It also partly answers the questions about why the issue of radiation effects on semiconductor devices has resurfaced now. After all, the semiconductor industry has long been dealing with this issue in life-critical and safety-critical applications.
That includes automotive electronics, aviation, industrial automation, and medical devices. And then there are high-availability and revenue-critical environments like communication infrastructure, which have long been dealing with the upsets caused by subatomic particles.
Apparently, the renewed efforts to deal with alien particles are going to focus on consumer realms such as smartphones and notebook computers. A case in point is the alliance that includes top consumer chipmakers like AMD, Broadcom, MediaTek, Qualcomm, and Renesas, as well as key semiconductor industry players such as ARM, Synopsys, and TSMC.
The alliance is most likely focused on finding new solutions for handling soft errors in consumer chips, especially those manufactured at smaller nodes such as 16 nm.
Managing memory vulnerability
Memory silicon, both external and embedded, embodies the worst-case scenario when it comes to the impact of alien subatomic particles. Memory devices like DRAMs are especially vulnerable because memory-bit cells can be discharged by a subatomic particle strike, leading to an SEU or a soft error.
A single soft data bit error in video traffic, for example, can lead to an imperceptible video glitch. And a single soft data bit error in an OS instruction could result in a system crash. Will Strauss, principal analyst at Forward Concepts, recalls that nearly 30 years ago, employing epitaxial silicon wafers, or epi wafers, was believed to lessen the memory drop-outs caused by particle strikes.
Fig. 3: Strauss: It’s a known problem in the semiconductor industry.
“However, with smaller geometries like 16 nm, there is a greater chance that a particle will hit a specific memory location,” Strauss added. “That’s why it’s a greater concern now.”
Tirias’ Krewell says that shielding chips isn’t much of an option because it adds weight and thickness to electronic devices. “So chipmakers are adding ECC to all data and instruction paths to detect and possibly correct errors.”
Some chipmakers are even employing ECC to protect CPU cache memory. Then there are techniques like memory sparing that allow failing memory to be replaced with additional DRAM on the fly.
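Memory sparing can be pictured as a remapping table plus a reserve bank. The toy model below is a sketch under simplifying assumptions, not a description of any vendor’s memory controller: the class, its method names, the single spare bank, and the error threshold are all hypothetical.

```python
class SparedMemory:
    """Toy model: logical banks mapped onto physical banks, plus one spare."""

    def __init__(self, num_banks, bank_size, threshold=3):
        # Physical storage: the last bank is held in reserve as the spare.
        self.banks = [[0] * bank_size for _ in range(num_banks + 1)]
        self.mapping = list(range(num_banks))  # logical -> physical bank
        self.spare = num_banks                 # index of the unused spare
        self.errors = [0] * num_banks          # corrected errors per bank
        self.threshold = threshold

    def write(self, bank, addr, value):
        self.banks[self.mapping[bank]][addr] = value

    def read(self, bank, addr):
        return self.banks[self.mapping[bank]][addr]

    def report_corrected_error(self, bank):
        """Record an ECC-corrected error; swap in the spare if needed.

        Returns True if the logical bank was moved to the spare.
        """
        self.errors[bank] += 1
        if self.errors[bank] >= self.threshold and self.spare is not None:
            # Copy the contents across, then redirect future accesses.
            self.banks[self.spare][:] = self.banks[self.mapping[bank]]
            self.mapping[bank] = self.spare
            self.spare = None
            self.errors[bank] = 0
            return True
        return False


mem = SparedMemory(num_banks=2, bank_size=4, threshold=2)
mem.write(1, 0, 99)
first = mem.report_corrected_error(1)   # below threshold, no action
second = mem.report_corrected_error(1)  # threshold reached, spare swapped in
```

The point of counting corrected errors is that a bank producing repeated correctable errors is treated as failing and retired before it produces an uncorrectable one, while software keeps using the same logical addresses.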
Image source: Vanderbilt University