Notes from the 2016 Linley Processor Conference

By Jim Harrison

The first day of the Linley Processor Conference in Santa
Clara covered a wide range of topics, including IoT security, 100-Gbit/s networks,
virtualization, and SoC connectivity.

In his keynote, Linley Gwennap explained that wafer cost is
rising rapidly for 28 nm and below and that transistor cost rose for the
first time ever at 20 nm. Cost-sensitive products are staying at 28 nm because smaller nodes offer no big gain in price per transistor, although they may still
offer power and performance gains. He said that this shifts the emphasis to specialized
architectures that can improve performance/watt by 10 to 100 times. Examples
include the Cadence/Tensilica Xtensa LX7 and Microsoft Catapult.

Linley noted advances in neural networks, including Google’s ASIC
for TensorFlow and a new chip from Wave Computing to be announced on Wednesday.
He also discussed ARM-based alternatives to x86
in servers, with the Applied Micro X-Gene 1 and 2 and Cavium
ThunderX processors coming online and NXP now sampling the QorIQ LS2 with up to
eight Cortex-A72 cores.

Fergus Casey, senior R&D manager at Synopsys, discussed Protecting
IoT Edge Devices From Malicious Physical And Software Attacks. These attacks
may arrive over communication links, indirectly via remote nodes, or as software
attacks through malware or privilege-level tampering. They may also originate as hardware
attacks, either non-invasive (debug ports, side channel) or invasive
(decapsulation, probing). Fergus noted that true random numbers are required
for secrecy and privacy, and that weak RNGs bring predictability and vulnerability. Synopsys’s
ARC processor uses a Trusted Execution Environment with memory and register
integrity protection, side channel protection, and more via their SecureShield
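
The point about weak RNGs can be illustrated with a minimal Python sketch. It is not taken from the talk: the function names `predictable_key`, `secure_key`, and `recover_seed` are hypothetical, and the operating system's cryptographic generator (via the standard `secrets` module) stands in here for a hardware true random number generator.

```python
import random
import secrets
from typing import Optional


def predictable_key(seed: int, nbytes: int = 16) -> bytes:
    """Derive a key from a seeded PRNG; anyone who guesses the seed can rebuild it."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(nbytes))


def secure_key(nbytes: int = 16) -> bytes:
    """Derive a key from the OS cryptographic generator (assumed stand-in for a TRNG)."""
    return secrets.token_bytes(nbytes)


def recover_seed(key: bytes, max_seed: int) -> Optional[int]:
    """Brute-force a small seed space, as an attacker could against a weak RNG."""
    for seed in range(max_seed):
        if predictable_key(seed, len(key)) == key:
            return seed
    return None


if __name__ == "__main__":
    weak = predictable_key(1234)
    print("recovered seed:", recover_seed(weak, 5000))
    print("secure key:", secure_key().hex())
```

If the seed comes from a small space, such as a boot counter or a coarse timestamp, the search above finishes almost immediately, which is the predictability the talk warned about.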

Iisko Lappalainen, senior
sales manager at MontaVista Software, talked about Securing Edge Devices In The Latest Intelligent Networks. Software-defined networks (SDN) and network function virtualization (NFV) dynamically
adjust the network capabilities and quality of service, which makes for a
versatile system that may need specific security functions. MontaVista offers Linux
Integrity Measurement Architecture (IMA) and Extended Verification Module (EVM)
kernel facilities for this situation.
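
The general idea behind integrity measurement can be sketched in a few lines of Python. This is a conceptual illustration only, not the kernel's IMA/EVM implementation or data layout: the helper names `measure`, `extend`, and `appraise` are hypothetical, and SHA-256 is an assumed choice of hash.

```python
import hashlib
import hmac


def measure(data: bytes) -> bytes:
    """Hash a file's contents to produce its measurement."""
    return hashlib.sha256(data).digest()


def extend(aggregate: bytes, measurement: bytes) -> bytes:
    """Fold a measurement into a running aggregate: new = H(old || measurement)."""
    return hashlib.sha256(aggregate + measurement).digest()


def appraise(data: bytes, expected: bytes) -> bool:
    """Compare a fresh measurement against a stored known-good value."""
    return hmac.compare_digest(measure(data), expected)


if __name__ == "__main__":
    known_good = measure(b"trusted binary contents")
    print(appraise(b"trusted binary contents", known_good))   # True
    print(appraise(b"tampered binary contents", known_good))  # False

    aggregate = bytes(32)  # aggregate starts as all zeros
    for blob in (b"bootloader", b"kernel", b"init"):
        aggregate = extend(aggregate, measure(blob))
    print(aggregate.hex())
```

Because each step hashes the previous aggregate, the final value depends on every measurement and on their order, so a changed or reordered component yields a different result.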

Marc Naddell, VP of
MediaTek, presented a talk titled IoT With
Deep Learning And Machine Learning Capabilities. MediaTek offers the LinkIt
7687 IoT development board for power-efficient IoT devices with secure Wi-Fi
connectivity and LinkIt 2523 for Bluetooth-connected wearables with fast and
accurate positioning. The company also has the Helio X20 processor, which provides
three clusters to handle different kinds of workloads simultaneously. It has 10 cores, including dual 2.5-GHz Cortex-A72s and eight Cortex-A53s, along with
the Coherent System Interconnect and AXI memory bus subsystems. The chip has
an embedded sensor processor and a high-performance camera interface, plus
deep learning capability.

Jeff Defilippi, senior product manager at ARM, talked about Building More Powerful Infrastructure SoCs from edge to cloud. ARM
is introducing the CoreLink CMN-600 coherent mesh network IP and CoreLink
DMC-620 dynamic memory controller IP. The coherent mesh network function can
cut DDR4 interconnect+DMC static latency by 50% and yield 60% more bandwidth while
using the same chip area. It works with ARMv8-A processors to share data
between processors, accelerators, and I/O. The function uses AMBA 5 CHI
(Coherent Hub Interface) interfaces and has a system cache adjustable from 0 to
128 Mbytes.


The DMC-620 dynamic
memory controller targets SoCs deployed in applications such as servers,
high-performance computing, and networking. The enterprise-class DDR3/4 memory
controller is said to offer the lowest memory latency and 85 to 95% bandwidth utilization
with random traffic. It also has integrated ARM TrustZone and SECDED
or symbol-based error correction.
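
SECDED (single error correction, double error detection) can be shown with a toy code. The sketch below uses an extended Hamming (8,4) code on a 4-bit value purely for illustration. It is an assumption-laden example, not the DMC-620's scheme: memory controllers protect much wider words, and the function names `encode` and `decode` are hypothetical.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (list of bits)."""
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    # Data bits sit at positions 3, 5, 6, 7; parity bits at 1, 2, 4; overall parity at 0.
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2  # make the total number of 1 bits even
    return code


def decode(received: list) -> tuple:
    """Return (nibble, status); status is 'ok', 'corrected', or 'double-error'."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions is 0 for a valid codeword
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        code[syndrome] ^= 1  # single error; syndrome 0 means the overall parity bit
        status = "corrected"
    else:
        return None, "double-error"  # parity even but syndrome nonzero
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[5] ^= 1                      # inject a single-bit error
    print(decode(word))               # (11, 'corrected')
    word[2] ^= 1                      # inject a second error
    print(decode(word))               # (None, 'double-error')
```

The overall parity bit is what separates the two cases: a single flipped bit makes the total parity odd and is corrected, while two flipped bits leave parity even with a nonzero syndrome and are only reported.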

Coherency: The New Normal in SoCs was the title of the talk given by Anush
Mohandass, VP at NetSpeed Systems.
Anush said that “new normal” applications such as autonomous driving and augmented reality mean an explosion in processing performance. NetSpeed
is introducing the Gemini III processor IP for SoCs.


The Gemini can have
up to 64 cache-coherent clusters for both the processor and accelerator cores,
with deadlock detection and avoidance. It has distributed virtual memory
support and a DMA engine. Anush also said that “Gemini enables SoC architects to
implement designs that can achieve more than ten times greater performance in a
reasonable power envelope.” The design is architected for functional safety.

Matthew Mangan, corporate applications engineer at Arteris, spoke on Implementing Cache-Coherent Hardware Acceleration for ADAS and Machine Learning.
Arteris Ncore and FlexNoC interconnect IP connects coherent and non-coherent
SoC designs for reduced system latency and programming simplicity.

The X-Gene 64-bit ARM Server-on-a-Chip for Cloud and
Enterprise was discussed by Kumar Sankaran, associate vice president of software
and platform engineering at Applied Micro. X-Gene 3 yields six times the performance
of the current X-Gene family, with 1 Tbyte of memory per socket and 30% lower
power. Sampling in Q1 2017, the chip has 32 ARMv8 64-bit CPU cores running at up
to 3 GHz, a GICv3 interrupt controller, and 42 lanes of PCIe Gen 3 with eight

Brian Thompto of IBM talked about POWER9: Processors for the
Cognitive Era. The ICs have a new microarchitecture with a 120-Mbyte NUCA
L3, 12 x 20-way associative regions, advanced replacement policies, and 7
Tbytes/s on-chip bandwidth.


The chip uses Nvidia NVLink 2.0 for high bandwidth and has
advanced CAPI 2.0 coherent accelerator and storage attach (PCIe 4). It also features
25-Gbit/s interfaces. The device uses a 14-nm finFET process with 8.0 billion
transistors and will be available in Q2 2017.

Developing Flexible, Scalable, Programmable
Network Elements With Customizable Search Engines
was the topic of a talk by Michael J.
Miller, VP of Technology at MoSys. The
PSE-S30 monolithic IC for search and offload integrates serial interfaces,
processors, and table memory. It has schedulers that manage the external interface
and PE execution, and it supports user-defined algorithms for LPM, exact match, wildcard
match, and others. Commands can be functions, macros, or R/W. It offers 30-Gbit/s PHYs, a GigaChip Interface (GCI) Protocol, eight scheduling domains, 1 Gbit
of fast memory, threaded processor engines, and 32 cores (8 clusters of 4 PEs), and
comes in an FCBGA676 27 x 27-mm package.
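
Longest prefix match (LPM), one of the search types mentioned above, picks the most specific table entry that covers an address. The Python sketch below shows the idea with a plain linear scan. It is an illustration under stated assumptions, not the PSE-S30's algorithm: the `PrefixTable` class and the port names are hypothetical, and hardware search engines typically use tries or similar structures rather than scanning.

```python
import ipaddress


class PrefixTable:
    """Toy routing table that returns the value of the longest matching prefix."""

    def __init__(self):
        self._routes = []  # list of (network, value) pairs

    def add(self, prefix: str, value: str) -> None:
        self._routes.append((ipaddress.ip_network(prefix), value))

    def lookup(self, address: str):
        addr = ipaddress.ip_address(address)
        best = None
        for network, value in self._routes:
            # Keep the matching entry with the greatest prefix length.
            if addr in network and (best is None or network.prefixlen > best[0].prefixlen):
                best = (network, value)
        return None if best is None else best[1]


if __name__ == "__main__":
    table = PrefixTable()
    table.add("0.0.0.0/0", "default")
    table.add("10.0.0.0/8", "port1")
    table.add("10.1.0.0/16", "port2")
    print(table.lookup("10.1.2.3"))     # port2
    print(table.lookup("10.2.0.1"))     # port1
    print(table.lookup("192.168.1.1"))  # default
```

An exact-match lookup is the special case where every entry has the full prefix length, which is why the two are often served by the same engine.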

Bart Stevens, VP of Product Management at INSIDE Secure, spoke on Layer
2 MACsec Security Solutions for 400 GE, FlexEthernet, and Beyond. A complete “secure architecture” must protect
data in process, protect access to data, protect data in transit, and
protect data at rest. Stevens said the industry needs to start integrating security into network equipment. The
MACsec security standard (IEEE 802.1AE* & IEEE 802.1X) was designed to
provide port-based security across LANs. End-to-end WAN security deploys MACsec
across core networks and provider edge networks, virtual LAN connections
between campus and branches, and data center interconnects. INSIDE Secure is
introducing 400- and 500-Gbit/s security IP cores. The EIP-166 Multi-channel
MACsec integrates the classifier, the rate controller, and the frame transform
cores, plus buffers, etc.

Jay Walstrum, Senior
System Architect at Micron, talked on Graphic
Memory in Networking Designs.
Deep-buffer top-of-rack switches in data center
networking often use 6 or 8 Gbytes of GDDR5, which is available in 8-Gbit chips
starting this year and will be available in 16-Gbit chips in a few years. GDDR5X bandwidth is six times that of DDR5.
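
As a quick check on what those densities mean for board design, the short Python sketch below works out how many devices a buffer needs. The helper name `chips_needed` is hypothetical, and the calculation assumes capacity is the only constraint, ignoring bus width and channel count.

```python
def chips_needed(buffer_gbytes: int, chip_gbits: int) -> int:
    """Number of memory devices needed to reach a buffer size (capacity only)."""
    total_gbits = buffer_gbytes * 8          # 8 bits per byte
    return -(-total_gbits // chip_gbits)     # ceiling division


if __name__ == "__main__":
    for size in (6, 8):
        print(size, "Gbytes:",
              chips_needed(size, 8), "x 8-Gbit chips or",
              chips_needed(size, 16), "x 16-Gbit chips")
```

Under that assumption, a 6-Gbyte buffer takes six 8-Gbit devices today and would take three 16-Gbit devices, halving the device count.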

Solutions For Network Services at the Edge was the topic of a talk by Sam Fuller, head of
solutions marketing at NXP Semiconductors. He said that virtual network functions
(VNF) will be distributed throughout the network, not just in servers but in
premises, access point, and metro edge equipment. This will provide versatility
and aid security. NXP offers five communications processors with NFV functions,
topped by the LS2088A with eight 64-bit Cortex-A53 CPUs, eight 10-GigE ports, OVS offload, 20-Gbit/s crypto, a packet-processing engine, and 30 to 45 W power consumption.


NFV and SDN Solutions for IoT and 5G
Intelligent Networks was the
title of a talk by Kin-Yip Liu, senior director at Cavium. Their OCTEON TX
processor family has up to 48 custom ARMv8 cores. The upcoming ThunderX2
processor will have up to 54 custom ARM cores, up to six DDR4 memory controllers,
improved RAS features, 10/25/40/50/100G Ethernet ports, next generation accelerators,
SATAv3, and PCIe Gen3.