Heterogeneous Computing Elements: A Quantitative Evaluation

Maya Gokhale | 19-ERD-004

Project Overview

The waning impact of Moore's Law and the end of Dennard scaling have led to an explosion of special-purpose acceleration architectures that gain performance from narrow, purpose-built circuits tailored to high-payoff applications such as deep learning. Although attention in the commercial sector focuses on acceleration architectures for neural network workloads, opportunities for incorporating heterogeneous architectures also exist in scientific and national security applications. Heterogeneous, specialized architectures show promise for reconciling the seemingly conflicting goals of energy reduction and high performance. New architectural features may be incorporated at a wide range of granularities, including co-processors, memory/storage hierarchy innovations such as processing near or in memory, and processing in transit in the interconnect or near the sensor.

In this project we designed, prototyped, and evaluated new hardware intellectual property (IP) blocks suitable for insertion at multiple locations in the computing fabric, including near memories, network interfaces, and instruments. To address the specialized needs of scientific computing, we created hardware units to encode and decode floating-point scientific data arrays in the Lawrence Livermore National Laboratory (LLNL) "zfp" format. We prototyped machine learning algorithms on an embedded near-sensor processor to classify laser-induced breakdown spectroscopy measurements. In collaboration with academic partners, we built an open-source translator that compiles programs written in the SystemC hardware description language to SystemVerilog. The modules and tools developed in this project enabled us to create and quantitatively evaluate heterogeneous function units of high impact to the scientific and data science communities. Our SystemC translator provides a vital link in a completely open-source tool flow for designing and evaluating new heterogeneous IP. To better quantify the system-level performance of near-memory IP blocks, we enhanced the Logic in Memory Emulator (LiME) with a new variable-latency delay model. Our research on near-memory heterogeneous processors has been incorporated into several vendors' future technology experiments. As a result of this research, we were invited to contribute to DOE Advanced Computing Workshops and subsequent RFPs, initiated new projects with DOD to advance the design and evaluation of heterogeneous function blocks, and participated in a DARPA program studying near-memory computing.

Mission Impact

This project directly addressed the strategic thrust of advancing new computing paradigms beyond the exascale to exploit emerging heterogeneity in devices, architectures, and systems. This project enhanced Lawrence Livermore National Laboratory's core competencies in high-performance computing, simulation, and data science.

Publications, Presentations, and Patents

Gokhale, M. "FPGAs in High Performance Computing." IEEE International Symposium on Field Programmable Custom Computing Machines (FCCM), May 2021.

Gokhale, M. "Near memory HPC accelerator design with FPGAs." Computing Frontiers, May 2021.

Gokhale, M. "Rapid System Level Design and Evaluation of Near Memory Fixed Function Units: A Reconfigurable Computing Application." SC20 Workshop on High Performance Reconfigurable Computing, November 2020.

Gokhale, M. "Advocating search-driven advances in memory." Memory Systems panel, SRC Spring 2021 Tech Forum, May 2021.

Gokhale, M. "Microscope on Memory: FPGA Acceleration of Computer Memory System Assessments." Workshop on Heterogeneous Accelerators, ETH Zurich, September 2019.

Bhardwaj, K. and Gokhale, M. "Semi-Supervised On-Device Neural Network Adaptation for Remote and Portable Laser-Induced Breakdown Spectroscopy." Workshop on On-Device Intelligence at MLSys 2020.

Jain, A. et al. "Performance Assessment of Emerging Memories through FPGA Emulation." IEEE Micro, Jan/Feb 2019.