BiScaled-DNN | Proceedings of the 56th Annual Design Automation Conference 2019

research-article

Authors: Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi, Kailash Gopalakrishnan, and Leland Chang

DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

June 2019

Article No.: 201, Pages 1 - 6

Published: 02 June 2019

Metrics: 22 total citations, 496 total downloads (95 in the last 12 months, 11 in the last 6 weeks)


    Abstract

Fixed-point (FxP) implementations are prominently used to realize Deep Neural Networks (DNNs) efficiently on energy-constrained platforms. The choice of bit-width is often constrained by the ability of FxP to represent the entire range of numbers in a data structure with sufficient resolution. At low bit-widths (< 8 bits), state-of-the-art DNNs invariably suffer a loss in classification accuracy due to quantization and saturation errors.

In this work, we leverage a key insight that almost all data structures in DNNs are long-tailed, i.e., a significant majority of the elements are small in magnitude, while a small fraction are orders of magnitude larger. We propose BiScaled-FxP, a new number representation that caters to the disparate range and resolution needs of long-tailed data structures. The key idea is that, while using the same number of bits to represent elements of both large and small magnitude, we employ two different scale factors, viz. scale-fine and scale-wide, in their quantization. Scale-fine allocates more fractional bits, providing resolution for small numbers, while scale-wide favors covering the entire range of large numbers, albeit at a coarser resolution. We develop a BiScaled-DNN accelerator that computes on BiScaled-FxP tensors. A key challenge is storing the scale factor used to quantize each element, since computations whose operands are quantized with different scale factors must rescale their result. To minimize this overhead, we use a block-sparse format to store only the indices of scale-wide elements, which are few in number. We also enhance the BiScaled-FxP processing elements with shifters to scale their output when the operands of a computation use different scale factors. We develop a systematic methodology to identify the scale-fine and scale-wide factors for the weights and activations of any given DNN. Across 8 state-of-the-art image recognition benchmarks, BiScaled-FxP reduces the number of computation bits by 2 compared to conventional FxP, while also slightly improving classification accuracy in all cases. Compared to FxP8, the performance and energy benefits range between 1.43×-3.86× and 1.4×-3.7×, respectively.
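To make the representation concrete, below is a minimal sketch in Python/NumPy of the BiScaled-FxP idea, not the authors' implementation; the function names and bit-width choices are illustrative assumptions. Every element is stored as the same narrow integer, small-magnitude elements are quantized with the scale-fine factor, large-magnitude tail elements with the scale-wide factor, and only the indices of the few scale-wide elements are recorded, mirroring the block-sparse index storage described in the abstract. Because both scale factors are powers of two, rescaling a result whose operands used different scale factors reduces to a shift in hardware.

```python
import numpy as np

def biscaled_quantize(x, bits=6, frac_fine=5, frac_wide=2):
    """Illustrative sketch of BiScaled-FxP quantization (hypothetical, not the paper's code).

    Every element uses the same `bits`-bit signed integer, but small-magnitude
    elements are quantized with scale-fine (more fractional bits, finer resolution)
    and large-magnitude elements with scale-wide (fewer fractional bits, wider range).
    """
    scale_fine = 2.0 ** -frac_fine        # fine resolution, narrow range
    scale_wide = 2.0 ** -frac_wide        # coarse resolution, wide range
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

    # Elements that would saturate under scale-fine become scale-wide "tail" elements.
    wide_mask = np.abs(x) > qmax * scale_fine
    wide_idx = np.flatnonzero(wide_mask)  # few indices; what a block-sparse format would store

    q = np.where(wide_mask,
                 np.clip(np.round(x / scale_wide), qmin, qmax),
                 np.clip(np.round(x / scale_fine), qmin, qmax)).astype(np.int8)
    return q, wide_idx, scale_fine, scale_wide

def biscaled_dequantize(q, wide_idx, scale_fine, scale_wide):
    """Reconstruct approximate real values by applying each element's scale factor."""
    x = q.astype(np.float64) * scale_fine
    x[wide_idx] = q[wide_idx] * scale_wide
    return x

# Example: a long-tailed 1-D tensor -- mostly small values plus a few large outliers.
w = np.concatenate([np.random.normal(0.0, 0.05, 1000), [1.7, -2.3, 3.1]])
q, wide_idx, sf, sw = biscaled_quantize(w)
err = np.max(np.abs(w - biscaled_dequantize(q, wide_idx, sf, sw)))
print(f"{wide_idx.size} scale-wide elements out of {w.size}; max abs error = {err:.4f}")
```

In this sketch only the three injected outliers fall into the scale-wide set, so the per-element scale metadata stays small while the bulk of the tensor keeps the finer resolution.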



    Recommendations

    • Compensated-DNN: energy efficient low-precision deep neural networks by compensating quantization errors

      DAC '18: Proceedings of the 55th Annual Design Automation Conference

      Deep Neural Networks (DNNs) represent the state-of-the-art in many Artificial Intelligence (AI) tasks involving images, videos, text, and natural language. Their ubiquitous adoption is limited by the high computation and storage requirements of DNNs, ...


    • A Fine-Grained Sparse Accelerator for Multi-Precision DNN

      FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

      Neural Networks (NNs) have made a significant breakthrough in many fields, while they also pose a great challenge to hardware platforms since the state-of-the-art neural networks are both communicational- and computational-intensive. Researchers ...


    • Resource-constrained FPGA/DNN co-design

      Abstract

      Deep neural networks (DNNs) have demonstrated super performance in most learning tasks. However, a DNN typically contains a large number of parameters and operations, requiring a high-end processing platform for high-speed execution. To address ...



    Information & Contributors

    Information

    Published In


    DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019

    June 2019

    1378 pages

    ISBN:9781450367257

    DOI:10.1145/3316781

    Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Sponsors

    • SIGDA: ACM Special Interest Group on Design Automation
    • IEEE-CEDA

    In-Cooperation

    • SIGBED: ACM Special Interest Group on Embedded Systems

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 June 2019


    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    DAC '19

    Sponsor:

    • SIGDA

    Acceptance Rates

    Overall Acceptance Rate 1,770 of 5,499 submissions, 32%




    Citations

    Cited By


• Guo C, Tang J, Hu W, Leng J, Zhang C, Yang F, Liu Y, Guo M, Zhu Y, Solihin Y, Heinrich M (2023). OliVe: Accelerating Large Language Models via Hardware-friendly Outlier-Victim Pair Quantization. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-15. DOI: 10.1145/3579371.3589038. Online publication date: 17-Jun-2023. https://dl.acm.org/doi/10.1145/3579371.3589038
• Chen X, Zhu J, Jiang J, Tsui C (2023). Tight Compression: Compressing CNN Through Fine-Grained Pruning and Weight Permutation for Efficient Implementation. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42(2), 644-657. DOI: 10.1109/TCAD.2022.3178047. Online publication date: Feb-2023
• Liu C, Zhang D (2023). A selective quantization approach for optimizing quantized inference engine. 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), pp. 92-99. DOI: 10.1109/ISCTech60480.2023.00024. Online publication date: 30-Jul-2023
• Ho N, Chang I (2023). O-2A: Outlier-Aware Compression for 8-bit Post-Training Quantization Model. IEEE Access, 11, 95467-95480. DOI: 10.1109/ACCESS.2023.3311027. Online publication date: 2023
• Tesfai H, Saleh H, Al-Qutayri M, Mohammad B, Stouraitis T (2023). Gradient Estimation for Ultra Low Precision POT and Additive POT Quantization. IEEE Access, 11, 61264-61272. DOI: 10.1109/ACCESS.2023.3286299. Online publication date: 2023
• Li T, Jiang H, Mo H, Han J, Liu L, Mao Z (2023). Approximate Processing Element Design and Analysis for the Implementation of CNN Accelerators. Journal of Computer Science and Technology, 38(2), 309-327. DOI: 10.1007/s11390-023-2548-8. Online publication date: 30-Mar-2023
• Hanif M, Shafique M (2023). Cross-Layer Optimizations for Efficient Deep Learning Inference at the Edge. Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, pp. 225-248. DOI: 10.1007/978-3-031-39932-9_9. Online publication date: 10-Oct-2023
• Elangovan R, Jain S, Raghunathan A (2022). Ax-BxP: Approximate Blocked Computation for Precision-reconfigurable Deep Neural Network Acceleration. ACM Transactions on Design Automation of Electronic Systems, 27(3), 1-20. DOI: 10.1145/3492733. Online publication date: 28-Jan-2022. https://dl.acm.org/doi/10.1145/3492733
• Liu C, Zhang X, Zhang R, Li L, Zhou S, Huang D, Li Z, Du Z, Liu S, Chen T (2022). Rethinking the Importance of Quantization Bias, Toward Full Low-Bit Training. IEEE Transactions on Image Processing, 31, 7006-7019. DOI: 10.1109/TIP.2022.3216776. Online publication date: 2022
• Song S, Kim S, Park G, Han D, Yoo H (2022). A 49.5 mW Multi-Scale Linear Quantized Online Learning Processor for Real-Time Adaptive Object Detection. IEEE Transactions on Circuits and Systems II: Express Briefs, 69(5), 2443-2447. DOI: 10.1109/TCSII.2022.3160160. Online publication date: May-2022

