Research article
Authors: Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi, Kailash Gopalakrishnan, and Leland Chang
DAC '19: Proceedings of the 56th Annual Design Automation Conference, June 2019
Article No.: 201, Pages 1-6
Published: 02 June 2019
Abstract
Fixed-point implementations (FxP) are prominently used to realize Deep Neural Networks (DNNs) efficiently on energy-constrained platforms. The choice of bit-width is often constrained by the ability of FxP to represent the entire range of numbers in a data structure with sufficient resolution. At low bit-widths (< 8 bits), state-of-the-art DNNs invariably suffer a loss in classification accuracy due to quantization/saturation errors.
In this work, we leverage a key insight that almost all data structures in DNNs are long-tailed, i.e., a significant majority of the elements are small in magnitude, with a small fraction being orders of magnitude larger. We propose BiScaled-FxP, a new number representation that caters to the disparate range and resolution needs of long-tailed data structures. The key idea is that, while using the same number of bits to represent elements of both large and small magnitude, we employ two different scale factors, viz. scale-fine and scale-wide, in their quantization. Scale-fine allocates more fractional bits, providing resolution for small numbers, while scale-wide favors covering the entire range of large numbers, albeit at a coarser resolution. We develop a BiScaled-DNN accelerator which computes on BiScaled-FxP tensors. A key challenge is to store the scale factor used in quantizing each element, since computations whose operands are quantized with different scale factors need to scale their result. To minimize this overhead, we use a block-sparse format to store only the indices of scale-wide elements, which are few in number. We also enhance the BiScaled-FxP processing elements with shifters to scale their output when the operands of a computation use different scale factors. We develop a systematic methodology to identify the scale-fine and scale-wide factors for the weights and activations of any given DNN. Over 8 state-of-the-art image recognition benchmarks, BiScaled-FxP reduces compute precision by 2 bits relative to conventional FxP, while also slightly improving classification accuracy in all cases. Compared to FxP8, the performance and energy benefits range between 1.43×-3.86× and 1.4×-3.7×, respectively.
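As an illustration of the dual-scale idea described in the abstract, the sketch below quantizes a long-tailed tensor with two scale factors and stores only the indices of the scale-wide (large-magnitude) elements. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the 1%-outlier threshold, the round-to-nearest scheme, and the function names are hypothetical, and the paper's block-sparse index format and hardware shifters are not modeled here.

```python
import numpy as np

def biscaled_quantize(x, bits=6, threshold=None):
    """Illustrative BiScaled-FxP quantization sketch (assumptions, not the paper's code).

    Small-magnitude values use scale-fine (finer resolution); the few
    large-magnitude "tail" values use scale-wide (full range, coarser
    resolution). Only the indices of scale-wide elements are kept, since
    they are rare in long-tailed DNN data structures.
    """
    x = np.asarray(x, dtype=np.float64)
    if threshold is None:
        # Hypothetical split: treat the top 1% of magnitudes as the long tail.
        threshold = np.percentile(np.abs(x), 99)
    qmax = 2 ** (bits - 1) - 1            # symmetric signed integer range
    scale_wide = np.abs(x).max() / qmax   # covers the entire range of values
    scale_fine = threshold / qmax         # finer resolution for small values
    wide_idx = np.flatnonzero(np.abs(x.ravel()) > threshold)
    q = np.round(x / scale_fine)          # quantize everything with scale-fine...
    q.ravel()[wide_idx] = np.round(x.ravel()[wide_idx] / scale_wide)  # ...tail with scale-wide
    q = np.clip(q, -qmax - 1, qmax)
    return q.astype(np.int32), scale_fine, scale_wide, wide_idx

def biscaled_dequantize(q, scale_fine, scale_wide, wide_idx):
    """Reconstruct real values, applying the wide scale only at stored indices."""
    x = q.astype(np.float64) * scale_fine
    x.ravel()[wide_idx] = q.ravel()[wide_idx] * scale_wide
    return x
```

In hardware, the per-element choice of scale factor is what forces the paper's accelerator to shift the outputs of mixed-scale computations; in this software sketch the same role is played by indexing into `wide_idx` during dequantization.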
Information
Published In
DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019, 1378 pages
ISBN: 9781450367257
DOI: 10.1145/3316781
Copyright © 2019 ACM.
Sponsors
- SIGDA: ACM Special Interest Group on Design Automation
- IEEE-CEDA
In-Cooperation
- SIGBED: ACM Special Interest Group on Embedded Systems
Publisher
Association for Computing Machinery, New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%