Research article
Authors: Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Jungwook Choi, Kailash Gopalakrishnan, and Leland Chang
DAC '19: Proceedings of the 56th Annual Design Automation Conference, June 2019
Article No.: 201, Pages 1-6
Published: 02 June 2019
Abstract
Fixed-point implementations (FxP) are prominently used to realize Deep Neural Networks (DNNs) efficiently on energy-constrained platforms. The choice of bit-width is often constrained by the ability of FxP to represent the entire range of numbers in a data structure with sufficient resolution. At low bit-widths (< 8 bits), state-of-the-art DNNs invariably suffer a loss in classification accuracy due to quantization/saturation errors.
In this work, we leverage a key insight that almost all data structures in DNNs are long-tailed, i.e., a significant majority of the elements are small in magnitude, with a small fraction being orders of magnitude larger. We propose BiScaled-FxP, a new number representation that caters to the disparate range and resolution needs of long-tailed data structures. The key idea is that, while using the same number of bits to represent elements of both large and small magnitude, we employ two different scale factors, viz. scale-fine and scale-wide, in their quantization. Scale-fine allocates more fractional bits, providing resolution for small numbers, while scale-wide favors covering the entire range of large numbers, albeit at a coarser resolution. We develop a BiScaled-DNN accelerator which computes on BiScaled-FxP tensors. A key challenge is to store the scale factor used in quantizing each element, since computations whose operands are quantized with different scale factors need to scale their result. To minimize this overhead, we use a block-sparse format to store only the indices of scale-wide elements, which are few in number. We also enhance the BiScaled-FxP processing elements with shifters to scale their output when the operands of a computation use different scale factors. We develop a systematic methodology to identify the scale-fine and scale-wide factors for the weights and activations of any given DNN. Over 8 state-of-the-art image recognition benchmarks, BiScaled-FxP reduces compute precision by 2 bits relative to conventional FxP, while also slightly improving classification accuracy in all cases. Compared to FxP8, the performance and energy benefits range between 1.43×-3.86× and 1.4×-3.7×, respectively.
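As an illustration of the dual-scale idea described in the abstract, the sketch below quantizes a long-tailed tensor with two scale factors and stores only the indices of the scale-wide (large-magnitude) elements. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the 1%-outlier threshold, the round-to-nearest scheme, and the function names are hypothetical, and the paper's block-sparse index format and hardware shifters are not modeled here.

```python
import numpy as np

def biscaled_quantize(x, bits=6, threshold=None):
    """Illustrative BiScaled-FxP quantization sketch (assumptions, not the paper's code).

    Small-magnitude values use scale-fine (finer resolution); the few
    large-magnitude "tail" values use scale-wide (full range, coarser
    resolution). Only the indices of scale-wide elements are kept, since
    they are rare in long-tailed DNN data structures.
    """
    x = np.asarray(x, dtype=np.float64)
    if threshold is None:
        # Hypothetical split: treat the top 1% of magnitudes as the long tail.
        threshold = np.percentile(np.abs(x), 99)
    qmax = 2 ** (bits - 1) - 1            # symmetric signed integer range
    scale_wide = np.abs(x).max() / qmax   # covers the entire range of values
    scale_fine = threshold / qmax         # finer resolution for small values
    wide_idx = np.flatnonzero(np.abs(x.ravel()) > threshold)
    q = np.round(x / scale_fine)          # quantize everything with scale-fine...
    q.ravel()[wide_idx] = np.round(x.ravel()[wide_idx] / scale_wide)  # ...tail with scale-wide
    q = np.clip(q, -qmax - 1, qmax)
    return q.astype(np.int32), scale_fine, scale_wide, wide_idx

def biscaled_dequantize(q, scale_fine, scale_wide, wide_idx):
    """Reconstruct real values, applying the wide scale only at stored indices."""
    x = q.astype(np.float64) * scale_fine
    x.ravel()[wide_idx] = q.ravel()[wide_idx] * scale_wide
    return x
```

In hardware, the per-element choice of scale factor is what forces the paper's accelerator to shift the outputs of mixed-scale computations; in this software sketch the same role is played by indexing into `wide_idx` during dequantization.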
Information
Published In
DAC '19: Proceedings of the 56th Annual Design Automation Conference 2019
June 2019, 1378 pages
ISBN: 9781450367257
DOI: 10.1145/3316781
Copyright © 2019 ACM.
Sponsors
- SIGDA: ACM Special Interest Group on Design Automation
- IEEE-CEDA
In-Cooperation
- SIGBED: ACM Special Interest Group on Embedded Systems
Publisher
Association for Computing Machinery, New York, NY, United States
Acceptance Rates
Overall Acceptance Rate: 1,770 of 5,499 submissions, 32%