Efficient Deep Neural Network Computation on Processors
Deep neural networks (DNNs) have become a fundamental component of many applications. They are trained on large amounts of data to make accurate predictions. However, conventional DNN models have high computation and storage costs, which makes it challenging to deploy DNN-based algorithms on existing processors. Various algorithms have been proposed to reduce DNN computation, and they fall into two main categories: network compression and domain transformation. Network compression removes redundancy in DNN models by either pruning unimportant parameters or lowering parameter precision; it can reduce both the required computation and the storage space. Domain transformation converts convolution operations into a different domain where they can be computed with fewer operations. Nevertheless, these algorithms are designed without considering the characteristics of the underlying processors, which can degrade computation performance and even increase the model size.
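To make the network-compression category concrete, the following is a minimal sketch of magnitude-based pruning, one common way to remove unimportant parameters. The layer shape, sparsity target, and use of NumPy are illustrative assumptions, not details from the thesis.

```python
import numpy as np

# Hypothetical weight matrix of a small fully connected layer.
rng = np.random.default_rng(0)
weights = rng.standard_normal((4, 4))

# Magnitude pruning: zero out the fraction of weights with the
# smallest absolute values (here, a 50% sparsity target).
sparsity = 0.5
threshold = np.quantile(np.abs(weights), sparsity)
pruned = np.where(np.abs(weights) < threshold, 0.0, weights)

# Roughly half the parameters are now zero, shrinking both the
# computation and the storage needed for the layer.
print(np.mean(pruned == 0.0))
```

Note that this unstructured pruning scatters zeros arbitrarily; as the thesis argues, whether such sparsity actually speeds up inference depends on how well its pattern matches the parallel organization of the underlying hardware.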
This thesis addresses this challenge by customizing or combining computation reduction algorithms for the processor architecture and by augmenting the hardware to better support DNN computation. The first part of this thesis customizes DNN pruning to the underlying hardware by matching the pruned network structure to the parallel hardware organization. Beyond pruning, I also investigate deploying low-precision models on microcontrollers. A new convolution algorithm is proposed to perform sub-byte multiply-accumulate computations with bitwise logic operations, and instruction set architecture (ISA) extensions are introduced to accelerate the computation further. The last part focuses on accelerating DNN computation by combining pruning techniques with Winograd convolution. A two-step pruning method, spatial-Winograd pruning, is proposed: spatial-domain weights are first pruned in a structured way, so that the spatial-domain sparsity transfers efficiently into the Winograd domain, and the remaining Winograd-domain weights are then pruned directly to achieve higher sparsity.
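The thesis's sub-byte convolution algorithm is not spelled out here, but a well-known example of replacing multiply-accumulate with bitwise logic is the XNOR/XOR-popcount trick for binary (1-bit) networks, sketched below. The encoding and the `binary_dot` helper are illustrative assumptions, not the thesis's method.

```python
# Binary dot product via bitwise logic: with weights and activations
# constrained to {-1, +1} and packed one bit per value (1 -> +1,
# 0 -> -1), products disagree exactly where the bit patterns differ,
# so the whole multiply-accumulate reduces to XOR plus popcount.
def binary_dot(x_bits: int, w_bits: int, n: int) -> int:
    diff = bin((x_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * diff  # (n - diff) agreements minus diff disagreements

# Example: x = [+1, -1, +1, -1] -> 0b1010, w = [+1, +1, -1, -1] -> 0b1100.
# Direct dot product: (+1)(+1) + (-1)(+1) + (+1)(-1) + (-1)(-1) = 0
print(binary_dot(0b1010, 0b1100, 4))
```

One word-wide XOR and popcount replaces dozens of scalar multiplies, which is why dedicated ISA support for such operations, as proposed in the thesis, can accelerate low-precision inference on microcontrollers.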
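For context on the last part, the standard Winograd algorithm F(2,3) computes two outputs of a 1D convolution with a 3-tap filter using 4 multiplications instead of the 6 a direct method needs; the transform matrices below are the textbook ones, and the filter and input values are made-up examples. Pruning the transformed filter U is what "Winograd-domain pruning" operates on.

```python
import numpy as np

# Winograd F(2,3) transform matrices (input, filter, output).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

g = np.array([1.0, 2.0, 3.0])        # 3-tap filter (spatial-domain weights)
d = np.array([4.0, 5.0, 6.0, 7.0])   # input tile of 4 samples

U = G @ g                 # Winograd-domain filter: the tensor pruned
V = BT @ d                #   in the second step of spatial-Winograd pruning
y = AT @ (U * V)          # 4 elementwise multiplies, then inverse transform

# Direct sliding-window computation for comparison.
direct = np.array([g @ d[0:3], g @ d[1:4]])
print(y, direct)          # both give the same two outputs
```

Because zeros in the spatial-domain filter g do not generally stay zero after the `G @ g` transform, structured spatial pruning that survives the transform, followed by direct pruning of U, is exactly the motivation for the two-step method described above.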