Pruning Techniques with Different Frameworks
Mar 13, 2020
The iterative pruning technique is a concept that goes back to Yann LeCun's work in 1990, or even earlier. The idea is simple: every neural network has redundant parameters that contribute little to the output. By removing the neurons and synapses with little impact, the network becomes smaller and faster.
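To make the idea concrete, here is a minimal NumPy sketch (not part of any Xilinx tool) that zeroes out the fraction of a weight matrix with the smallest magnitudes, the simplest possible ranking criterion:

```python
import numpy as np

def magnitude_prune(weights, rate):
    """Zero out the `rate` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).flatten()
    k = int(len(flat) * rate)
    if k == 0:
        return weights
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only the larger weights
    return weights * mask

w = np.random.randn(4, 4)
print(magnitude_prune(w, 0.5))  # roughly half of the entries become zero
```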
Pruning a neural network is a lot like playing Jenga: removing a single block is easy, but the interesting part is keeping the tower from collapsing. The Xilinx pruning technique is derived from Song Han's paper, which has been cited over 2,000 times to date. Its three-stage pipeline is still considered the state-of-the-art pruning technique, achieving 35x compression on AlexNet and 49x on VGG-16 without losing accuracy.
As a faculty member from U.C. Berkeley's Vision and Learning Center (BVLC) once put it, "Spend any amount of time researching the topic of deep learning and you'll inevitably come across the term Caffe." Because operators in the Caffe framework are implemented in a fairly standard way, the Caffe-based pruning tool is easily adapted from one network to another.
A typical pruning workflow in the Caffe framework has four steps. Sensitivity Analysis is the most critical part: it ranks the neurons and synapses layer by layer, so the analysis strategy largely determines the performance of the pruned network. After the analysis, a pruning rate needs to be set to enter the next step.
$ ./decent_p ana -config config.prototxt
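Conceptually, the sensitivity analysis can be pictured as pruning one layer at a time at several trial rates and recording the accuracy drop each time. The sketch below only illustrates that idea; `prune_layer` and `evaluate` are hypothetical stand-ins, not decent_p APIs:

```python
def sensitivity_analysis(model, layers, rates, evaluate, prune_layer):
    """Measure how much accuracy each layer loses at each trial pruning rate."""
    baseline = evaluate(model)
    report = {}
    for layer in layers:
        report[layer] = {}
        for rate in rates:
            pruned = prune_layer(model, layer, rate)            # prune this layer only
            report[layer][rate] = baseline - evaluate(pruned)   # accuracy drop
    return report

# e.g. report['conv1'][0.5] -> accuracy lost when conv1 alone is pruned by 50%
```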
The second step is Pruning. The pruning rate simply defines how much you would like to cut off. Parameters are set to zero according to the rank list until the pruning rate is reached. A summary table then shows how the accuracy, parameter count, and operation count change after pruning.
$ ./decent_p prune -config config.prototxt
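The bookkeeping behind this step can be sketched in plain NumPy: rank all parameters globally by magnitude, zero them out until the requested rate is reached, and report how many survive. This is an illustration only, not the tool's actual implementation:

```python
import numpy as np

def prune_by_rate(layer_weights, rate):
    """Zero out the globally smallest-magnitude parameters until `rate` is reached."""
    flat = np.concatenate([np.abs(w).flatten() for w in layer_weights])
    k = int(len(flat) * rate)
    threshold = np.sort(flat)[k - 1] if k > 0 else -1.0
    pruned = [w * (np.abs(w) > threshold) for w in layer_weights]
    kept = sum(int((w != 0).sum()) for w in pruned)
    print(f"kept {kept}/{len(flat)} parameters ({1 - kept / len(flat):.0%} pruned)")
    return pruned

layers = [np.random.randn(8, 8), np.random.randn(8, 4)]
pruned = prune_by_rate(layers, rate=0.5)
```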
The third step is to Finetune. After pruning, the accuracy drops and needs finetuning to recover. If the network is pruned too much at once, it may not recover at all; that is why the so-called 'iterative pruning' repeats the pruning and finetuning steps several times with a gradually increasing pruning rate.
$ ./decent_p finetune -config config.prototxt
Here is a sample configuration file modification:
workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
#weights: "examples/decent_p/float.caffemodel"
weights: "examples/decent_p/regular_rate_0.1/_iter_10000.caffemodel"
solver: "examples/decent_p/solver.prototxt"
# change rate from 0.1 to 0.2
#rate: 0.1
rate: 0.2
pruner {
method: REGULAR
}
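Putting the pieces together, the iterative loop can be driven by a small script that bumps the `rate:` field in the configuration and reruns the prune and finetune commands. This is only a sketch assuming the file layout shown above; in practice the `weights:` line must also be updated each round to point at the latest finetuned snapshot:

```python
import re
import subprocess

def set_rate(config_path, rate):
    """Rewrite the active `rate:` line in the decent_p configuration file."""
    with open(config_path) as f:
        text = f.read()
    text = re.sub(r"^rate:\s*[\d.]+", f"rate: {rate}", text, flags=re.M)
    with open(config_path, "w") as f:
        f.write(text)

for rate in (0.1, 0.2, 0.3):  # gradually increase the pruning rate
    set_rate("config.prototxt", rate)
    subprocess.run(["./decent_p", "prune", "-config", "config.prototxt"], check=True)
    subprocess.run(["./decent_p", "finetune", "-config", "config.prototxt"], check=True)
```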
The last step is to Transform. Up to this point the model has only been masked, with the low-ranked parameters set to zero, so the workload is unchanged before this transform. This final step performs the real cut-off and turns the model into a genuinely lighter one.
$ ./decent_p transform -model baseline.prototxt -weights finetuned_model.caffemodel
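Conceptually, transform is the moment when masked channels are physically removed instead of merely zeroed. The toy sketch below drops all-zero output channels of a convolution weight so the stored tensor actually shrinks (a real transform would also have to fix up the input channels of the following layers, which is omitted here):

```python
import numpy as np

def drop_zero_output_channels(conv_w):
    """Remove output channels whose weights are entirely zero.

    conv_w has shape (out_channels, in_channels, k, k).
    """
    keep = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1) > 0
    return conv_w[keep]

w = np.random.randn(16, 8, 3, 3)
w[[1, 5, 9]] = 0.0                          # channels zeroed out by the pruning mask
print(drop_zero_output_channels(w).shape)   # (13, 8, 3, 3): a genuinely smaller tensor
```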
Darknet, a C-based framework with few dependencies, draws a lot of attention today, and the YOLO network plays a significant role in its popularity. Compared to Caffe, Darknet implements some operators, such as MaxPool, differently. The pruning tool for the Darknet framework, however, changes very little: it follows the same four-step workflow. The largest difference is in the pruning step, where a threshold parameter is set instead of a pruning rate. The two parameters are defined as follows:
Pruning rate – How much workload you would like to cut off after pruning.
Threshold – How much accuracy loss you could bear for each layer.
The threshold is proportional to the pruning rate; it simply expresses the same trade-off from the accuracy perspective rather than the workload one.
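One way to picture the relationship is a small search that, for each layer, picks the largest pruning rate whose measured accuracy loss stays within the threshold. As before, `prune_layer` and `evaluate` are hypothetical stand-ins used only to illustrate the idea:

```python
def rate_for_threshold(model, layer, threshold, evaluate, prune_layer,
                       candidate_rates=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Largest per-layer pruning rate whose accuracy loss stays within `threshold`."""
    baseline = evaluate(model)
    best = 0.0
    for rate in candidate_rates:
        loss = baseline - evaluate(prune_layer(model, layer, rate))
        if loss <= threshold:
            best = rate       # this rate is still acceptable for the layer
    return best
```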
TensorFlow is one of the world's leading machine learning frameworks today, thanks to its rich programming APIs, distributed training strategies, and flexible deployment platforms. The pruning tool for the TensorFlow framework released by Xilinx still follows the classic four-step workflow, but it has some constraints on the input graph and the APIs: the graph and weights must be supplied as a .pbtxt file and a .ckpt checkpoint, with no Keras API inside.
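For reference, a graph and checkpoint in the expected form can be produced with plain TF1-style ops, along the lines of the sketch below (the layer, names, and paths are illustrative only):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# A single fully connected layer built from raw ops (no Keras API),
# then exported as a .pbtxt graph plus a .ckpt checkpoint.
x = tf.placeholder(tf.float32, [None, 784], name="input")
w = tf.get_variable("fc/w", [784, 10])
b = tf.get_variable("fc/b", [10])
logits = tf.nn.bias_add(tf.matmul(x, w), b, name="logits")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.write_graph(sess.graph_def, "./export", "float_graph.pbtxt")
    tf.train.Saver().save(sess, "./export/float.ckpt")
```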
To help users better understand the pruning tools for the different frameworks, the input and output of each step are summarized in the table below:
| Step | Caffe Input | Caffe Output | Darknet Input | Darknet Output | TensorFlow Input | TensorFlow Output |
|---|---|---|---|---|---|---|
| Ana | | NA | | | | NA |
| Prune | | | | | | |
| Finetune | | | | | | |
| Transform | | | | | | |
To summarize, Xilinx has applied a classic, unified four-step pruning workflow to the Caffe, Darknet, and TensorFlow frameworks. This unified pruning concept makes the tool easy to use and adds no extra cost when two networks are designed with different frameworks.