Pruning Techniques with Different Frameworks
Mar 13, 2020
The iterative pruning technique is a concept that goes back to Yann LeCun's work in 1990, or even earlier. The idea is simple: every neural network has redundant parameters that contribute little to the output. By removing the neurons and synapses with little impact, the network becomes smaller and faster.
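To make the idea concrete, here is a minimal NumPy sketch (not part of any Xilinx tool) that zeroes out the fraction of a weight matrix with the smallest magnitudes, the simplest possible ranking criterion:

```python
import numpy as np

def magnitude_prune(weights, rate):
    """Zero out the `rate` fraction of weights with the smallest magnitude."""
    flat = np.abs(weights).flatten()
    k = int(len(flat) * rate)
    if k == 0:
        return weights
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold            # keep only the larger weights
    return weights * mask

w = np.random.randn(4, 4)
print(magnitude_prune(w, 0.5))  # roughly half of the entries become zero
```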
Pruning a neural network is a lot like playing Jenga: removing a single block is easy, but the interesting part is keeping the tower from collapsing. The Xilinx pruning technique is derived from Song Han's paper, which has been cited over 2,000 times to date. Its three-stage pipeline is still considered the state-of-the-art pruning technique, achieving 35x compression on AlexNet and 49x on VGG-16 without losing accuracy.
As a faculty member from U.C. Berkeley's Vision and Learning Center (BVLC) once put it, "Spend any amount of time researching the topic of deep learning and you'll inevitably come across the term Caffe." Because operators in the Caffe framework are implemented in a fairly standard way, the Caffe-based pruning tool is easily adapted from one network to another.
A typical pruning workflow in the Caffe framework has four steps. Sensitivity Analysis is the most critical part: it ranks the neurons and synapses layer by layer, so the analysis strategy largely determines the performance of the pruned network. After the analysis, a pruning rate needs to be set to enter the next step.
$ ./decent_p ana -config config.prototxt
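Conceptually, the sensitivity analysis can be pictured as pruning one layer at a time at several trial rates and recording the accuracy drop each time. The sketch below only illustrates that idea; `prune_layer` and `evaluate` are hypothetical stand-ins, not decent_p APIs:

```python
def sensitivity_analysis(model, layers, rates, evaluate, prune_layer):
    """Measure how much accuracy each layer loses at each trial pruning rate."""
    baseline = evaluate(model)
    report = {}
    for layer in layers:
        report[layer] = {}
        for rate in rates:
            pruned = prune_layer(model, layer, rate)            # prune this layer only
            report[layer][rate] = baseline - evaluate(pruned)   # accuracy drop
    return report

# e.g. report['conv1'][0.5] -> accuracy lost when conv1 alone is pruned by 50%
```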
The second step is Pruning. The pruning rate simply defines how much you would like to cut off. Parameters are set to zero according to the rank list until the pruning rate is reached. A summary table then shows how the accuracy, parameter count, and operation count change after pruning.
$ ./decent_p prune -config config.prototxt
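The bookkeeping behind this step can be sketched in plain NumPy: rank all parameters globally by magnitude, zero them out until the requested rate is reached, and report how many survive. This is an illustration only, not the tool's actual implementation:

```python
import numpy as np

def prune_by_rate(layer_weights, rate):
    """Zero out the globally smallest-magnitude parameters until `rate` is reached."""
    flat = np.concatenate([np.abs(w).flatten() for w in layer_weights])
    k = int(len(flat) * rate)
    threshold = np.sort(flat)[k - 1] if k > 0 else -1.0
    pruned = [w * (np.abs(w) > threshold) for w in layer_weights]
    kept = sum(int((w != 0).sum()) for w in pruned)
    print(f"kept {kept}/{len(flat)} parameters ({1 - kept / len(flat):.0%} pruned)")
    return pruned

layers = [np.random.randn(8, 8), np.random.randn(8, 4)]
pruned = prune_by_rate(layers, rate=0.5)
```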
The third step is to Finetune. After pruning, the accuracy drops and needs finetuning to recover. If the network is pruned too much at once, it may not recover at all; that is why the so-called 'iterative pruning' repeats the pruning and finetuning steps several times with a gradually increasing pruning rate.
$ ./decent_p finetune -config config.prototxt
Here is a sample configuration file modification:
workspace: "examples/decent_p/"
gpu: "0,1,2,3"
test_iter: 100
acc_name: "top-1"
model: "examples/decent_p/float.prototxt"
#weights: "examples/decent_p/float.caffemodel"
weights: "examples/decent_p/regular_rate_0.1/_iter_10000.caffemodel"
solver: "examples/decent_p/solver.prototxt"
# change rate from 0.1 to 0.2
#rate: 0.1
rate: 0.2
pruner {
method: REGULAR
}
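Putting the pieces together, the iterative loop can be driven by a small script that bumps the `rate:` field in the configuration and reruns the prune and finetune commands. This is only a sketch assuming the file layout shown above; in practice the `weights:` line must also be updated each round to point at the latest finetuned snapshot:

```python
import re
import subprocess

def set_rate(config_path, rate):
    """Rewrite the active `rate:` line in the decent_p configuration file."""
    with open(config_path) as f:
        text = f.read()
    text = re.sub(r"^rate:\s*[\d.]+", f"rate: {rate}", text, flags=re.M)
    with open(config_path, "w") as f:
        f.write(text)

for rate in (0.1, 0.2, 0.3):  # gradually increase the pruning rate
    set_rate("config.prototxt", rate)
    subprocess.run(["./decent_p", "prune", "-config", "config.prototxt"], check=True)
    subprocess.run(["./decent_p", "finetune", "-config", "config.prototxt"], check=True)
```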
The last step is to Transform. Up to this point the model has only been masked, with the low-ranked parameters set to zero, so the workload is unchanged before this transform. This final step performs the real cut-off and turns the model into a genuinely lighter one.
$ ./decent_p transform -model baseline.prototxt -weights finetuned_model.caffemodel
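Conceptually, transform is the moment when masked channels are physically removed instead of merely zeroed. The toy sketch below drops all-zero output channels of a convolution weight so the stored tensor actually shrinks (a real transform would also have to fix up the input channels of the following layers, which is omitted here):

```python
import numpy as np

def drop_zero_output_channels(conv_w):
    """Remove output channels whose weights are entirely zero.

    conv_w has shape (out_channels, in_channels, k, k).
    """
    keep = np.abs(conv_w).reshape(conv_w.shape[0], -1).sum(axis=1) > 0
    return conv_w[keep]

w = np.random.randn(16, 8, 3, 3)
w[[1, 5, 9]] = 0.0                          # channels zeroed out by the pruning mask
print(drop_zero_output_channels(w).shape)   # (13, 8, 3, 3): a genuinely smaller tensor
```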
Darknet, a C-based framework with few dependencies, draws a lot of attention today, and the YOLO network plays a significant role in its popularity. Compared to Caffe, Darknet implements some operators, such as MaxPool, differently. The pruning tool for the Darknet framework, however, changes very little: it follows the same four-step workflow. The largest difference is in the pruning step, where a threshold parameter is set instead of a pruning rate. The two parameters are defined as follows:
Pruning rate – How much workload you would like to cut off after pruning.
Threshold – How much accuracy loss you could bear for each layer.
The threshold is proportional to the pruning rate; it simply expresses the same trade-off from the accuracy perspective rather than the workload one.
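One way to picture the relationship is a small search that, for each layer, picks the largest pruning rate whose measured accuracy loss stays within the threshold. As before, `prune_layer` and `evaluate` are hypothetical stand-ins used only to illustrate the idea:

```python
def rate_for_threshold(model, layer, threshold, evaluate, prune_layer,
                       candidate_rates=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Largest per-layer pruning rate whose accuracy loss stays within `threshold`."""
    baseline = evaluate(model)
    best = 0.0
    for rate in candidate_rates:
        loss = baseline - evaluate(prune_layer(model, layer, rate))
        if loss <= threshold:
            best = rate       # this rate is still acceptable for the layer
    return best
```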
TensorFlow is one of the world's leading machine learning frameworks today, thanks to its rich programming APIs, distributed training strategies, and flexible deployment platforms. The pruning tool for the TensorFlow framework released by Xilinx still follows the classic four-step workflow, but it has some constraints on the input graph and the APIs: the graph and weights must be supplied as a .pbtxt file and a .ckpt checkpoint, with no Keras API inside.
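For reference, a graph and checkpoint in the expected form can be produced with plain TF1-style ops, along the lines of the sketch below (the layer, names, and paths are illustrative only):

```python
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

# A single fully connected layer built from raw ops (no Keras API),
# then exported as a .pbtxt graph plus a .ckpt checkpoint.
x = tf.placeholder(tf.float32, [None, 784], name="input")
w = tf.get_variable("fc/w", [784, 10])
b = tf.get_variable("fc/b", [10])
logits = tf.nn.bias_add(tf.matmul(x, w), b, name="logits")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    tf.train.write_graph(sess.graph_def, "./export", "float_graph.pbtxt")
    tf.train.Saver().save(sess, "./export/float.ckpt")
```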
To help users better understand the pruning tools for the different frameworks, the input and output of each step are summarized in the table below:
| Step | Caffe Input | Caffe Output | Darknet Input | Darknet Output | TensorFlow Input | TensorFlow Output |
|---|---|---|---|---|---|---|
| Ana | | NA | | | | NA |
| Prune | | | | | | |
| Finetune | | | | | | |
| Transform | | | | | | |
To summarize, Xilinx has applied a classic, unified four-step pruning workflow to the Caffe, Darknet, and TensorFlow frameworks. This unified pruning concept makes the tool easy to use and adds no extra cost when two networks are designed with different frameworks.