
Quantize a Deep Neural Network to 8-Bit Scaled Integer Data Types


This example uses:

Deep Learning Toolbox
Deep Learning Toolbox Model Quantization Library
Parallel Computing Toolbox

This example shows how to quantize the learnable parameters in the convolution layers of a neural network for a GPU target and explore the behavior of the quantized network. In this example, you quantize the squeezenet neural network after retraining the network to classify new images according to the Train Deep Learning Network to Classify New Images example. In this example, quantization reduces the memory required for the network by approximately 75% while the accuracy of the network is not affected.

Load the pretrained network. net is the output network of the Train Deep Learning Network to Classify New Images example.

load squeezenetmerch
net

net = 
  DAGNetwork with properties:

         Layers: [68×1 nnet.cnn.layer.Layer]
    Connections: [75×2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}

Define calibration and validation data to use for quantization.

The calibration data is used to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

The validation data is used to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

In this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

Create a dlquantizer object and specify the network to quantize.

quantObj = dlquantizer(net);

Define a metric function to use to compare the behavior of the network before and after quantization. This example uses the hComputeModelAccuracy metric function.

function accuracy = hComputeModelAccuracy(predictionScores, net, dataStore)
%% Computes model-level accuracy statistics

% Load the ground-truth labels
tmp = readall(dataStore);
groundTruth = tmp.response;

% Compare the predicted labels with the ground truth
predictionError = {};
for idx = 1:numel(groundTruth)
    [~, idy] = max(predictionScores(idx,:));
    yActual = net.Layers(end).Classes(idy);
    predictionError{end+1} = (yActual == groundTruth(idx)); %#ok
end

% Count the correct predictions and compute the accuracy.
predictionError = [predictionError{:}];
accuracy = sum(predictionError)/numel(predictionError);
end

Specify the metric function in a dlquantizationOptions object.

quantOpts = dlquantizationOptions('MetricFcn',{@(x)hComputeModelAccuracy(x, net, aug_valData)});

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

calResults = calibrate(quantObj, aug_calData)

calResults=121×5 table
        Optimized Layer Name         Network Layer Name     Learnables / Activations    MinValue     MaxValue
    ____________________________    ____________________    ________________________    _________    ________

    {'conv1_Weights'           }    {'conv1'           }    "Weights"                    -0.91985     0.88489
    {'conv1_Bias'              }    {'conv1'           }    "Bias"                       -0.07925     0.26343
    {'fire2-squeeze1x1_Weights'}    {'fire2-squeeze1x1'}    "Weights"                       -1.38      1.2477
    {'fire2-squeeze1x1_Bias'   }    {'fire2-squeeze1x1'}    "Bias"                       -0.11641     0.24273
    {'fire2-expand1x1_Weights' }    {'fire2-expand1x1' }    "Weights"                     -0.7406     0.90982
    {'fire2-expand1x1_Bias'    }    {'fire2-expand1x1' }    "Bias"                      -0.060056     0.14602
    {'fire2-expand3x3_Weights' }    {'fire2-expand3x3' }    "Weights"                    -0.74397     0.66905
    {'fire2-expand3x3_Bias'    }    {'fire2-expand3x3' }    "Bias"                      -0.051778    0.074239
    {'fire3-squeeze1x1_Weights'}    {'fire3-squeeze1x1'}    "Weights"                     -0.7712     0.68917
    {'fire3-squeeze1x1_Bias'   }    {'fire3-squeeze1x1'}    "Bias"                       -0.10138     0.32675
    {'fire3-expand1x1_Weights' }    {'fire3-expand1x1' }    "Weights"                    -0.72035      0.9743
    {'fire3-expand1x1_Bias'    }    {'fire3-expand1x1' }    "Bias"                      -0.067029     0.30425
    {'fire3-expand3x3_Weights' }    {'fire3-expand3x3' }    "Weights"                    -0.61443      0.7741
    {'fire3-expand3x3_Bias'    }    {'fire3-expand3x3' }    "Bias"                      -0.053613     0.10329
    {'fire4-squeeze1x1_Weights'}    {'fire4-squeeze1x1'}    "Weights"                     -0.7422      1.0877
    {'fire4-squeeze1x1_Bias'   }    {'fire4-squeeze1x1'}    "Bias"                       -0.10885     0.13881
    ⋮

Use the validate function to quantize the learnable parameters in the convolution layers of the network and exercise the network. The function uses the metric function defined in the dlquantizationOptions object to compare the results of the network before and after quantization.

valResults = validate(quantObj, aug_valData, quantOpts)

valResults = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: [2×2 table]

Examine the validation output to see the performance of the quantized network.

valResults.MetricResults.Result

ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________

     {'Floating-Point'}           1
     {'Quantized'     }           1

valResults.Statistics

ans=2×2 table
    NetworkImplementation    LearnableParameterMemory(bytes)
    _____________________    _______________________________

     {'Floating-Point'}               2.9003e+06
     {'Quantized'     }               7.3393e+05

In this example, the memory required for the network was reduced by approximately 75% through quantization, while the accuracy of the network was not affected.
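As a quick check of that figure, you can compute the reduction directly from the Statistics table returned by validate. This is a minimal sketch; it assumes the table variable is named exactly as displayed above, LearnableParameterMemory(bytes).

% Percent reduction in learnable-parameter memory, computed from the
% Statistics table. The variable name matches the displayed header;
% adjust it if your release names the column differently.
mem = valResults.Statistics.("LearnableParameterMemory(bytes)");
reduction = 100*(1 - mem(2)/mem(1))   % approximately 74.7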

The weights, biases, and activations of the convolution layers of the network specified in the dlquantizer object now use scaled 8-bit integer data types.
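To make the scaled 8-bit integer representation concrete, the sketch below quantizes a small weight vector using a power-of-two scale factor derived from its calibrated dynamic range. This illustrates the general technique, not the internal implementation of dlquantizer, and the variable names are hypothetical; the example range is taken from the conv1 row of calResults.

% A minimal sketch of scaled 8-bit integer quantization, assuming a
% power-of-two scale factor chosen from the calibrated dynamic range.
w = single([-0.91985 0.5 0.88489]);     % example weights (range from calResults)

% Choose an exponent so the largest magnitude fits in int8 [-128, 127].
maxAbs = max(abs(w));
exponent = ceil(log2(maxAbs/127));      % scale factor is 2^exponent

% Quantize: store int8 codes, reconstruct by multiplying by the scale.
codes = int8(round(w ./ 2^exponent));   % stored 8-bit values
wQuant = single(codes) .* 2^exponent;   % dequantized approximation

quantError = max(abs(w - wQuant))       % bounded by 2^exponent / 2

Storing the int8 codes instead of single-precision values is what yields the roughly 4x (75%) memory saving reported above; the rounding error is bounded by half of the power-of-two scale factor.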


