ML Caffe Segmentation Tutorial: 4.0 Quantizing and Compiling the Segmentation networks for DPU implementation

Jon Cory

Published: Oct 21, 2019

AI - Machine Learning
Introductory Tutorials
ZCU102
Vitis AI

4.0 PART 1: Installing the DNNDK tools

In section 1, we downloaded the DNNDK tools and copied them over to the ZCU102 board. A portion of these tools also needs to be installed on the host x86 machine for quantizing and compiling the model.

The tools needed are contained under the host_x86 tools directory.

Please copy the host_x86 folder and its subdirectories to the host machine.
Next, cd into the host_x86 directory and install the host tools:

sudo ./install.sh ZCU102

NOTE: The target for this tutorial is the ZCU102, but it should be possible to target other boards as well by changing the target shown above when installing the tools and also modifying the dnnc command to target the correct DPU. As a quick reference, the ZCU102 and ZCU104 use a 4096FA DPU.

Please refer to the DNNDK User Guide for more details on the DNNDK tools.

If you would like to quantize and deploy the model, continue on to 4.0 Part 2. If you would first like to test the quantized and floating point models and compare their mIOU, jump down to 4.0 Part 3.

4.0 PART 2: Configuring the Files for Quantization, Compilation, and mIOU Testing:

I have included an example workspace in Segment/DNNDK to show how the DNNDK tools may be invoked as well as the necessary modifications to the prototxt files for both quantization/compilation and testing the float and quantized model mIOUs. Change directory to the DNNDK directory before proceeding to the next step.
Within the DNNDK directory, there is a subdirectory for each model. Inside each model directory are several files:

"float.prototxt" is used for quantizing/compiling the models for deployment on the target hardware
"float_test.prototxt" is used for testing the float and quantized models to report the mIOU against the cityscapes validation dataset
"float.caffemodel" is the pre-trained caffemodel.
"quantize_and_compile.sh" is a script that is used to perform both quantization and compilation (decent_q and dnnc) for deployment on the target hardware
"test_float_and_quantized.sh" is a script that will test both the floating point and quantized models and report out the mIOU for each
There is also a decent subdirectory, because a local copy of decent_q is provided instead of the publicly distributed decent_q. This local copy can test both the floating point and quantized models.

Open the "float.prototxt" that is included as an example in the DNNDK model subfolders (e.g. ENet, ESPNet).

The "float.prototxt" files should be mostly identical to your "train_val.prototxt" except for the following:

The input layer has changed from "ImageSegData" type to "ImageData"
Paths have been specified to the calibration data in a relative fashion so that they point to the correct locations if the directory structure is left intact.
Note that by default the prototxt files are set to generate a model with a 512x256 input size, which is intended for use with the xxx_video applications (e.g. fpn_video). If you wish to run the evaluation in hardware on Cityscapes validation images rather than on the recorded video (e.g. fpn_eval), those applications use 1024x512, so you will need to modify the input layers accordingly (the float_test.prototxt files have the input set to 1024x512 if you wish to use them as an example).

    
line 11:  source: "../data/cityscapes/calibration.txt"
line 12:  root_folder: "../data/cityscapes/calibration_images/"

The "SoftmaxWithLoss" layer has been changed to "Softmax" and the "Accuracy" layer has been removed. These layers were used to compute loss and accuracy during the training phase, so they have been updated for deployment.

Important note for ENet float.prototxt: the "UpsamplingBilinear2d_x" layers have been changed to "DeephiResize" type because decent doesn't support bilinear upsampling with the deconvolution layer

You can use these prototxt files directly if the differences mentioned above are the only deltas between your train_val.prototxt file and float.prototxt. Otherwise, if you are deploying the encoder model only or a modified version, you will need to update your train_val.prototxt to account for the differences mentioned above, rename that file to "float.prototxt", and place it in the correct directory.

The calibration data needs to be populated into the Segment/DNNDK/data/cityscapes/calibration_images directory. This data consists of 1000 test images from Cityscapes, which are listed in the calibration.txt file. The decent quantize process uses these images as stimulus to calibrate the dynamic range of the model.

The data listed in the calibration.txt file calls out the following 1000 images:

the first 100 images from $CITYSCAPES_DATASET/leftImg8bit/test/berlin
all images from $CITYSCAPES_DATASET/leftImg8bit/test/bielefeld
all images from $CITYSCAPES_DATASET/leftImg8bit/test/bonn
all images from $CITYSCAPES_DATASET/leftImg8bit/test/mainz
the first 373 images from $CITYSCAPES_DATASET/leftImg8bit/test/munich

You will need to copy these images, or create soft links to them, from the dataset directories listed above into the Segment/DNNDK/data/cityscapes/calibration_images directory. You can use other calibration images if desired, but the provided calibration.txt file uses the images listed above.
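The soft-link option can be sketched as a small shell helper. This is only an illustration under stated assumptions: `link_city_images` is a hypothetical name (it is not part of DNNDK or the tutorial files), the images are assumed to be .png files, "first N" is assumed to mean the first N file names in sorted order, and calibration.txt is assumed to expect all images in one flat directory.

```shell
#!/usr/bin/env bash
# Sketch only: link_city_images is a hypothetical helper, not part of DNNDK.
# Assumptions: images are .png files, "first N" means the first N names in
# sorted (glob) order, and calibration.txt expects a flat directory.

link_city_images() {
    local src_dir="$1"      # one city directory, e.g. .../leftImg8bit/test/berlin
    local dest_dir="$2"     # the calibration_images directory
    local limit="${3:-0}"   # number of images to link; 0 means all
    local count=0
    local img abs_dir

    mkdir -p "$dest_dir"
    abs_dir="$(cd "$src_dir" && pwd)" || return 1   # absolute path keeps links valid
    for img in "$abs_dir"/*.png; do
        [ -e "$img" ] || continue                   # glob matched nothing
        if [ "$limit" -gt 0 ] && [ "$count" -ge "$limit" ]; then
            break
        fi
        ln -sf "$img" "$dest_dir/"
        count=$((count + 1))
    done
    echo "$count"                                   # report how many were linked
}

# Example usage (run from the Segment/DNNDK directory):
#   dest=data/cityscapes/calibration_images
#   link_city_images "$CITYSCAPES_DATASET/leftImg8bit/test/berlin" "$dest" 100
#   link_city_images "$CITYSCAPES_DATASET/leftImg8bit/test/bonn" "$dest"
```

After linking, it is worth checking that every file name listed in calibration.txt actually resolves under calibration_images, since the sorted-order assumption may not match how the list was originally built.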

Next, copy your latest trained model from Caffe into the Segment/DNNDK/model_subdirectory_name directory (or reuse the already populated float.caffemodel) and rename it "float.caffemodel". This model should be located wherever the snapshot was saved during the training step.
Next run the quantization tools using the following command:

    
./quantize_and_compile.sh

If you open the script, you will see the contents reproduced at the end of this part, which indicate several things. First, make sure the GPUID environment variable is set correctly for your machine. If you have only one GPU, this should be '0'. Otherwise, change it to the index of the GPU you want to use for quantization.
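Before editing GPUID, it can help to confirm that the index actually exists on your machine. The following is a minimal sketch, not part of the tutorial scripts: `gpu_index_ok` is a hypothetical helper, and the usage comment assumes that `nvidia-smi -L` prints one line per GPU beginning with "GPU ".

```shell
#!/usr/bin/env bash
# Sketch only: gpu_index_ok is a hypothetical helper, not part of DNNDK.
# It succeeds when the given index is a non-negative integer smaller than
# the number of GPUs supplied as the second argument.

gpu_index_ok() {
    local gpuid="$1"       # candidate value for GPUID
    local gpu_count="$2"   # number of GPUs present on the machine
    case "$gpuid" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric values
    esac
    [ "$gpuid" -lt "$gpu_count" ]
}

# Example usage (assumes nvidia-smi is installed and lists one GPU per line):
#   count=$(nvidia-smi -L | grep -c '^GPU ')
#   gpu_index_ok "$GPUID" "$count" || echo "GPUID=$GPUID is not a valid GPU index" >&2
```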

Second, the script calls Segment/DNNDK/decent/setup_decent_q.sh, which checks your NVIDIA environment and selects the correct local decent_q executable for quantization. At the time this tutorial was authored, the public version of decent could not yet test floating point models, so this version of decent_q has been provided with the tutorial to enable mIOU testing for both the floating point and quantized models.

Next, you can see that decent_q_segment quantize is called with various arguments including calibration iterations, GPUID, paths to the input and output models, and a tee to dump the output to a text file in the decent_output directory.

For reference, I have included an enet decent log file and an espnet decent log file that show the output of my console after running the decent command. You should see something similar after running the command on your machine.

Finally, the dnnc command is called, which compiles the quantized model (the deploy.prototxt and deploy.caffemodel written by decent) and produces a file called "dpu_segmentation_0.elf" under the dnnc_output directory.

For reference, I have included an enet dnnc log file and an espnet dnnc log file that show the output of my console after the dnnc command is run. You should see something similar after running the command on your machine.

    
#!/usr/bin/env bash
export GPUID=0
net=segmentation
source ../decent/setup_decent_q.sh

#working directory
work_dir=$(pwd)
#output directory of decent_q (holds the quantized deploy model that dnnc reads)
model_dir=decent_output
#output directory of dnnc (holds the compiled elf file)
output_dir=dnnc_output

echo "quantizing network: $(pwd)/float.prototxt"
./../decent/decent_q_segment quantize            \
          -model $(pwd)/float.prototxt     \
          -weights $(pwd)/float.caffemodel \
          -gpu $GPUID \
          -calib_iter 1000 \
          -output_dir ${model_dir} 2>&1 | tee ${model_dir}/decent_log.txt

echo "Compiling network: ${net}"

dnnc    --prototxt=${model_dir}/deploy.prototxt \
        --caffemodel=${model_dir}/deploy.caffemodel \
        --output_dir=${output_dir} \
        --net_name=${net} --dpu=4096FA \
        --cpu_arch=arm64 2>&1 | tee ${output_dir}/dnnc_log.txt

At this point, an elf file should have been created in the dnnc_output directory. It is used in the final step, which is to run the models on the ZCU102. If desired, you can also proceed to 4.0 Part 3, which covers testing the floating point and quantized models.
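A quick way to confirm that compilation produced its output is to check for a non-empty elf file. This is a minimal sketch: `check_elf` is a hypothetical helper, and its default path simply combines the dnnc_output directory and the dpu_segmentation_0.elf file name mentioned in this part.

```shell
#!/usr/bin/env bash
# Sketch only: check_elf is a hypothetical helper, not part of DNNDK.
# It reports whether the compiled DPU elf file exists and is non-empty.

check_elf() {
    local elf="${1:-dnnc_output/dpu_segmentation_0.elf}"
    if [ -s "$elf" ]; then          # -s: file exists and has a size above zero
        echo "found: $elf"
        return 0
    fi
    echo "missing or empty: $elf" >&2
    return 1
}

# Example usage (run from the model directory after quantize_and_compile.sh):
#   check_elf || echo "see dnnc_output/dnnc_log.txt for details"
```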

4.0 PART 3: Testing the Floating Point and Quantized Models

As mentioned in the previous section, files have been provided under the Segment/DNNDK/model_subdirectory_name filepath that let you quickly test the mIOU of both the floating point model and the quantized model on the Cityscapes validation dataset. To run this test, perform the following steps:

1. Open the Segment/DNNDK/data/val_img_seg_nomap.txt file with a text editor.
2. Notice that this file contains paths to the Cityscapes validation dataset as they are stored on my local machine. The left column has a path to the input image, and the right column has a path to the labels. You need to modify the root directory portion of both paths to point to the location of the Cityscapes dataset on your machine.
3. Open the float_test.prototxt file that corresponds to the model of interest. Notice that there are several differences between this file and the float.prototxt that was used for deployment. Two separate files are used, one for testing and one for deployment, because the DeephiResize layer causes problems in the current version of decent that prevent dnnc from compiling the model: it causes the input layer to be renamed to "resize_down", which makes dnnc fail.

The new additions to this model are to support the auto_test and test decent_q commands:

The input size of the model has been changed from 512x256 to 1024x512. This is because the larger input size produces better mIOU results. It would be possible to use other sizes, such as the native input size for the Cityscapes dataset, which is 2048x1024, but testing the models would take longer and the Unet-full model will not work in this case because of some limitations in the Caffe distribution used within the decent_q tool. Additionally, the models were trained with an input crop size of 512, so the native size would not necessarily be expected to produce better results.
An additional input layer "ImageSegData" has been added which has a path to the val_img_seg_nomap.txt file. This is how the labels and input images are supplied for the testing procedure.
A layer after this called "resize_down" has been added to scale the input image to the desired input size for the model (in this case 1024x512).

A new layer called "SegmentPixelIOU" has been added at the end of the model. It is a custom Caffe layer packaged within the decent_q tool. You may have noticed that the val_img_seg_nomap.txt file actually points to gtFine_labelIds rather than gtFine_labelTrainIds. This is because the SegmentPixelIOU layer has been coded to automatically relabel the classes from the Cityscapes labels so that the classes match gtFine_labelTrainIds and values of 255 are ignored.

4. Open one of the test_float_and_quantized.sh scripts. The contents of this script are shown below. You will only need to edit the GPUID to specify the correct GPU index for your tests. Note that the log files for both the floating point and quantized results will be captured under the test_results subdirectory.

    
export GPUID=0
export WKDIR=`pwd`
cd ../decent
source setup_decent_q.sh
cd $WKDIR
./../decent/decent_q_segment test -model float_test.prototxt -weights float.caffemodel -test_iter 500 -gpu $GPUID 2>&1 | tee test_results/float_model_test.txt

#working directory
work_dir=$(pwd)
#path of float model
model_dir=${work_dir}
#output directory
output_dir=${work_dir}/decent_output

./../decent/decent_q_segment quantize             \
          -model ${model_dir}/float_test.prototxt \
          -weights ${model_dir}/float.caffemodel  \
          -gpu $GPUID \
          -calib_iter 1000 \
          -test_iter 500 \
          -auto_test \
          -output_dir ${output_dir} 2>&1 | tee test_results/quantized_model_test.txt

5. Execute the script by running the following command. This will take some time, which varies depending on the GPU hardware available as well as which model is being run. I have included example test results from a previous run under the associated model directories, such as Segment/DNNDK/FPN/test_results. Note that the results I have included do not necessarily represent the best performing snapshot; they are just an example of the output of running the test script.

./test_float_and_quantized.sh
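For the step that edits val_img_seg_nomap.txt, the root-directory change can also be done with sed instead of a text editor. The following is a minimal sketch under stated assumptions: `rewrite_dataset_root` is a hypothetical helper, the root paths in the usage comment are placeholders rather than the paths actually stored in the file, and neither root is assumed to contain a `|` character or regular-expression metacharacters that would change the match.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite_dataset_root is a hypothetical helper, not part of DNNDK.
# It prints the list file with every occurrence of old_root replaced by
# new_root, so both the image column and the label column are updated.

rewrite_dataset_root() {
    local list_file="$1"   # e.g. data/val_img_seg_nomap.txt
    local old_root="$2"    # root directory currently stored in the file
    local new_root="$3"    # location of the Cityscapes dataset on your machine
    # '|' is used as the sed delimiter so slashes in paths need no escaping.
    sed "s|${old_root}|${new_root}|g" "$list_file"
}

# Example usage (placeholder paths; write to a new file, inspect it, then swap):
#   rewrite_dataset_root data/val_img_seg_nomap.txt /old/root/cityscapes \
#       "$CITYSCAPES_DATASET" > data/val_img_seg_nomap.new.txt
```

Writing to a separate file first keeps the original list intact in case the old root was mistyped and nothing was replaced.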

At this point, the quantized and floating point models have been fully verified on the host and you are ready to proceed to deploying the models to the target hardware. However, if you skipped the section on pre-trained models, you may be wondering how they scored. Jump back up to 3.1.0 About the Pre-Trained Models to see the results.

Read 5.0 Evaluating the Floating Point Models on the Host PC

About Jon Cory

Jon Cory is located near Detroit, Michigan and serves as an Automotive focused Machine Learning Specialist Field Applications Engineer (FAE) for AMD. Jon’s key roles include introducing AMD ML solutions, training customers on the ML tool flow, and assisting with deployment and optimization of ML algorithms in AMD devices. Previously, Jon spent two years as an Embedded Vision Specialist (FAE) with a focus on handcrafted feature algorithms, Vivado HLS, and ML algorithms, and six years prior as an AMD Generalist FAE in Michigan/Northern Ohio. Jon is happily married for four years to Monica Cory and enjoys a wide variety of interests including music, running, traveling, and losing at online video games.
