The following is a tutorial on how to train, quantize, compile, and deploy various segmentation networks including ENet, ESPNet, FPN, UNet, and a reduced compute version of UNet that we'll call Unet-lite. The training dataset used for this tutorial is the Cityscapes dataset, and the Caffe framework is used for training the models. After training, the DNNDK tools are used to quantize and compile the models, and ARM C++ application examples are included for deploying the models on a Xilinx Zynq® UltraScale+™ ZCU102 target board. For background information on ESPNet, ENet, and general segmentation approaches, the Segmentation Introduction Presentation has been provided. 

Note that the goal of this tutorial is not to provide optimized high accuracy models, but rather to provide a framework and guidance under which segmentation models can be trained and deployed on Xilinx MPSoCs.

The tutorial is organized as follows:

  1. Environment Setup and Installation
  2. Prepare the Cityscapes database for Training Segmentation Models
  3. Training the Models
  4. Evaluating the Floating Point Models on the Host PC
  5. Quantizing and Compiling the Segmentation networks for DPU implementation
  6. Running the Models on the ZCU102
  7. Post-processing the Hardware Inference Output

Pre-Install Considerations for Caffe for Segmentation Models

Segmentation networks require a special fork of Caffe. The required Caffe distribution is included with this example in the Segment/caffe-master directory. Instructions for installing and configuring the ZCU102 image, and for installing the provided Caffe distribution directly on the host, are given in step "1.0 Environment Setup and Installation".

Whatever method you used to set up the environment, from now on I assume that you have the provided Caffe fork installed (either directly on your machine, in a Python virtual environment, or within the Docker image) on your Ubuntu 16.04 Linux file system, and that you have set the $CAFFE_ROOT environment variable by adding the following command to your ~/.bashrc file (modify the path to match the location where the Segment directory has been placed):

export CAFFE_ROOT=/absolute_path_to_caffe_install/Segment/caffe-master
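As an optional sanity check, the following minimal Python sketch confirms that $CAFFE_ROOT is set and points at a directory containing a Makefile, which is what the caffe-master folder should look like. The helper name check_caffe_root is hypothetical and is not part of Caffe or DNNDK; it only inspects the filesystem.

```python
import os
from pathlib import Path


def check_caffe_root(env=None):
    """Return $CAFFE_ROOT as a Path, or raise RuntimeError if it looks wrong.

    Hypothetical helper: it only checks that CAFFE_ROOT is set and that the
    directory it names contains a Makefile, as caffe-master does.
    """
    env = os.environ if env is None else env
    root = env.get("CAFFE_ROOT")
    if not root:
        raise RuntimeError("CAFFE_ROOT is not set; add the export line to ~/.bashrc")
    path = Path(root)
    if not path.is_dir():
        raise RuntimeError(f"CAFFE_ROOT={root} is not a directory")
    if not (path / "Makefile").is_file():
        raise RuntimeError(f"no Makefile found in {root}; is this caffe-master?")
    return path
```

Calling check_caffe_root() from a freshly opened terminal confirms that the ~/.bashrc entry is actually being picked up.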

If you have successfully installed the provided fork of Caffe and the DNNDK tools, and configured your ZCU102 board with the Debian stretch image, proceed to "2.0 Prepare the Cityscapes database for Training Segmentation Models". Otherwise, proceed to step "1.0 Environment Setup and Installation".

The text of this tutorial has also been converted to PDF format and can be found in PDF/tutorial.pdf. Note that the PDF is not maintained; only the .md files are, so please refer to the native .md format for the latest version.

1.0 Environment Setup and Installation

An Ubuntu 16.04 or 14.04 host machine is needed, with an NVIDIA GPU card and the proper CUDA, cuDNN, and NCCL libraries (these dependencies are documented in the user guide referenced below).
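The following short Python sketch checks the host release by reading /etc/os-release, the standard file on Ubuntu that carries the ID and VERSION_ID fields. The function names are hypothetical, and only the two releases named above are treated as supported.

```python
from pathlib import Path

# Host releases named in this tutorial
SUPPORTED_UBUNTU = {"14.04", "16.04"}


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"')
    return info


def is_supported_host(text=None):
    """Return True if the host reports Ubuntu 14.04 or 16.04."""
    if text is None:
        text = Path("/etc/os-release").read_text()
    info = parse_os_release(text)
    return info.get("ID") == "ubuntu" and info.get("VERSION_ID") in SUPPORTED_UBUNTU
```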

This tutorial was tested with a ZCU102 revision 1.0 (newer revisions should also work), a DisplayPort monitor, a keyboard, a mouse, and a USB hub with a USB micro converter.

The DNNDK release used for testing this tutorial is xlnx_dnndk_v3.0_190624.tar.gz, MD5=51d0b0fe95493a2e0bf9c19116b17bb8.
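Before unpacking the release, it is worth confirming that the download matches the MD5 above. On Linux, md5sum does this directly; the Python sketch below does the same thing portably (for example on a Windows host). The helper names are hypothetical.

```python
import hashlib

# Checksum quoted in this tutorial for xlnx_dnndk_v3.0_190624.tar.gz
EXPECTED_MD5 = "51d0b0fe95493a2e0bf9c19116b17bb8"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Reading in chunks keeps memory use flat even for a multi-hundred-megabyte tarball.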

Site for downloading DPU/DNNDK images:

(This image needs to be written to an SD card using Win32DiskImager or equivalent)

DNNDK user guide Tools for ZCU102 and Host x86:

1.1 PART 1: Deephi DNNDK and ZCU102 Image Setup:

The Deephi tools and image can be set up via the following steps:

1.  Download the demo image for the desired board. The images available on the AI Developer Hub support the Ultra96, ZCU104, and ZCU102. This tutorial should work with any of these boards, provided that dnnc is set to target the correct DPU for that board image: for the Ultra96, the DPU target in the dnnc command line should be 2304FA, and for the ZCU102 and ZCU104 it should be 4096FA.


Note that all networks have been tested with DNNDK v3.0 and the DPU v1.4.0 and have been found to work properly with this configuration.

2.  Extract the ISO image from the zip file.

3.  If using Windows, install Win32DiskImager and flash the image to a 16GB or larger SD card (this tutorial was verified using a SanDisk Ultra 80MB/s 32GB SD card).

4.  Boot the ZCU102 from the SD card with a USB keyboard and monitor connected. If you are unsure how to configure the ZCU102 jumpers and switches, please refer to the reVISION Getting Started Guide Operating Instructions.

4.a) If for some reason no display is shown on the monitor, it is recommended to try executing the following commands via the serial terminal: 

export DISPLAY=:0.0
xrandr --output DP-1 --mode 1024x768
xset -dpms

5.  If using Windows, WinSCP, PSCP, or MobaXTerm can be used to transfer the files between the ZCU102 and the host PC. If using PSCP, you can simply copy PSCP from ref_files/ to the desired directory, cd to that directory with the Windows command prompt and use it. If using WinSCP or MobaXTerm, you can connect to the board with the GUI and transfer the files by dragging and dropping them.

6.  Next, connect an Ethernet cable from the ZCU102 board to your host machine and get or set the Ethernet IP address of the board using ifconfig in a terminal (I set mine using ifconfig eth0 followed by the desired address).

7.  Set the IP address of your host machine to a static address on the same subnet. If there is an anti-virus firewall on your host machine, you may need to disable it before proceeding to the next step.

8.  The DNNDK tools and sample images need to be copied to the board. This can be done with pscp using a command like the following (password is root): pscp -r c:\pathtodnndkfiles root@ or by dragging and dropping with WinSCP or MobaXTerm.

9.  Copy the ZCU102 folder and the common folder, included in the DNNDK tools and sample images downloaded earlier, from the host machine to the ZCU102 board.

10.  Install the DNNDK tools on the ZCU102 by running the provided script (./ZCU102/

11.  The x86 host tools need to be copied to a Linux x86 host machine to quantize and compile the model. This is covered later in the tutorial.

11.a) NOTE: To change the display resolution in the Ubuntu linux, execute the following:

             xrandr -q to list the supported modes

             xrandr -s 1920x1080 to set the mode
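The board-specific choices in the steps above can be captured in a small Python sketch: it maps each board to the dnnc DPU target quoted in step 1 and checks that the board and host addresses chosen in steps 6 and 7 share a subnet. The helper names, the /24 default prefix length, and the example addresses below are assumptions for illustration; adjust them to your own network.

```python
import ipaddress

# dnnc DPU targets quoted in step 1 of this tutorial
DPU_TARGETS = {
    "ultra96": "2304FA",
    "zcu102": "4096FA",
    "zcu104": "4096FA",
}


def dpu_target(board):
    """Return the DPU target to use in the dnnc command line for a board."""
    key = board.strip().lower()
    if key not in DPU_TARGETS:
        raise ValueError(f"unknown board {board!r}; expected one of {sorted(DPU_TARGETS)}")
    return DPU_TARGETS[key]


def same_subnet(board_ip, host_ip, prefix_len=24):
    """Return True if both IPv4 addresses are in the same /prefix_len network.

    The /24 default is an assumption; use the netmask you configured.
    """
    network = ipaddress.ip_network(f"{board_ip}/{prefix_len}", strict=False)
    return ipaddress.ip_address(host_ip) in network
```

For example, with hypothetical addresses, same_subnet("192.168.1.10", "192.168.1.1") returns True, while moving the host to 192.168.2.1 returns False, which is one reason a pscp transfer to the board could fail.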


1.1 PART 2: Installing the Host Dependencies

The following guidelines are provided for direct setup on a host x86 Ubuntu machine with GPU:

  1. Install the dependencies: 
sudo apt-get install -y libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev python-numpy python-scipy python-matplotlib python-sklearn python-skimage python-h5py python-protobuf python-leveldb python-networkx python-nose python-pandas python-gflags ipython protobuf-c-compiler protobuf-compiler libboost-regex-dev libyaml-cpp-dev g++ git make build-essential autoconf libtool libopenblas-dev libgflags-dev libgoogle-glog-dev liblmdb-dev libhdf5-dev libboost-system-dev libboost-thread-dev libboost-filesystem-dev python-opencv libyaml-dev


   2.  Install the NVIDIA libraries/drivers. This tutorial was verified with the following configuration:

NVIDIA GTX 1080 Ti graphics card

NVIDIA 390 graphics drivers

2.a) CUDA v8.0 (follow the NVIDIA instructions and install from the runfile along with the patch)

2.b) cuDNN v7.0.5, using the "cuDNN v7.0.5 Library for Linux" selection for CUDA 8.0. The following steps were used to install cuDNN:

sudo tar -xzvf cudnn-8.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

2.c) NCCL v1.2.3. The following steps were used to install it:

Download the NCCL 1.2.3 (linked with CUDA 8.0) source code (tar.gz).

tar -xvf nccl-1.2.3-1-cuda8.0.tar.gz
cd nccl-1.2.3-1-cuda8.0
sudo make install -j
sudo ldconfig /usr/local/cuda/lib64

2.d) Next create symbolic links and an environment variable for hdf5:  

cd /usr/lib/x86_64-linux-gnu
sudo ln -s
sudo ln -s
export CPATH="/usr/include/hdf5/serial"

This last line can also be added to ~/.bashrc so that it is set at startup.

2.e) Reboot your machine.
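After step 2.b, the installed cuDNN version can be confirmed from the copied header. In cuDNN 7, cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL, and the Python sketch below extracts them. The function names are hypothetical, and the default path assumes the /usr/local/cuda layout used above.

```python
import re
from pathlib import Path


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN 7 cudnn.h."""
    fields = {}
    for name in ("MAJOR", "MINOR", "PATCHLEVEL"):
        # Match only the literal numeric defines, not CUDNN_VERSION's formula
        m = re.search(rf"#define\s+CUDNN_{name}\s+(\d+)", header_text)
        if m is None:
            raise ValueError(f"CUDNN_{name} not found; is this a cuDNN 7 header?")
        fields[name] = int(m.group(1))
    return fields["MAJOR"], fields["MINOR"], fields["PATCHLEVEL"]


def installed_cudnn_version(header="/usr/local/cuda/include/cudnn.h"):
    """Read the header copied in step 2.b and return its version tuple."""
    return cudnn_version(Path(header).read_text(errors="replace"))
```

For the configuration verified in this tutorial, installed_cudnn_version() would be expected to return (7, 0, 5).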


NOTE: If issues are encountered where the graphics driver needs to be installed, please use the following instructions to install: 

First remove other installations via the following: 

sudo apt-get purge nvidia-cuda*
sudo apt-get purge nvidia-*

  1. Enter a terminal session using Ctrl+Alt+F2.
  2. Stop lightdm: sudo service lightdm stop
  3. Create a file at /etc/modprobe.d/blacklist-nouveau.conf with the following contents:

blacklist nouveau
options nouveau modeset=0

  4. Then run: sudo update-initramfs -u
  5. Add the graphics driver PPA:
sudo add-apt-repository ppa:graphics-drivers
sudo apt-get update

Now install and activate the latest drivers (for this tutorial, version 390): 

sudo apt-get install nvidia-390
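Because a mistyped directive in /etc/modprobe.d/blacklist-nouveau.conf means nouveau is not actually blacklisted, the following Python sketch writes the two directives from step 3 and checks an existing file for them. The helper names are hypothetical; writing to the default path requires root, and sudo update-initramfs -u still needs to be run afterwards.

```python
from pathlib import Path

# The two directives from step 3 of the driver-installation note
REQUIRED_LINES = ("blacklist nouveau", "options nouveau modeset=0")


def missing_directives(conf_text):
    """Return the required directives that are absent from the file text."""
    # Collapse runs of whitespace so "blacklist   nouveau" still counts
    present = {" ".join(line.split()) for line in conf_text.splitlines()}
    return [d for d in REQUIRED_LINES if d not in present]


def write_blacklist(path="/etc/modprobe.d/blacklist-nouveau.conf"):
    """Write both directives to the blacklist file (root needed for the default path)."""
    Path(path).write_text("\n".join(REQUIRED_LINES) + "\n")
```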

1.1 PART 3: Installing Caffe Fork for Segmentation

  1. Copy the provided distribution of Caffe located in the Segment folder to your host Ubuntu machine. Make sure to copy the entire Segment folder, not just the "caffe-master" subfolder, because all of these folders are needed later in the tutorial. Specifically, the "workspace" directory contains the prototxt descriptions needed for training the various models as well as scripts for evaluating the models, the "DNNDK" folder contains the workspace for quantizing and compiling the models for execution on the ZCU102, and the "caffe-master" folder contains the Caffe fork needed to train these networks.

  2. Note that Makefile.config (as well as Makefile.config.example) has already been modified to enable cuDNN and to set CUDA_DIR to /usr/local/cuda, so you should not need to change it to build the Caffe distribution. The LIBRARY_DIRS and INCLUDE_DIRS variables have also been set in this file to point to the following locations for hdf5:

INCLUDE_DIRS:=$(PYTHON_INCLUDE) /usr/local/include /usr/include/hdf5/serial
LIBRARY_DIRS:=$(PYTHON_LIB) /usr/local/lib /usr/lib /usr/lib/x86_64-linux-gnu /usr/lib/x86_64-linux-gnu/hdf5/serial

3. At this point, you should be able to cd to your caffe-master directory (the directory containing the Makefile, hereafter referred to as $CAFFE_ROOT) and run the following:

make clean
make -j8
make py

4. Next run the following command to export the PYTHONPATH variable (note that if you change terminal windows, you will need to re-export this variable): 


At this point, Caffe is now installed and you may proceed to the next section on dataset preparation.
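If the PYTHONPATH export is not in place in a given terminal, pycaffe can also be made importable from within a script. The Python sketch below assumes that this fork keeps the standard BVLC Caffe layout, in which make py builds the caffe package under $CAFFE_ROOT/python; the helper names are hypothetical.

```python
import os
import sys
from pathlib import Path


def pycaffe_dir(caffe_root=None):
    """Return $CAFFE_ROOT/python, where BVLC-style Caffe keeps pycaffe.

    Assumption: this fork follows the standard BVLC directory layout.
    """
    root = Path(caffe_root or os.environ["CAFFE_ROOT"])
    return root / "python"


def add_pycaffe_to_path(caffe_root=None):
    """Prepend the pycaffe directory to sys.path for this process only."""
    path = pycaffe_dir(caffe_root)
    if not (path / "caffe").is_dir():
        raise RuntimeError(f"{path / 'caffe'} not found; did `make py` succeed?")
    if str(path) not in sys.path:
        sys.path.insert(0, str(path))
    return path
```

After add_pycaffe_to_path(), running import caffe followed by caffe.set_mode_gpu() is a quick check that the GPU build is usable. Unlike exporting PYTHONPATH, this only affects the current Python process.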

About Jon Cory

Jon Cory is located near Detroit, Michigan and serves as an Automotive-focused Machine Learning Specialist Field Applications Engineer (FAE) for Xilinx. Jon’s key roles include introducing Xilinx ML solutions, training customers on the ML tool flow, and assisting with the deployment and optimization of ML algorithms in Xilinx devices. Previously, Jon spent two years as an Embedded Vision Specialist FAE with a focus on handcrafted feature algorithms, Vivado HLS, and ML algorithms, and before that six years as a Xilinx Generalist FAE in Michigan/Northern Ohio. Jon has been happily married to Monica Cory for four years and enjoys a wide variety of interests including music, running, traveling, and losing at online video games.