Read Environment Set up and Installation

For this tutorial, we’ll be training the models on the Cityscapes dataset. Cityscapes is an automotive dataset created by Daimler which includes various driving scenes, mostly recorded in Germany.

The files from Cityscapes provide around 5000 images with fine annotations (class labels) for various city driving scenarios. There are two primary folders from the dataset that we'll be working with:

  • leftImg8bit (includes all of the input images for training)
  • gtFine (includes the class annotations in polygonal format (.json files))

A separately downloaded set of scripts for the dataset transforms the class label .json files into class label images (.png files), which are used for training.

The following is an example which shows the different classes after being color coded and alpha-blended with the original image. 



The focus of this database for our purpose is on Semantic Annotations which consist of the following types (we only use the Fine Annotations for this tutorial, though it should also be possible to use coarse annotations and perhaps achieve even better results):

Coarse Annotations (20000 images)

Fine Annotations (5000 images)

The Cityscapes classes are organized into 8 groups, and 19 of the classes are used for training. The following figure lists 30 classes, but all classes with a ‘+’ next to them are treated as a single void class; the preparation steps change their values to ‘255’, which is subsequently ignored in the training process:

8 groups
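The remapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Cityscapes preparation script itself: the function name and the three-entry table are hypothetical, and the real table is defined in the dataset's helper scripts.

```python
# Value written for every class that is not used in training.
IGNORE_LABEL = 255

# Hypothetical, three-entry stand-in for the real ID -> train ID table.
HYPOTHETICAL_ID_TO_TRAIN_ID = {
    7: 0,
    8: 1,
    11: 2,
}


def to_train_ids(label_rows, id_to_train_id, ignore_label=IGNORE_LABEL):
    """Map each pixel's label ID to its train ID; unmapped IDs become ignore_label."""
    return [
        [id_to_train_id.get(pixel, ignore_label) for pixel in row]
        for row in label_rows
    ]


# A 2x3 "image" of raw label IDs; IDs 0 and 4 are not in the table.
label_image = [
    [7, 7, 8],
    [0, 11, 4],
]
train_image = to_train_ids(label_image, HYPOTHETICAL_ID_TO_TRAIN_ID)
print(train_image)  # [[0, 0, 1], [255, 2, 255]]
```

Every class that is kept receives a small consecutive value, and everything else collapses to 255.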


More information about the database can be found at the following URL:

Since preparing such a database for training requires a number of steps, the following detailed instructions are provided:

  1. Download the Cityscapes dataset from:

The specific packages needed are the and as shown in the figure below. These files include the 5000 images with fine (pixel-wise) semantic annotations which are divided into train, test, and validation groups. It would also be possible to train using the coarse annotations provided by Cityscapes and perhaps achieve better results, but only training with the fine annotations is covered in this tutorial.

Download files


2. Extract these files into a folder on the Linux workstation. You should then have a folder containing sub-folders named "gtFine" and "leftImg8bit". As noted in the introduction, these folders contain the class labels and input images, respectively.

3. There are various preparation, inspection, and evaluation scripts provided for Cityscapes which can be cloned from GitHub. The next step is to download or clone these using the following:

git clone

4. The scripts can be installed by changing directory into the cityscapesScripts folder then using pip:

sudo pip install .

5. Next, export the CITYSCAPES_DATASET variable so that it points to the directory where you extracted the leftImg8bit and gtFine folders. The preparatory scripts use this environment variable when they pre-process the annotations into class labels. To do this, first change directory to the location where the dataset was extracted, then run the following command. Consider copy/pasting the command, as it uses backtick characters around "pwd" and not single apostrophes.
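Before running the preparation scripts, it may help to confirm that the variable really points at the extracted data. The following is a minimal Python sketch of such a check, assuming only the two folder names mentioned in this tutorial; the helper name missing_subfolders is hypothetical and is not part of cityscapesScripts.

```python
import os

# Sub-folders the tutorial expects directly under the dataset root.
EXPECTED_SUBFOLDERS = ("leftImg8bit", "gtFine")


def missing_subfolders(dataset_root, expected=EXPECTED_SUBFOLDERS):
    """Return the expected sub-folders that do not exist under dataset_root."""
    return [
        name for name in expected
        if not os.path.isdir(os.path.join(dataset_root, name))
    ]


if __name__ == "__main__":
    root = os.environ.get("CITYSCAPES_DATASET")
    if root is None:
        print("CITYSCAPES_DATASET is not set")
    else:
        missing = missing_subfolders(root)
        print("missing sub-folders:", missing if missing else "none")
```

If the script reports a missing folder, the variable was most likely exported from the wrong directory.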


6. The next step is to create the images which have class labels associated with each pixel and set the unused classes to value '255'. This can be done by running the script. To run this script change directory to the cityscapesScripts/cityscapesscripts directory and run the following:

python preparation/

This will convert annotations in polygonal format (.json files) to .png images with label IDs, where pixels encode the “train IDs”. Since we use the default 19 classes, you do not need to change anything in script at this time. We will later go back and change the for use with evaluating the trained models.

If you are new to datasets, it may be worthwhile to inspect the .json file to see how the polygon data is stored. You'll notice it is basically an array of points that connect the lines for each polygonal area for each class.
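One quick way to inspect the polygon data is to load it with Python's json module. The sketch below uses a hand-written stand-in rather than a real annotation file, and the key names ("objects", "label", "polygon") are assumptions about the file layout that should be checked against an actual .json file from gtFine.

```python
import json


def summarize_polygons(annotation):
    """Return (label, number_of_points) for each object in one annotation dict."""
    return [
        (obj["label"], len(obj["polygon"]))
        for obj in annotation.get("objects", [])
    ]


# Hand-written stand-in for one annotation file; key names are assumptions.
sample_text = """
{
  "imgHeight": 1024,
  "imgWidth": 2048,
  "objects": [
    {"label": "road", "polygon": [[0, 700], [2047, 700], [2047, 1023], [0, 1023]]},
    {"label": "car", "polygon": [[900, 400], [1100, 400], [1000, 500]]}
  ]
}
"""

annotation = json.loads(sample_text)
for label, point_count in summarize_polygons(annotation):
    print(label, point_count)
```

For a real file, replace json.loads(sample_text) with json.load on the opened file.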

After running this script, you will see color coded images which denote the classes as well as trainable images which have the classes encoded in the order determined by the cityscapesscripts/helpers/

An example of the color coded image from the dataset is shown here:

Color coded image

Once the pixels are encoded with trainable values, the different classes are identified as values 0-18 and all of the ignored classes are set to value 255. Notice that it's very difficult to distinguish the various classes in this image because their values are so low and therefore appear nearly black (with the exception of the ignored classes). An example of the trainable image is shown here:

Trainable image
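If you want to check a trainable image by eye, one option is to stretch the values 0-18 across the gray range before viewing. The following is a small sketch of that idea on a list-of-rows image; the function name and the choice of scale factor are illustrative assumptions, and the result is for inspection only and must not be used for training.

```python
IGNORE_LABEL = 255
NUM_CLASSES = 19


def stretch_for_viewing(train_id_rows, num_classes=NUM_CLASSES,
                        ignore_label=IGNORE_LABEL):
    """Scale train IDs so classes are visible; ignored pixels stay at ignore_label."""
    # Largest integer step that keeps the top class below 255 (14 for 19 classes).
    step = 254 // (num_classes - 1)
    return [
        [pixel if pixel == ignore_label else pixel * step for pixel in row]
        for row in train_id_rows
    ]


trainable = [
    [0, 1, 18],
    [255, 13, 2],
]
print(stretch_for_viewing(trainable))  # [[0, 14, 252], [255, 182, 28]]
```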

Note that it is possible to modify the cityscapesscripts/helpers/ to change the class annotations during the preparatory step.

7. During the training process, a text file identifies the location of the training data and annotations (the path to this file is specified in the input data layer of the model prototxt files). This text file is located under Segment/workspace/data/cityscapes/img_seg.txt. You will need to modify it to point to the absolute paths of the input images and the associated label images that were just created, which should exist in the subdirectories where the Cityscapes data was extracted.

I use VS Code, which is a nice text editor that allows the use of alt+shift to select columns of text and replace them. You can also select a section of text and right click to replace all occurrences of that text.

The left column in img_seg.txt should point to the input images (these are stored under the Cityscapes/leftImg8bit directory), and the right column should point to the labelTrainIds.png files (which are the annotations or ground truths and are stored under the Cityscapes/gtFine directory).
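As an alternative to editing the file by hand, the two columns could be generated with a short script. The sketch below rests on several assumptions that should be checked against your extracted data and the provided img_seg.txt: that image names end in _leftImg8bit.png, that label names end in _gtFine_labelTrainIds.png, that both trees share the same sub-folder layout, and that the columns are separated by a single space. The function name build_list_lines is hypothetical.

```python
import os

IMAGE_SUFFIX = "_leftImg8bit.png"           # assumed input-image suffix
LABEL_SUFFIX = "_gtFine_labelTrainIds.png"  # assumed label-image suffix


def build_list_lines(dataset_root, split="train"):
    """Return 'image_path label_path' lines for every image that has a label file."""
    dataset_root = os.path.abspath(dataset_root)
    image_dir = os.path.join(dataset_root, "leftImg8bit", split)
    label_dir = os.path.join(dataset_root, "gtFine", split)
    lines = []
    for folder, _subfolders, files in sorted(os.walk(image_dir)):
        for name in sorted(files):
            if not name.endswith(IMAGE_SUFFIX):
                continue
            # Mirror the image's sub-folder and swap the file name suffix.
            relative_folder = os.path.relpath(folder, image_dir)
            label_name = name[: -len(IMAGE_SUFFIX)] + LABEL_SUFFIX
            label_path = os.path.normpath(
                os.path.join(label_dir, relative_folder, label_name)
            )
            if os.path.isfile(label_path):
                lines.append(f"{os.path.join(folder, name)} {label_path}")
    return lines


if __name__ == "__main__":
    root = os.environ.get("CITYSCAPES_DATASET", ".")
    for line in build_list_lines(root):
        print(line)
```

Images without a matching label file are skipped, so the output only lists pairs that can actually be used for training.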

Many classes are ignored, and their pixel values are set to '255'. In the provided model prototxt files, the final softmax and accuracy layers in the network set a label ignore parameter for value 255 to ignore these classes. All of the other classes need to start at class 0 and increment. The prototxt files referred to here exist in Segment/workspace/model, which includes a folder for each of the models covered in the tutorial.
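To make the effect of the ignore value concrete, here is a small Python sketch of a pixel accuracy that skips every pixel labeled 255. It is not the training framework's own accuracy layer, only an illustration of the idea; the function name is hypothetical.

```python
IGNORE_LABEL = 255


def pixel_accuracy(predicted_rows, label_rows, ignore_label=IGNORE_LABEL):
    """Fraction of correctly predicted pixels, skipping pixels labeled ignore_label."""
    correct = 0
    counted = 0
    for predicted_row, label_row in zip(predicted_rows, label_rows):
        for predicted, label in zip(predicted_row, label_row):
            if label == ignore_label:
                continue  # ignored pixels affect neither numerator nor denominator
            counted += 1
            correct += int(predicted == label)
    return correct / counted if counted else 0.0


labels = [[0, 1, 255], [2, 255, 1]]
predictions = [[0, 1, 7], [2, 3, 0]]
print(pixel_accuracy(predictions, labels))  # 0.75
```

Whatever the network predicts at the two ignored positions has no influence on the result: three of the four counted pixels are correct.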

At this point, the training dataset has been prepared and is ready to be used to train the models. You can proceed to the next step, which is 3.0 Training Models.

Read 3.0 Training Models

About Jon Cory

Jon Cory is located near Detroit, Michigan and serves as an Automotive focused Machine Learning Specialist Field Applications Engineer (FAE) for Xilinx. Jon’s key roles include introducing Xilinx ML solutions, training customers on the ML tool flow, and assisting with deployment and optimization of ML algorithms in Xilinx devices. Previously, Jon spent two years as an Embedded Vision Specialist (FAE) with a focus on handcrafted feature algorithms, Vivado HLS, and ML algorithms, and six years prior as a Xilinx Generalist FAE in Michigan/Northern Ohio. Jon has been happily married to Monica Cory for four years and enjoys a wide variety of interests including music, running, traveling, and losing at online video games.