Machine Learning Preprocessing

In machine learning, data preprocessing is an integral step that converts input data into a clean data set. A machine learning application receives data from multiple sources in multiple formats; this data must be transformed into a format suitable for analysis before it is passed to the model.

The Xilinx® Vitis™ Vision accelerated library can help users preprocess image data before it is fed to different deep neural networks (DNNs).

This article covers the preprocessing needed for two neural networks: GoogleNet and YoloV2.


Typical preprocessing functions

Two of the most typical preprocessing functions for image data are color space conversion (CSC) and resizing.

CSC converts the incoming images from one color space to another. One example would be converting the YUYV output from a webcam to the RGB format expected by the neural network.
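As a rough illustration of this conversion, the sketch below converts a packed YUYV buffer to interleaved RGB in plain C++. It assumes the common integer approximation of the BT.601 studio-range equations and a Y0-U-Y1-V byte order; the function names are hypothetical and are not part of OpenCV or the Vitis Vision library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamp an intermediate value into the 8-bit range [0, 255].
static inline std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Convert one Y/U/V sample to RGB.
// Assumption: BT.601 studio-range coefficients in 8-bit fixed point.
static inline void yuv_to_rgb(int y, int u, int v, std::uint8_t* rgb) {
    const int c = y - 16;
    const int d = u - 128;
    const int e = v - 128;
    rgb[0] = clamp_u8((298 * c + 409 * e + 128) >> 8);            // R
    rgb[1] = clamp_u8((298 * c - 100 * d - 208 * e + 128) >> 8);  // G
    rgb[2] = clamp_u8((298 * c + 516 * d + 128) >> 8);            // B
}

// Convert a packed YUYV image (Y0 U Y1 V per pixel pair) to interleaved RGB.
// Hypothetical helper for illustration only.
std::vector<std::uint8_t> yuyv_to_rgb(const std::vector<std::uint8_t>& yuyv,
                                      int width, int height) {
    assert(width > 0 && height > 0 && width % 2 == 0);
    assert(yuyv.size() == static_cast<std::size_t>(width) * height * 2);
    std::vector<std::uint8_t> rgb(static_cast<std::size_t>(width) * height * 3);
    for (std::size_t i = 0, o = 0; i < yuyv.size(); i += 4, o += 6) {
        const int y0 = yuyv[i], u = yuyv[i + 1], y1 = yuyv[i + 2], v = yuyv[i + 3];
        yuv_to_rgb(y0, u, v, &rgb[o]);      // first pixel of the pair
        yuv_to_rgb(y1, u, v, &rgb[o + 3]);  // second pixel shares U and V
    }
    return rgb;
}
```

Each pair of pixels shares one U and one V sample, which is why the loop consumes four input bytes and produces six output bytes per iteration.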

Resizing the image changes the resolution of the input image to match the resolution used to train the network.
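To make the resizing step concrete, here is a minimal sketch of bilinear interpolation on a single-channel 8-bit image. It assumes pixel-center alignment and row-major storage; the function name is hypothetical, and the sketch is not the implementation used by OpenCV or the Vitis Vision library.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bilinear resize of a single-channel, row-major 8-bit image.
// Hypothetical helper for illustration only.
std::vector<std::uint8_t> resize_bilinear(const std::vector<std::uint8_t>& src,
                                          int sw, int sh, int dw, int dh) {
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    assert(src.size() == static_cast<std::size_t>(sw) * sh);
    std::vector<std::uint8_t> dst(static_cast<std::size_t>(dw) * dh);
    const float sx = static_cast<float>(sw) / dw;  // horizontal scale factor
    const float sy = static_cast<float>(sh) / dh;  // vertical scale factor
    for (int y = 0; y < dh; ++y) {
        // Map the destination pixel center back into source coordinates.
        const float fy = std::max(0.0f, (y + 0.5f) * sy - 0.5f);
        const int y0 = std::min(static_cast<int>(fy), sh - 1);
        const int y1 = std::min(y0 + 1, sh - 1);
        const float wy = fy - y0;
        for (int x = 0; x < dw; ++x) {
            const float fx = std::max(0.0f, (x + 0.5f) * sx - 0.5f);
            const int x0 = std::min(static_cast<int>(fx), sw - 1);
            const int x1 = std::min(x0 + 1, sw - 1);
            const float wx = fx - x0;
            // Blend the four neighbors: first along x, then along y.
            const float top = src[y0 * sw + x0] * (1.0f - wx) + src[y0 * sw + x1] * wx;
            const float bot = src[y1 * sw + x0] * (1.0f - wx) + src[y1 * sw + x1] * wx;
            dst[y * dw + x] = static_cast<std::uint8_t>(
                std::lround(top * (1.0f - wy) + bot * wy));
        }
    }
    return dst;
}
```

A multi-channel image could be handled by applying the same weights to each channel of the four neighboring pixels.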

Additional preprocessing computations are subtraction, scaling, and threshold calculation.
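These computations can be sketched per channel as a subtraction followed by a scaling and a clamp to a pair of thresholds. The struct, the function name, the floating-point arithmetic, and the interpretation of the threshold step as a clamp are all assumptions made for illustration; the accelerated kernel described later uses fixed-point arithmetic and its exact semantics are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-channel parameters: value to subtract and scale factor.
struct ChannelParams {
    float alpha;  // subtracted from the pixel value
    float beta;   // multiplied with the difference
};

// Apply (x - alpha) * beta per channel, then clamp to [lower, upper].
// Assumption: pixels are interleaved, one ChannelParams entry per channel.
std::vector<float> subtract_scale_threshold(const std::vector<std::uint8_t>& pixels,
                                            const std::vector<ChannelParams>& params,
                                            float lower, float upper) {
    assert(!params.empty());
    assert(pixels.size() % params.size() == 0);
    assert(lower <= upper);
    std::vector<float> out(pixels.size());
    const std::size_t channels = params.size();
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        const ChannelParams& p = params[i % channels];
        const float v = (static_cast<float>(pixels[i]) - p.alpha) * p.beta;
        out[i] = std::min(upper, std::max(lower, v));
    }
    return out;
}
```

For example, subtracting per-channel values of 104, 117, and 123 with a scale of 1 and thresholds of -128 and 127 maps the pixel (255, 0, 200) to (127, -117, 77).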

CSC and resizing are typically done using standard image processing libraries such as OpenCV. However, additional mathematical operations are implemented using discrete instructions.

The Vitis Vision library provides drop-in accelerated versions of the standard OpenCV CSC and resize functions. These are used together with a custom mathematical kernel to accelerate the full preprocessing pipeline.


Accelerators

The Vitis Vision library provides performance-optimized functions that are drop-in replacements for standard OpenCV library functions.

CSC:

    OpenCV:

    void cvtColor(InputArray src, OutputArray dst, int code, int dstCn=0)

    Vitis Vision library (YUYV to RGBA):

    void yuyv2rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS, COLS, NPC> & _dst)

Resize:

    OpenCV:

    void resize(InputArray src, OutputArray dst, Size dsize, double fx=0, double fy=0, int interpolation=INTER_LINEAR)

    Vitis Vision library:

    void resize(xf::Mat<TYPE, SRC_ROWS, SRC_COLS, NPC> & _src, xf::Mat<TYPE, DST_ROWS, DST_COLS, NPC> & _dst)

Custom Accelerator:

A custom accelerated function is created to do the additional processing. It is implemented in a manner that supports multiple operations.

[Figure: Operating Modes]

API:

    template <int INPUT_PTR_WIDTH, int OUTPUT_PTR_WIDTH, int T_CHANNELS, int CPW,
              int ROWS, int COLS, int NPC, bool PACK_MODE,
              int X_WIDTH, int ALPHA_WIDTH, int BETA_WIDTH, int GAMMA_WIDTH, int OUT_WIDTH,
              int X_IBITS, int ALPHA_IBITS, int BETA_IBITS, int GAMMA_IBITS, int OUT_IBITS,
              bool SIGNED_IN, int OPMODE, int UTH, int LTH>
    void preProcess(ap_uint<INPUT_PTR_WIDTH> *inp, ap_uint<OUTPUT_PTR_WIDTH> *out, float params[3*T_CHANNELS], int rows, int cols)

Data Flow:

[Figure: Data Flow]

Latency:

Number of cycles = (rows * cols * T_CHANNELS) / (CPW * NPC)
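As a worked example of this formula, the sketch below evaluates it for one set of values. The image size (224 x 224), channel count (3), CPW (3), NPC (1), and the 300 MHz kernel clock are assumptions chosen for illustration and are not taken from the design; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Cycle estimate from the latency formula:
// cycles = (rows * cols * T_CHANNELS) / (CPW * NPC)
inline std::uint64_t preprocess_cycles(std::uint64_t rows, std::uint64_t cols,
                                       std::uint64_t t_channels,
                                       std::uint64_t cpw, std::uint64_t npc) {
    assert(cpw > 0 && npc > 0);
    return (rows * cols * t_channels) / (cpw * npc);
}

// Convert a cycle count to milliseconds for an assumed kernel clock in MHz.
inline double cycles_to_ms(std::uint64_t cycles, double clock_mhz) {
    assert(clock_mhz > 0.0);
    return static_cast<double>(cycles) / (clock_mhz * 1000.0);
}
```

With these assumed values, `preprocess_cycles(224, 224, 3, 3, 1)` gives 50176 cycles, which is about 0.167 ms at 300 MHz. This covers only the custom kernel's formula, not the full pipeline.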

 


Preprocessing Pipelines

The figure below shows the preprocessing pipelines for GoogleNet and YoloV2.

[Figure: GoogleNet and YoloV2 preprocessing pipelines]

All the functions in the pipeline have streaming interfaces, operate in dataflow, and use fixed point arithmetic.

Both pipelines are defined in C++:

    void preprocessing()
    {
        ...
        // Read the input image from memory into an xf::cv::Mat.
        xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_8UC3, HEIGHT, WIDTH, NPC1>(img_inp, imgInput0);
        // Resize the image to the resolution expected by the network.
        xf::cv::resize<INTERPOLATION, TYPE, HEIGHT, WIDTH, NEWHEIGHT, NEWWIDTH, NPC_T, MAXDOWNSCALE>(imgInput0, out_mat);
        // Convert the resized Mat to a stream for the custom kernel.
        xf::cv::accel_utils obj;
        obj.xfMat2hlsStrm<INPUT_PTR_WIDTH, TYPE, NEWHEIGHT, NEWWIDTH, NPC_T, (NEWWIDTH*NEWHEIGHT/8)>(out_mat, resizeStrmout, srcMat_cols_align_npc);
        // Apply the additional mathematical operations.
        xf::cv::preProcess<INPUT_PTR_WIDTH, OUTPUT_PTR_WIDTH, T_CHANNELS, CPW, HEIGHT, WIDTH, NPC_TEST, PACK_MODE, X_WIDTH, ALPHA_WIDTH, BETA_WIDTH, GAMMA_WIDTH, OUT_WIDTH, X_IBITS, ALPHA_IBITS, BETA_IBITS, GAMMA_IBITS, OUT_IBITS, SIGNED_IN, OPMODE>(resizeStrmout, img_out, params, rows_out, cols_out, th1, th2);
    }

Results

Network        Intel(R) Xeon(R) Silver 4100    Intel(R) Core(TM) i7-4770    FPGA            Speedup
               CPU @ 2.10GHz, 8 core           CPU @ 3.40GHz, 4 core        (Alveo-U200)    (Xeon/i7)
Googlenet_v1   5.63 ms                         59.9 ms                      1.1 ms          5x/54x
YOLO           15.35 ms                        68.34 ms                     1.1 ms          14x/62x
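The speedup figures are consistent with dividing each CPU time by the FPGA time and rounding to the nearest integer. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the rounding rule is an assumption inferred from the reported numbers.

```cpp
#include <cassert>
#include <cmath>

// Speedup as the ratio of CPU time to FPGA time, rounded to the nearest integer.
// Assumption: the reported figures use this rounding.
inline long speedup(double cpu_ms, double fpga_ms) {
    assert(fpga_ms > 0.0);
    return std::lround(cpu_ms / fpga_ms);
}
```

For Googlenet_v1, `speedup(5.63, 1.1)` and `speedup(59.9, 1.1)` give 5 and 54; for YOLO, `speedup(15.35, 1.1)` and `speedup(68.34, 1.1)` give 14 and 62.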

Accuracy (Googlenet_v1):

SW preprocessing: Top-1 accuracy 66%, Top-5 accuracy 87.5%

HW preprocessing: Top-1 accuracy 64.8%, Top-5 accuracy 87.5%

Resource Utilization:

[Figure: resource utilization]

Design Files

The source files for the GoogleNet preprocessing example can be downloaded from the Vitis™ accelerated libraries repository on GitHub: https://github.com/Xilinx/Vitis_Libraries/tree/master/vision/L3/benchmarks/blobfromimage


About Alvin Clark


Alvin Clark is a Sr. Technical Marketing Engineer working on Software and AI Platforms at AMD, helping usher in the ACAP era of programmable devices.  Alvin has spent his career working and supporting countless FPGA and embedded designs in many varied fields ranging from Consumer to Medical to Aerospace & Defense.  He has a degree from the University of California, San Diego and is working on his graduate degree at the Georgia Institute of Technology.