Accelerating ML Preprocessing with the Vitis Vision Library

Alvin Clark

Published: Oct 02, 2019

AI - Machine Learning
Video and Image Processing
Demo and Example Designs
Alveo U200
Alveo U250
Alveo U280
Ultra96
ZCU102
ZCU104
ZCU111
Vitis

Machine Learning Preprocessing

In machine learning, data preprocessing is an integral step required to convert input data into a clean data set. A machine learning application receives data from multiple sources in multiple formats; this data needs to be transformed into a format suitable for analysis before being passed to the model.

The Xilinx® Vitis™ Vision accelerated library can help users preprocess image data before it is fed to different deep neural networks (DNNs).

This article covers the necessary preprocessing for two different neural networks: GoogleNet and YOLOv2.

Typical preprocessing functions

Two of the most typical preprocessing functions for image data are color space conversion (CSC) and resizing.

CSC converts the incoming images from one color space to another. One example would be converting the YUYV output from a webcam to the RGB format expected by the neural network.
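As a software illustration of the YUYV-to-RGB case described above, the following is a minimal sketch rather than the library's implementation. It assumes full-range BT.601 coefficients and floating-point arithmetic; the names `Rgb`, `toByte`, `yuvToRgb`, and `yuyvToRgb` are hypothetical, and an accelerated kernel may use different coefficients, fixed-point math, or an RGBA output.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
    std::uint8_t r, g, b;
};

// Round to the nearest integer and clamp to the 8-bit range.
inline std::uint8_t toByte(double v) {
    const long rounded = std::lround(v);
    return static_cast<std::uint8_t>(std::min(255L, std::max(0L, rounded)));
}

// Convert one luma sample plus a chroma pair to RGB.
// Coefficients are full-range BT.601 (an assumption of this sketch).
inline Rgb yuvToRgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const double cb = static_cast<double>(u) - 128.0;
    const double cr = static_cast<double>(v) - 128.0;
    return Rgb{toByte(y + 1.402 * cr),
               toByte(y - 0.344136 * cb - 0.714136 * cr),
               toByte(y + 1.772 * cb)};
}

// YUYV packs two pixels into four bytes (Y0, U, Y1, V); both pixels
// share the same U and V samples.
inline std::vector<Rgb> yuyvToRgb(const std::vector<std::uint8_t>& yuyv) {
    assert(yuyv.size() % 4 == 0);
    std::vector<Rgb> rgb;
    rgb.reserve(yuyv.size() / 2);
    for (std::size_t i = 0; i < yuyv.size(); i += 4) {
        rgb.push_back(yuvToRgb(yuyv[i], yuyv[i + 1], yuyv[i + 3]));
        rgb.push_back(yuvToRgb(yuyv[i + 2], yuyv[i + 1], yuyv[i + 3]));
    }
    return rgb;
}
```

Each group of four input bytes yields two output pixels, so the RGB buffer holds half as many pixels as the YUYV buffer holds bytes.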

Resizing the image changes the resolution of the input image to match the resolution used to train the network.
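To make the resize step concrete, here is a small software sketch of bilinear interpolation on a single-channel image. It is an illustration under stated assumptions, not the accelerated kernel: the `GrayImage` type and `resizeBilinear` function are hypothetical, pixel centers are assumed to sit at half-integer coordinates, and edges are handled by clamping.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct GrayImage {
    int rows;
    int cols;
    std::vector<std::uint8_t> data;  // row-major, one byte per pixel
};

inline GrayImage resizeBilinear(const GrayImage& src, int dstRows, int dstCols) {
    assert(src.rows > 0 && src.cols > 0 && dstRows > 0 && dstCols > 0);
    GrayImage dst{dstRows, dstCols,
                  std::vector<std::uint8_t>(static_cast<std::size_t>(dstRows) * dstCols)};
    const double scaleY = static_cast<double>(src.rows) / dstRows;
    const double scaleX = static_cast<double>(src.cols) / dstCols;
    for (int r = 0; r < dstRows; ++r) {
        // Map the destination pixel center back into source coordinates.
        const double sy = std::min(src.rows - 1.0, std::max(0.0, (r + 0.5) * scaleY - 0.5));
        const int y0 = static_cast<int>(sy);
        const int y1 = std::min(y0 + 1, src.rows - 1);
        const double wy = sy - y0;
        for (int c = 0; c < dstCols; ++c) {
            const double sx = std::min(src.cols - 1.0, std::max(0.0, (c + 0.5) * scaleX - 0.5));
            const int x0 = static_cast<int>(sx);
            const int x1 = std::min(x0 + 1, src.cols - 1);
            const double wx = sx - x0;
            // Blend the four neighbors: horizontally first, then vertically.
            const double top = (1.0 - wx) * src.data[y0 * src.cols + x0] +
                               wx * src.data[y0 * src.cols + x1];
            const double bottom = (1.0 - wx) * src.data[y1 * src.cols + x0] +
                                  wx * src.data[y1 * src.cols + x1];
            dst.data[r * dstCols + c] =
                static_cast<std::uint8_t>(std::lround((1.0 - wy) * top + wy * bottom));
        }
    }
    return dst;
}
```

Resizing to the source resolution returns the input unchanged, which is a quick sanity check for the coordinate mapping.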

Additional preprocessing computations include subtraction, scaling, and threshold calculation.
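The subtraction, scaling, and threshold steps listed above can be sketched in software as follows. This is a hedged illustration only: the `ChannelParams` type and `normalizePixels` function are hypothetical, the per-channel operation is assumed to be `(x - mean) * scale`, and thresholding is interpreted here as clamping to a lower and upper bound, which may differ from what a given network or kernel expects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-channel parameters (hypothetical names): value subtracted, then multiplier.
struct ChannelParams {
    float mean;
    float scale;
};

// Apply (x - mean) * scale to interleaved pixel data, then clamp the result
// to [lowerThreshold, upperThreshold].
inline std::vector<float> normalizePixels(const std::vector<std::uint8_t>& interleaved,
                                          const std::vector<ChannelParams>& params,
                                          float lowerThreshold, float upperThreshold) {
    assert(!params.empty());
    assert(interleaved.size() % params.size() == 0);
    assert(lowerThreshold <= upperThreshold);
    std::vector<float> out(interleaved.size());
    for (std::size_t i = 0; i < interleaved.size(); ++i) {
        // Interleaved layout: the channel index cycles with the sample index.
        const ChannelParams& p = params[i % params.size()];
        const float v = (static_cast<float>(interleaved[i]) - p.mean) * p.scale;
        out[i] = std::min(upperThreshold, std::max(lowerThreshold, v));
    }
    return out;
}
```

On a CPU this loop runs as a sequence of scalar or SIMD instructions per sample, which is the part of the pipeline a custom kernel would take over.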

CSC and resizing are typically done using standard image processing libraries such as OpenCV. However, additional mathematical operations are implemented using discrete instructions.

Drop-in accelerated versions of the standard OpenCV CSC and resize functions, provided by the Vitis Vision library, are used together with a custom mathematical kernel to accelerate the full preprocessing pipeline.

Accelerators

The Vitis Vision library provides performance-optimized functions that are drop-in replacements for standard OpenCV library functions.

CSC:

OpenCV:

    void cvtColor(InputArray src, OutputArray dst, int code, int dstCn = 0)

Vitis Vision library (YUYV to RGBA):

    void yuyv2rgba(xf::Mat<SRC_T, ROWS, COLS, NPC> & _src, xf::Mat<DST_T, ROWS, COLS, NPC> & _dst)

Resize:

OpenCV:

    void resize(InputArray src, OutputArray dst, Size dsize, double fx = 0, double fy = 0, int interpolation = INTER_LINEAR)

Vitis Vision library:

    void resize(xf::Mat<TYPE, SRC_ROWS, SRC_COLS, NPC> & _src, xf::Mat<TYPE, DST_ROWS, DST_COLS, NPC> & _dst)

Custom Accelerator:

A custom accelerated function is created to do the additional processing. It is implemented in a manner that supports multiple operations.

API:

    template <int INPUT_PTR_WIDTH, int OUTPUT_PTR_WIDTH, int T_CHANNELS, int CPW,
              int ROWS, int COLS, int NPC, bool PACK_MODE,
              int X_WIDTH, int ALPHA_WIDTH, int BETA_WIDTH, int GAMMA_WIDTH, int OUT_WIDTH,
              int X_IBITS, int ALPHA_IBITS, int BETA_IBITS, int GAMMA_IBITS, int OUT_IBITS,
              bool SIGNED_IN, int OPMODE, int UTH, int LTH>
    void preProcess(ap_uint<INPUT_PTR_WIDTH> *inp, ap_uint<OUTPUT_PTR_WIDTH> *out,
                    float params[3*T_CHANNELS], int rows, int cols)

Data Flow:

Latency:

Number of cycles = (rows*cols*T_CHANNELS)/(CPW*NPC)
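The cycle formula above can be evaluated with a short helper, shown here as a sketch. The function names are hypothetical, the division is rounded up to a whole cycle (an assumption, since the formula as written does not say how remainders are handled), and the clock frequency used to convert cycles to time is a caller-supplied value rather than a figure from this article.

```cpp
#include <cassert>
#include <cstdint>

// Number of cycles = (rows * cols * T_CHANNELS) / (CPW * NPC),
// rounded up to a whole cycle.
inline std::uint64_t preprocessCycles(std::uint64_t rows, std::uint64_t cols,
                                      std::uint64_t channels,
                                      std::uint64_t cpw, std::uint64_t npc) {
    assert(cpw > 0 && npc > 0);
    const std::uint64_t work = rows * cols * channels;
    const std::uint64_t perCycle = cpw * npc;
    return (work + perCycle - 1) / perCycle;
}

// Convert a cycle count to milliseconds for a given kernel clock in Hz.
inline double cyclesToMilliseconds(std::uint64_t cycles, double clockHz) {
    assert(clockHz > 0.0);
    return static_cast<double>(cycles) * 1000.0 / clockHz;
}
```

For example, with purely illustrative values of a 224 x 224 image, 3 channels, CPW = 1, and NPC = 1, `preprocessCycles` gives 150528 cycles; at a hypothetical 300 MHz clock that is roughly 0.5 ms. Raising CPW to 3 divides the count by three, to 50176 cycles.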

Preprocessing Pipelines

The figure below shows the preprocessing pipelines for GoogleNet and YOLOv2.

All the functions in the pipeline have streaming interfaces, operate in dataflow, and use fixed point arithmetic.
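To illustrate what fixed-point arithmetic means for parameters described by a total width and a number of integer bits (as the `*_WIDTH` and `*_IBITS` template parameters suggest), here is a plain-integer sketch. It is not the HLS implementation: `FixedFormat`, `toFixed`, and `fromFixed` are hypothetical names, and round-to-nearest with saturation is an assumption of this example rather than documented kernel behavior.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Signed fixed-point format: totalBits wide, integerBits of which
// (including the sign bit) sit left of the binary point.
struct FixedFormat {
    int totalBits;
    int integerBits;
};

// Quantize a real value to the raw integer representation,
// rounding to nearest and saturating at the format limits.
inline std::int32_t toFixed(double value, FixedFormat fmt) {
    assert(fmt.totalBits > 0 && fmt.totalBits <= 31);
    assert(fmt.integerBits > 0 && fmt.integerBits <= fmt.totalBits);
    const int fracBits = fmt.totalBits - fmt.integerBits;
    const double scaled = std::round(std::ldexp(value, fracBits));  // value * 2^fracBits
    const double maxRaw = std::ldexp(1.0, fmt.totalBits - 1) - 1.0;
    const double minRaw = -std::ldexp(1.0, fmt.totalBits - 1);
    return static_cast<std::int32_t>(std::fmin(maxRaw, std::fmax(minRaw, scaled)));
}

// Recover the real value represented by a raw fixed-point integer.
inline double fromFixed(std::int32_t raw, FixedFormat fmt) {
    return std::ldexp(static_cast<double>(raw), -(fmt.totalBits - fmt.integerBits));
}
```

With a 16-bit format that has 8 integer bits, 1.5 is stored exactly as the raw value 384, while 0.1 is stored as 26 and reads back as 0.1015625. Quantization error of this kind is one reason hardware and software preprocessing can produce slightly different results.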

Both pipelines were defined using C++:

    void preprocessing ()
    {
        ...
        // Read the input image from memory into an xf::cv::Mat.
        xf::cv::Array2xfMat<INPUT_PTR_WIDTH, XF_8UC3, HEIGHT, WIDTH, NPC1>(img_inp, imgInput0);
        // Resize to the resolution expected by the network.
        xf::cv::resize<INTERPOLATION, TYPE, HEIGHT, WIDTH, NEWHEIGHT, NEWWIDTH, NPC_T, MAXDOWNSCALE>(imgInput0, out_mat);
        // Convert the resized Mat into an HLS stream for the custom kernel.
        xf::cv::accel_utils obj;
        obj.xfMat2hlsStrm<INPUT_PTR_WIDTH, TYPE, NEWHEIGHT, NEWWIDTH, NPC_T, (NEWWIDTH*NEWHEIGHT/8)>(out_mat, resizeStrmout, srcMat_cols_align_npc);
        // Apply the custom preprocessing kernel and write the result to memory.
        xf::cv::preProcess<INPUT_PTR_WIDTH, OUTPUT_PTR_WIDTH, T_CHANNELS, CPW, HEIGHT, WIDTH, NPC_TEST, PACK_MODE, X_WIDTH, ALPHA_WIDTH, BETA_WIDTH, GAMMA_WIDTH, OUT_WIDTH, X_IBITS, ALPHA_IBITS, BETA_IBITS, GAMMA_IBITS, OUT_IBITS, SIGNED_IN, OPMODE>(resizeStrmout, img_out, params, rows_out, cols_out, th1, th2);
    }

Results

| Network | Intel(R) Xeon(R) Silver 4100 CPU @ 2.10GHz, 8 core | Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, 4 core | FPGA (Alveo-U200) | Speedup (Xeon/i7) |
|---|---|---|---|---|
| Googlenet_v1 | 5.63 ms | 59.9 ms | 1.1 ms | 5x/54x |
| YOLO | 15.35 ms | 68.34 ms | 1.1 ms | 14x/62x |

Accuracy (Googlenet_v1):

SW preprocessing: Top-1 accuracy 66%, Top-5 accuracy 87.5%

HW preprocessing: Top-1 accuracy 64.8%, Top-5 accuracy 87.5%

Resource Utilization:

Design Files

The source files for the GoogleNet preprocessing example can be downloaded from the Vitis™ accelerated libraries repository on GitHub: https://github.com/Xilinx/Vitis_Libraries/tree/master/vision/L3/benchmarks/blobfromimage

About Alvin Clark

Alvin Clark is a Sr. Technical Marketing Engineer working on Software and AI Platforms at AMD, helping usher in the ACAP era of programmable devices. Alvin has spent his career working on and supporting countless FPGA and embedded designs in fields ranging from Consumer to Medical to Aerospace & Defense. He has a degree from the University of California, San Diego and is working on his graduate degree at the Georgia Institute of Technology.
