This report discusses the steps required to accelerate an SVM (Support Vector Machine) written in C on a Xilinx® Zynq® UltraScale+™ FPGA using the Vitis™ unified software platform as an embedded application.
A Support Vector Machine is a versatile classification algorithm that uses the 'kernel trick' to extend a linear classifier so it can model non-linear decision boundaries.
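Concretely, for a trained model with support vectors x_i, labels y_i, and learned coefficients α_i, the kernelized decision function (sketched here for the common RBF kernel, writing the bias as b, which LIBSVM stores as -ρ) is:

```latex
f(\mathbf{x}) = \operatorname{sgn}\!\left( \sum_{i=1}^{l} y_i \alpha_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x} \rVert^2 \right)
```

The sum over support vectors is the math-intensive inner loop evaluated by svm_predict, which is what makes it an attractive target for hardware acceleration.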
The first phase of this project uses LIBSVM [Chang, Lin], which was modified to allow acceleration via OpenCL API calls using the Vitis tool.
The initial goal of this project is to become familiar with the process of accelerating a C/C++ application or library using the Vitis tool. The SVM library was chosen due to the math-intensive nature of the algorithm, with an eye towards retargeting the kernel to the AI Engines (AIEs) in Versal™ ACAP devices as a follow-on project. In addition, the long-term goals of this project include investigating SVM as an algorithm for the acceleration of QAM modulation, using an RFSoC development board as the target platform, following work done by Robert W. Stewart [Stewart et al.].
For a more detailed introduction to SVM, see [Gandhi] in the references section of this document.
The Ultra96 development board was used to test both software and accelerated versions of the SVM algorithm.
Creating an embedded acceleration application requires several steps, each highly dependent on the others and each with its own set of rules and restrictions. Below is a list of each required phase, its sub-phases, and the tools required.
Application Acceleration/Kernel Development
I did this using a virtual machine running Ubuntu, so some of the instructions may be irrelevant if you are running on a native Linux box. If you are running Linux natively, please disregard any steps that mention copying files to/from VM shared folders.
Setting up the Ultra96:
Build the Ultra96 platform
During step 12-c, we added the svm_predict function into the binary container to be accelerated. To provide a comparison of the profile, the project was also built without accelerating the svm_predict function. The results of the unaccelerated test are shown in Figure 4.
Figure 5 shows the software profile when the svm_predict function is mapped to hardware acceleration. As can be seen, the accelerated function actually takes more execution time than the non-accelerated version.
There are several reasons why targeting hardware does not always achieve the desired acceleration result. The most common are the overhead of moving data between the processor and the programmable logic, kernel code that has not yet been optimized (for example, with pipelining or loop unrolling), and the lower clock frequency of the PL relative to the processor cores.
This tutorial covers profiling in a simple hardware acceleration case, where optimizations have not yet been applied. A follow-up will provide details on how to optimize the SVM library code to achieve acceleration in hardware.
Nearly every phase of this design flow provided an opportunity to learn new skills and methods.
Early in the development cycle, it became obvious that a script- or Make-based build process would be valuable both for repeatability and for accelerating the iteration cycles. Therefore, a series of build scripts and Makefiles were created for platform creation, PetaLinux builds, and Vitis software platform compilation.
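As an illustration, a minimal build script for the kernel-compilation step might look like the following. This is only a sketch: the platform file, kernel name, and source file names are placeholders rather than this project's actual values, and the v++ options should be checked against the Vitis documentation for your tool version.

```shell
#!/bin/bash
set -e

PLATFORM=ultra96.xpfm   # placeholder platform file
TARGET=hw               # sw_emu | hw_emu | hw

# Compile the svm_predict kernel to a Xilinx object file (.xo)
v++ -c -t ${TARGET} --platform ${PLATFORM} \
    -k svm_predict -o svm_predict.xo svm_kernel.cpp

# Link the kernel into the binary container (.xclbin)
v++ -l -t ${TARGET} --platform ${PLATFORM} \
    -o binary_container_1.xclbin svm_predict.xo
```

Capturing commands like these in a Makefile is what makes the long hardware-compile iterations repeatable.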
Once the SVM code was ported to the A53s, the next step was to identify a portion of the algorithm that could easily be accelerated in the PL (Programmable Logic). Isolating the code for the kernel was relatively straightforward, but some of the Vitis software platform restrictions on usable C/C++ code required significant changes: for example, removing malloc calls and restructuring data types to eliminate embedded pointers. Once this was completed, the final step before the kernel could be accelerated was to modify the code to fit into an OpenCL framework. The OpenCL framework requirement represented one of the more important lessons learned: data movement and memory allocation need to be planned for early in the design cycle in order to maximize acceleration.
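To illustrate the kind of restructuring involved, here is a minimal sketch (not the project's actual code) of a hardware-friendly RBF decision function. The sizes MAX_SV and MAX_FEAT, the function name, and the pragma placement are all assumptions for illustration; the point is that LIBSVM's pointer-based svm_node structures are replaced with flat, statically sized arrays so that no malloc or embedded pointers remain in the kernel:

```cpp
#include <cmath>

// Hypothetical fixed upper bounds, chosen at build time (not LIBSVM values).
#define MAX_SV   64   // max number of support vectors
#define MAX_FEAT 16   // max features per vector

// Hardware-friendly variant of the decision sum: flat, statically sized
// arrays instead of svm_node** and malloc'd buffers.
extern "C" double rbf_decision(const double sv[MAX_SV][MAX_FEAT],
                               const double coef[MAX_SV],
                               const double x[MAX_FEAT],
                               int n_sv, int n_feat,
                               double gamma, double rho) {
    double sum = 0.0;
    for (int i = 0; i < n_sv; ++i) {
#pragma HLS PIPELINE
        // Squared Euclidean distance between support vector i and the query.
        double d2 = 0.0;
        for (int j = 0; j < n_feat; ++j) {
            double diff = sv[i][j] - x[j];
            d2 += diff * diff;
        }
        sum += coef[i] * std::exp(-gamma * d2);
    }
    return sum - rho;
}
```

Fixed upper bounds on the loop trip counts are what allow the tool to pipeline the loops; the cost is that array sizes must be chosen at build time. Compiled with g++, the HLS pragma is simply ignored, so the same source can be verified in software first.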
The final challenge in this exercise was determining why the accelerated kernel was not returning the same results as the non-accelerated kernel. Low-level debugging of the accelerator is challenging because correlating software code to hardware waveforms and interfaces is not intuitively obvious. To debug this discrepancy, we inserted preprocessor directives to disable the OpenCL structures and run as much of the code as possible with g++. This allowed the same Makefiles to build both accelerated and non-accelerated versions, further minimizing differences between them. After some analysis, we were able to make corrections to the C++ code and get correct values from the accelerated kernel.
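The guard described above can be sketched roughly as follows. The macro name USE_OPENCL and the function names are hypothetical, chosen here for illustration, and the kernel body is a toy stand-in:

```cpp
// Toy stand-in for the accelerated kernel body (hypothetical name/contents).
static double svm_predict_kernel(const double* x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; ++i) s += x[i];
    return s;
}

// When USE_OPENCL is defined (the accelerated build), the host would set up
// OpenCL buffers and enqueue the kernel on the device; when it is not, the
// kernel body is called directly so the whole program builds under plain g++.
double run_predict(const double* x, int n) {
#ifdef USE_OPENCL
    // clCreateBuffer / clEnqueueTask host path would go here (omitted).
    return 0.0;
#else
    return svm_predict_kernel(x, n);  // software-only path for debugging
#endif
}
```

Building the software-only version is then just a matter of leaving USE_OPENCL undefined, so both variants share one set of sources and Makefiles.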
Immediate next steps will be to accelerate a larger portion of the algorithm to realize a greater speedup. Once that is accomplished, further work will be needed to explore adapting the SVM algorithm to the QPSK waveform and porting the design to the ZCU111 RFSoC board. In addition, we would like to investigate the benefits of utilizing the AIEs in Versal to accelerate SVM, as this will be a requirement for DSP applications in the Versal family of devices.
The Vitis environment is a great platform for accelerating applications in PL and AIEs moving forward. The scriptable nature of the Vitis software platform is a very welcome addition to accelerating embedded systems and fits very nicely into the well-established software development environment. Debugging kernels is challenging but the introduction of the Vitis Analysis tool should ease this burden dramatically.
Embedded acceleration is a complex task requiring an intricate understanding of the interaction between hardware and software, but the benefit can be a significant improvement in performance and throughput. After working through the initial phase of this project, it is clear that the Vitis software platform is the tool to conquer these challenges.
[Chang, Lin] Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[Stewart et al.] Software Defined Radio with RFSoC & PYNQ: https://www.xilinx.com/support/documentation/university/XDF_2018/XDF_2018_Posters_ALL%2033.pdf
[Gandhi] Support Vector Machine – Introduction to Machine Learning Algorithms: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47
Don Schaeffer has been employed as a Xilinx FAE for over 12 years. His areas of expertise include Embedded Systems Design, Digital Signal Processing, and Analog-Digital Conversion. Prior to working for Xilinx, Don worked for 12 years as an ASIC/FPGA design engineer primarily designing wired and wireless communications systems for both military and commercial applications. When not geeking-out, Don enjoys spending time outdoors and traveling with his wife and kids.
Taylor has been with Xilinx since October 2017, covering the Oregon area. He has 12+ years of total industry experience in hardware and ASIC design for test and measurement, consumer, industrial, and video applications. In his spare time he enjoys spending time with his family, outdoor activities, and watching the Portland Timbers Football Club.