Executive Summary

This report discusses the steps required to accelerate an SVM (Support Vector Machine) written in C code on a Xilinx® Zynq® UltraScale+™ FPGA as an embedded application using the Vitis™ unified software platform.

A Support Vector Machine is a versatile classification algorithm that ‘kernelizes’ a linear classifier to model non-linear decision boundaries.
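Concretely, the trained classifier's decision function evaluates a kernel against the support vectors rather than a plain inner product, which is what lets a linear separator express non-linear boundaries:

    f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right)

where the x_i are the support vectors, the y_i their class labels, the alpha_i and b the learned parameters, and K the chosen kernel (linear, polynomial, RBF, etc.).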

Figure 1: Graphical representation of a hyperplane separating two classes in 3 dimensions using the SVM algorithm

The first phase of this project uses LIBSVM [Chang, Lin], which was modified to allow acceleration via OpenCL API calls using the Vitis tool.
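As a rough illustration of what "acceleration via OpenCL API calls" means on the host side, the sketch below shows how an application might load the FPGA binary and dispatch one accelerated function through the standard OpenCL C++ wrapper API. It is a minimal sketch only: the kernel name, buffer sizes, and argument list are illustrative assumptions, not the project's exact code.

    // Minimal host-side sketch (illustrative only): load the xclbin produced by
    // the Vitis build and invoke an accelerated kernel via the OpenCL C++ API.
    #include <CL/cl2.hpp>
    #include <fstream>
    #include <vector>

    int main() {
        // On an embedded platform the Xilinx runtime (XRT) exposes the FPGA
        // fabric as an OpenCL accelerator device.
        std::vector<cl::Platform> platforms;
        cl::Platform::get(&platforms);
        std::vector<cl::Device> devices;
        platforms[0].getDevices(CL_DEVICE_TYPE_ACCELERATOR, &devices);

        cl::Context context(devices[0]);
        cl::CommandQueue queue(context, devices[0]);

        // Load the FPGA binary (e.g. binary_container_1.xclbin).
        std::ifstream bin("binary_container_1.xclbin", std::ios::binary);
        std::vector<unsigned char> buf(std::istreambuf_iterator<char>(bin), {});
        cl::Program program(context, {devices[0]}, cl::Program::Binaries{buf});
        cl::Kernel kernel(program, "svm_predict_values");

        // Illustrative device buffers for flattened input data and results.
        const size_t N = 1024;
        cl::Buffer in(context, CL_MEM_READ_ONLY, N * sizeof(float));
        cl::Buffer out(context, CL_MEM_WRITE_ONLY, N * sizeof(float));
        kernel.setArg(0, in);
        kernel.setArg(1, out);

        queue.enqueueTask(kernel);  // launch the accelerated function
        queue.finish();             // wait for it to complete
        return 0;
    }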


Project Overview

The initial goal of this project is to become familiar with the process of accelerating a C/C++ application/library using the Vitis tool.  The SVM library was chosen due to the math-intensive nature of the algorithm, with an eye towards retargeting the kernel to the AI Engines (AIEs) in Versal™ ACAP devices as a follow-on project.  In addition, the long-term goals of this project include investigating the use of SVM as an algorithm for the acceleration of QAM modulation, using an RFSoC development board as the target platform, following work done by Robert W. Stewart [Stewart et al.].

For a more detailed introduction to SVM, see [Gandhi] in the references section of this document.

The Ultra96 development board was used to test both software and accelerated versions of the SVM algorithm.

Figure 2: Ultra96 from Avnet

Tool Flow

Creating an embedded acceleration application requires several steps, each highly dependent on the others and each with its own set of rules and restrictions.  Below is a list of each required phase, its sub-phases, and the tools required.

Platform Generation

  • FPGA/IPI Creation :: Vivado
  • PetaLinux/Linux Generation (with sysroot) :: PetaLinux
  • Acceleration Platform Generation :: Vitis

Application Compilation/Profiling

  • Compile source code for embedded ARM processor(s) :: GCC/Vitis

Application Acceleration/Kernel Development

  • Modify ‘C/C++’ code identified via profiling for HLS/OpenCL
  • Implement Kernel in FPGA fabric (Vitis/HLS/Vivado)

Verification/Testing

  • Run the accelerated application on embedded hardware
  • Profile and compare against the original performance

Figure 3: Flow diagram for embedded acceleration

Profiling

I did this using a virtual machine running Ubuntu 16.04.6, so some of the instructions may be irrelevant if you are running on a native Linux box.  If you are running Linux natively, please disregard any steps that mention copying files from/to VM shared folders.

Setting up the Ultra96:

Build the ultra96 platform

  1. Download ultra96_base platform from Vitis wiki onto a Linux machine (native or virtual).
    1. Decompress the archive: tar -xzvf ultra96_base.tar.gz
    2. cd ultra96_base
    3. make
  2. Once the build completes:
    • cd images/linux/
    • ./sdk.sh
      • This builds the sysroot for your Linux image.  The sysroot contains many libraries, some of which are required for this application.
  3. Once the sysroot has been generated, copy the repo (or the output directory, if the sysroot generation script failed the last copy due to paths not matching exactly) into a Windows/VM shared folder.
  4. Launch the Vitis tool in the VM.
  5. Create a new workspace
  6. Select File -> New -> Vitis Application Project.
  7. When asked to select the platform, navigate to the workspace folder containing the Ultra96 platform (ultra96.xpfm) and click OK.
  8. The Ultra96 platform should now show up in the list of platforms to select. Select it.
  9. Set your sysroot path to /repo_base/sysroot/sysroots/aarch64-xilinx-linux
  10. Right-click on the application project and select Import Sources.
    • Navigate to the directory where you have copied or cloned your SVM sources and import the following:
      • svm_predict_values.cpp
      • svm.cpp
      • svm.h
      • svm-predict.cpp
      • xcl2.cpp
      • xcl2.hpp
      • heart_scale
      • heart_scale.model
      • svm.def
  11. Set your build configuration to "system" in the Vitis GUI.
  12. Double click on the svm_predict.sdx project in the project explorer.
    • This opens a pane in the central part of the GUI which allows some configuration of your project.
    • One of the icons in this pane will allow you to add a binary container. 
    • Once the binary container is added in the Vitis environment, it will scan the project to see if there are any functions eligible for acceleration.  Select the svm_predict_values function once it shows up.
  13. Click on the hammer icon to build the project (this will take some time)
  14. Once complete, copy the contents of the sd_card directory from your project onto a micro SD card.
  15. Copy the binary_container_1.xclbin (names may vary) from the /workspace/<svm_project_name>/System folder, and the heart_scale and heart_scale.model files from the /src directory, onto the SD card.
  16. Plug in your Ultra96 board with the USB-to-JTAG/serial adapter card connected to the Ultra96 board edge pin header and to your PC through a USB cable.
  17. Ensure the Ultra96 boot mode is set to SD card mode (SW2)
  18. Plug in the micro SD card with the boot image on it
  19. Turn on the Ultra96 board using the SW3 push button.
  20. In the Vitis GUI SDX terminal pane, click on the + icon to add a serial terminal for the Ultra96 board.  Select one of the serial ports available and check to see if status messages are being displayed in the terminal (in my case this was USB1).
    • Serial port settings should be: baud rate 115200, 8 bit, 1 stop bit, no parity, no flow control
    • Use trial and error to determine which USB/serial port to use.  The Ultra96 will be periodically printing status messages to the active port.
  21. Once the Ultra96 has booted, connect to the SSID it is broadcasting via WiFi.  You should be able to ping the host (Ultra96) at this point.
    • This step is necessary because the profiler we are using (TCF) requires a network connection to communicate.
  22. You may be able to do the following steps from the Vitis tool terminal, but you can also use any SSH terminal (I used Putty). 
    • Log in to the host via SSH with username: root, password: root
    • Copy the heart_scale, heart_scale.model and binary_container_1.xclbin files from the /media/card/ directory into the /mnt/ directory
      • This step moves the necessary files into the folder the TCF (Target Communications Framework) agent will be executing from.  The TCF agent generates the profiling reports used later in this tutorial to evaluate the execution time of each of the subroutines in the program.
  23. In the Target Connections pane, double click TCF agent->Linux agent
    • Change the IP address to the Ultra96 host IP address (likely 192.168.2.1, but may vary). 
    • Use the test connection button to verify that you are connected to the TCF agent.
  24. In the Vitis GUI, right-click on your application project and select Debug As -> Launch on Hardware (System Debugger).  Click OK on the prompt to switch to the Debug perspective.
  25. In the Vitis GUI, navigate to Window -> Show View -> Debug and select the TCF Profiler option.  This will enable you to do profiling and stack tracing.
  26. Once the source code pane opens up, scroll to the bottom and set a breakpoint on the return 0; line.  This prevents the program from exiting and terminating the TCF agent once it completes.
  27. In the TCF profiler pane, select the start button.
    • Check the "enable stack tracing" box
  28. Select the resume button (F8) from the icon bar at the top of the GUI.
  29. Once the program runs, you should see the profile of the function execution as below:

In step 12 we added the svm_predict_values function to the binary container to be accelerated.  In order to provide a comparison of the profile, this project was also built without accelerating the svm_predict_values function.  The results of the non-accelerated test are shown in Figure 4.

Figure 4: Non-accelerated run

Figure 5 shows the software profile when the svm_predict_values function is mapped to hardware acceleration.  As can be seen, the accelerated function actually takes more execution time than the non-accelerated version.

Figure 5: Accelerated run

There are several reasons why targeting hardware does not always achieve the desired acceleration result.   The most common are:

  • Data movement is not optimized
    • Larger bursts of data improve the ability of the tools to optimize the hardware.  Handshaking and latency on smaller transactions lower the efficiency of the data movement.
  • The algorithm does not lend itself to optimization
    • Some algorithms are more suited to hardware acceleration, like large parallel math/vector operations and special purpose pipeline logic. 
    • Algorithms that may not see as much benefit from acceleration are ones that need to frequently flush their execution pipelines and retrieve new data from memory.   
  • Tool pragmas have not been used, or have not been used effectively, to define the interfaces, data movement, and acceleration targets (a brief kernel sketch illustrating such pragmas follows this list).
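As a point of reference, the sketch below shows the kind of Vitis HLS pragmas that define a kernel's interfaces and encourage burst-friendly data movement. The function name and arguments are illustrative assumptions and do not reflect the project's actual svm_predict_values signature.

    // Illustrative HLS kernel sketch: the interface pragmas map the pointer
    // arguments onto AXI master ports and the scalars onto an AXI-Lite control
    // interface; the pipelined loop reads contiguous data so the tool can
    // infer wide bursts instead of many small, handshake-heavy transfers.
    extern "C" void dot_kernel(const float *sv, const float *coef,
                               float *result, int n) {
    #pragma HLS INTERFACE m_axi     port=sv     bundle=gmem0 offset=slave
    #pragma HLS INTERFACE m_axi     port=coef   bundle=gmem1 offset=slave
    #pragma HLS INTERFACE m_axi     port=result bundle=gmem0 offset=slave
    #pragma HLS INTERFACE s_axilite port=n
    #pragma HLS INTERFACE s_axilite port=return

        float acc = 0.0f;
        for (int i = 0; i < n; i++) {
    #pragma HLS PIPELINE II=1
            acc += sv[i] * coef[i];
        }
        result[0] = acc;
    }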

This tutorial covers profiling in a simple hardware acceleration case, where optimizations have not yet been applied.  A follow-up will cover how to optimize the SVM library code to achieve acceleration in hardware.


Challenges and Lessons Learned

Nearly every phase of this design flow provided an opportunity to learn new skills and methods.

Early in the development cycle, it became obvious that a script- or Make-based build process was going to be valuable, both for repeatability and for accelerating the iteration cycles.  Therefore, a series of build scripts and Makefiles were generated for the platform creation, PetaLinux builds, and Vitis software platform compilation.

Once the SVM code was ported to the A53s, the next step was to identify a portion of the algorithm that could easily be accelerated in the PL (Programmable Logic).  Isolating the code for the kernel was relatively straightforward, but some of the Vitis software platform restrictions on usable C/C++ code required significant changes to the code.  Some of these restrictions include removing ‘malloc’ calls and changing structures to remove embedded pointers.  Once this was completed, the final step prior to being able to accelerate the kernel was to modify the code to fit into an OpenCL framework.  The OpenCL framework requirement represented one of the more important lessons learned: data movement and memory allocation need to be planned for early in the design cycle in order to maximize acceleration.
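To give a flavor of the kind of restructuring involved, the fragment below contrasts a pointer-based structure (typical of host-side C code) with a fixed-size, pointer-free form that can be passed to the kernel as a flat buffer. The names and sizes are illustrative assumptions, not LIBSVM's actual definitions.

    // Host-only, pointer-based structure: each record owns heap memory, which
    // the accelerated kernel cannot dereference.
    struct node_list {
        int     len;
        double *values;   // allocated with malloc on the host
    };

    // Kernel-friendly form: fixed maximum size and no embedded pointers, so an
    // array of these can be copied to the device as one contiguous buffer.
    #define MAX_FEATURES 64
    struct node_flat {
        int    len;
        double values[MAX_FEATURES];
    };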

The final challenge in this exercise was determining why the accelerated kernel was not returning the same results as the non-accelerated kernel.  Low-level debugging of the accelerator is challenging because correlating software code to hardware waveforms and interfaces is not intuitively obvious.  To debug this discrepancy, we inserted preprocessor directives to disable the OpenCL structures and run as much of the code as possible with g++.  This allowed the use of the same Makefiles to build both accelerated and non-accelerated versions, further minimizing differences between them.  After some analysis, we were able to make corrections to the C++ code and get correct values from the accelerated kernel.
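A minimal sketch of that preprocessor-guard approach is shown below, using an illustrative function name: with USE_OPENCL defined the call would be dispatched to the FPGA kernel, while the default build compiles the plain C++ path with g++, so both versions come from the same source and Makefiles.

    #include <cstdio>

    // Illustrative only: with USE_OPENCL defined this function would set the
    // kernel arguments, enqueue the kernel, and read back the result via the
    // OpenCL host API (omitted here); otherwise it runs the plain C++ loop.
    static double predict_values(const double *x, int n) {
    #ifdef USE_OPENCL
        // ... OpenCL dispatch of the accelerated kernel goes here ...
        return 0.0;  // placeholder for the value read back from the device
    #else
        double acc = 0.0;
        for (int i = 0; i < n; i++) acc += x[i];
        return acc;
    #endif
    }

    int main() {
        const double x[4] = {1.0, 2.0, 3.0, 4.0};
        std::printf("result: %f\n", predict_values(x, 4));
        return 0;
    }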


Next Steps

Immediate next steps will be to accelerate a larger portion of the algorithm to realize greater acceleration.  Once that is accomplished, further work will be needed to explore adapting the SVM algorithm to the QPSK waveform and porting the design to the ZCU111 RFSoC board.  In addition, we would like to investigate the benefits of utilizing the AIEs in Versal to accelerate SVM, as this will be a requirement for DSP applications in the Versal family of devices.


Conclusion

The Vitis environment is a great platform for accelerating applications in the PL, and in AIEs moving forward.  The scriptable nature of the Vitis software platform is a very welcome addition for accelerating embedded systems and fits nicely into well-established software development environments.  Debugging kernels is challenging, but the introduction of the Vitis Analyzer tool should ease this burden dramatically.

Embedded acceleration is a complex task requiring an intricate understanding of the interaction between hardware and software, but the benefit can be a significant improvement in performance and throughput.  After working through the initial phase of this project, it is clear that the Vitis software platform is the tool to conquer these challenges. 


Citations

[Chang, Lin] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

[Stewart et al.] Software Defined Radio with RFSoC & PYNQ: /content/xilinx/en/support/documentation/university/XDF_2018/XDF_2018_Posters_ALL%2033.pdf

[Gandhi] Support Vector Machine – Introduction to Machine Learning Algorithms: https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47


About Don Schaeffer


Don Schaeffer has been employed as an AMD FAE for over 12 years.  His areas of expertise include Embedded Systems Design, Digital Signal Processing, and Analog-Digital Conversion.  Prior to working for AMD, Don worked for 12 years as an ASIC/FPGA design engineer primarily designing wired and wireless communications systems for both military and commercial applications.  When not geeking-out, Don enjoys spending time outdoors and traveling with his wife and kids.


About Taylor Maddix


Taylor has been with AMD since October 2017, covering the Oregon area.  He has 12+ years of industry experience in hardware and ASIC design for test and measurement, consumer, industrial, and video applications.  In his spare time, he enjoys spending time with his family, outdoor activities, and watching the Portland Timbers Football Club.