Overview

In a previous article (https://developer.xilinx.com/en/articles/containerizing-alveo-accelerated-application-with-docker.html) we showed how you can containerize Alveo Applications using Docker.

Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery.

In this article, we show how to use Alveo-enabled containers in a Kubernetes environment.


Local Kubernetes Network

For this example, we set up a local environment with two servers. Server 1 (kubemaster) has two Alveo U200 cards installed, and server 2 (kubeworker01) has one.

This article is not meant to explain how to configure a Kubernetes cluster, but here is a short overview of what was done:

  1. We installed CentOS 7.6 on both machines.
  2. We installed the necessary XRT and deployment shells on both servers, using the 2019.2 version from https://www.xilinx.com/products/boards-and-kits/alveo/package-files-archive/u200-2019-2.html
  3. We then followed https://www.linuxtechi.com/install-kubernetes-1-7-centos7-rhel7/ to install Kubernetes on both machines.
    1. Since this is a small two-machine cluster, we also configured kubemaster as a worker node by removing its master taint:

    kubectl taint nodes --all node-role.kubernetes.io/master-

After this procedure, we had Kubernetes running on our own small network.


Installing the Alveo Device Plugin

To let Kubernetes manage Alveo cards, we install the Xilinx FPGA device plugin: https://github.com/Xilinx/FPGA_as_a_Service/tree/master/k8s-fpga-device-plugin/trunk.

The Xilinx FPGA device plugin for Kubernetes is a DaemonSet deployed on the Kubernetes (a.k.a. k8s) cluster, which allows you to:

  • Discover the FPGAs installed in each node of the cluster and expose information about them, such as quantity, DSA (shell) type, and timestamp
  • Run containers with FPGA access in the k8s cluster

Installing this plugin is very simple. On the kubemaster machine, run:

    git clone https://github.com/Xilinx/FPGA_as_a_Service.git
    cd FPGA_as_a_Service/k8s-fpga-device-plugin/trunk/
    kubectl create -f fpga-device-plugin.yml

This will install the daemonset on all nodes, so in our setup, on both kubemaster and kubeworker01. This can be verified using the following:

    kubectl get pod -n kube-system

    ...snippet...

    fpga-device-plugin-daemonset-cgq9d   1/1     Running   0          15d
    fpga-device-plugin-daemonset-fq689   1/1     Running   0          15d

    ...snippet...

If there is an Alveo card installed on the node, more info can be seen like this:

    kubectl logs fpga-device-plugin-daemonset-fq689 -n kube-system

    time="2019-04-25T18:22:55Z" level=info msg="Starting FS watcher."
    time="2019-04-25T18:22:55Z" level=info msg="Starting OS watcher."
    time="2019-04-25T18:22:55Z" level=info msg="Starting to serve on /var/lib/kubelet/device-plugins/xilinx_u200_xdma_201820_1-1535712995-fpga.sock"
    2019/04/25 18:22:55 grpc: Server.Serve failed to create ServerTransport:  connection error: desc = "transport: write unix /var/lib/kubelet/device-plugins/xilinx_u200_xdma_201820_1-1535712995-fpga.sock->@: write: broken pipe"
    time="2019-04-25T18:22:55Z" level=info msg="Registered device plugin with Kubelet xilinx.com/fpga-xilinx_u200_xdma_201820_1-1535712995"
    time="2019-04-25T18:22:55Z" level=info msg="Sending 1 device(s) [&Device{ID:1,Health:Healthy,}] to kubelet"
    time="2019-04-25T18:32:06Z" level=info msg="Receiving request 1"
    time="2019-05-09T18:36:41Z" level=info msg="Receiving request 1"

To check the Alveo resources available on a node, run:

    kubectl describe node kubemaster

    ...snippet...

    Capacity:
     cpu:                                                    4
     ephemeral-storage:                                      102685624Ki
     hugepages-1Gi:                                          0
     hugepages-2Mi:                                          0
     memory:                                                 16425412Ki
     pods:                                                   110
     xilinx.com/fpga-xilinx_vcu1525_dynamic_5_1-1521279439:  1
    Allocatable:
     cpu:                                                    4
     ephemeral-storage:                                      94635070922
     hugepages-1Gi:                                          0
     hugepages-2Mi:                                          0
     memory:                                                 16323012Ki
     pods:                                                   110
     xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320:   1

    ...snippet...


Running Kubernetes Jobs using Alveo

Xilinx Alveo resource names all have the following format:

    xilinx.com/fpga-<shell>-<timestamp>

For example:

    xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320

Here, xilinx_u200_xdma_201830_2 is the shell (DSA) version on the Alveo board, and 1561465320 is the timestamp when the shell was built.
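
As a small illustration of this naming scheme, the following Python sketch splits a resource name into its shell and timestamp parts. The names `AlveoResource` and `parse_alveo_resource` are hypothetical, and treating the timestamp as Unix epoch seconds is an assumption that is not confirmed above.

```python
from datetime import datetime, timezone
from typing import NamedTuple

PREFIX = "xilinx.com/fpga-"


class AlveoResource(NamedTuple):
    shell: str       # shell (DSA) version, e.g. xilinx_u200_xdma_201830_2
    timestamp: int   # build timestamp of the shell


def parse_alveo_resource(name: str) -> AlveoResource:
    """Split 'xilinx.com/fpga-<shell>-<timestamp>' into its two parts."""
    if not name.startswith(PREFIX):
        raise ValueError(f"not an Alveo resource name: {name!r}")
    # The shell name uses underscores, so the last '-' separates the timestamp.
    shell, sep, timestamp = name[len(PREFIX):].rpartition("-")
    if not sep or not shell or not timestamp.isdigit():
        raise ValueError(f"unexpected resource name layout: {name!r}")
    return AlveoResource(shell, int(timestamp))


res = parse_alveo_resource("xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320")
print(res.shell)
print(res.timestamp)
# Assumption: the timestamp is in Unix epoch seconds.
built = datetime.fromtimestamp(res.timestamp, tz=timezone.utc)
print(built.isoformat())
```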

The exact name of the Alveo resource on each node can be extracted from the output of:

    kubectl describe node <node_name>
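
If you want to pick the resource names out of that output programmatically, a minimal Python sketch could look like the one below. The function name `allocatable_fpgas` is hypothetical, the sample text is a shortened stand-in for real `kubectl describe node` output, and the parser assumes the section layout shown earlier (an unindented `Allocatable:` header followed by indented entries).

```python
import re
from typing import Dict

# Matches lines such as " xilinx.com/fpga-<shell>-<timestamp>:   1"
FPGA_LINE = re.compile(r"^\s*(xilinx\.com/fpga-[^:\s]+):\s+(\d+)\s*$")


def allocatable_fpgas(describe_output: str) -> Dict[str, int]:
    """Return {resource name: count} from the Allocatable section."""
    resources: Dict[str, int] = {}
    in_allocatable = False
    for line in describe_output.splitlines():
        if line.strip() == "Allocatable:":
            in_allocatable = True
            continue
        # Any other unindented line starts a new section.
        if in_allocatable and line and not line[0].isspace():
            in_allocatable = False
        if in_allocatable:
            match = FPGA_LINE.match(line)
            if match:
                resources[match.group(1)] = int(match.group(2))
    return resources


# Shortened sample text for illustration only.
SAMPLE = """\
Capacity:
 cpu:                                                    4
 xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320:   1
Allocatable:
 cpu:                                                    4
 xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320:   1
System Info:
 Kernel Version:  (elided)
"""

fpgas = allocatable_fpgas(SAMPLE)
for name, count in fpgas.items():
    print(name, count)
```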

Deploy user pod

Here is an example of the YAML file that defines the pod to be deployed. The file must specify the Docker image, which has been uploaded to a Docker registry, as well as the type and number of Alveo resources used by the pod.

    $ cat dp-pod.yaml

    apiVersion: v1
    kind: Pod
    metadata:
      name: mypod
    spec:
      containers:
      - name: mypod
        image: xilinxatg/fpga-verify:latest
        resources:
          limits:
            xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320: 1
        command: ["/bin/sh"]
        args: ["-c", "while true; do echo hello; sleep 5;done;"]
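
Since the resource name differs per shell, it can be handy to generate the pod definition instead of editing it by hand. The sketch below builds the same pod as a Python dictionary and prints it as JSON, which kubectl also accepts as a manifest format. The helper name `build_pod_manifest` is hypothetical; note that `resources`, `command`, and `args` belong to the container entry, not to the pod spec itself.

```python
import json


def build_pod_manifest(name: str, image: str, fpga_resource: str, fpga_count: int = 1) -> dict:
    """Build a Pod manifest that requests `fpga_count` Alveo cards."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as FPGAs are requested via limits.
                    "resources": {"limits": {fpga_resource: fpga_count}},
                    "command": ["/bin/sh"],
                    "args": ["-c", "while true; do echo hello; sleep 5;done;"],
                }
            ]
        },
    }


manifest = build_pod_manifest(
    name="mypod",
    image="xilinxatg/fpga-verify:latest",
    fpga_resource="xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file (for example a hypothetical dp-pod.json) and passed to kubectl create -f in the same way as the YAML version.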

This pod can then be deployed as follows:

    kubectl create -f dp-pod.yaml

To check the status of the deployed pod:

    $ kubectl get pod

    ...snippet...

    mypod                           1/1     Running   0          7s

    ...snippet...

    $ kubectl describe pod mypod

    ...snippet...

    Limits:
      xilinx.com/fpga-xilinx_u200_xdma_201820_1-1535712995: 1
    Requests:
      xilinx.com/fpga-xilinx_u200_xdma_201820_1-1535712995: 1

    ...snippet...

To run hello world in the pod:

    $ kubectl exec -it mypod -- /bin/bash
    my-pod>source /opt/xilinx/xrt/setup.sh
    my-pod>xbutil scan
    my-pod>cd /tmp/alveo-u200/xilinx_u200_xdma_201830_1/test/
    my-pod>./verify.exe ./verify.xclbin

In this test case, the container image (xilinxatg/fpga-verify:latest) has been pushed to Docker Hub and is publicly accessible.

The image contains verify.xclbin for many types of FPGA; select the one matching the FPGA resource the pod requests.

To run a more useful application, we pushed a couple of the compression examples from the Vitis Libraries (https://github.com/Xilinx/Vitis_Libraries), compiled for the U200 with XRT 2019.2 and DSA 2018.3_2, to Docker Hub.

Both images contain the xclbin, the ELF file, and a simple test.sh script in the /root folder. To run the my_zlib application as a Kubernetes job with, for example, 5 instances in parallel, each requiring 2 U200 cards, this YAML file can be used:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: myzlib
    spec:
      completions: 5
      parallelism: 5
      template:
        spec:
          containers:
          - name: myzlib
            image: kesteraernoudt/my_zlib:latest
            resources:
              limits:
                xilinx.com/fpga-xilinx_u200_xdma_201830_2-1561465320: 2
            command: ["/bin/sh"]
            args: ["-c", "/root/test.sh"]
          restartPolicy: Never
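
A pod runs on a single node, so its FPGA limit has to be satisfied by the cards of one node. The rough Python sketch below estimates how many of these pods can hold their cards at the same time. It assumes that all cards in the cluster expose the same resource name and that FPGAs are the only limiting resource; the helper name `max_concurrent_pods` is hypothetical.

```python
import math
from typing import Dict


def max_concurrent_pods(cards_per_node: Dict[str, int], cards_per_pod: int) -> int:
    """Number of pods that can hold their FPGAs at the same time."""
    # Cards on different nodes cannot be combined for one pod.
    return sum(cards // cards_per_pod for cards in cards_per_node.values())


# Card counts from the two-server setup described earlier.
cluster = {"kubemaster": 2, "kubeworker01": 1}

concurrent = max_concurrent_pods(cluster, cards_per_pod=2)
# Number of back-to-back rounds needed for 5 completions (None if nothing fits).
rounds = math.ceil(5 / concurrent) if concurrent else None
print(concurrent, rounds)
```

Under these assumptions only kubemaster can host a pod that asks for 2 cards, so even with parallelism set to 5 the pods would be expected to use the cards one after another, with the others waiting until the cards are free.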

This job can be started using:

    kubectl apply -f my_zlib_job.yaml

Getting status information for this job can be done as follows:

    $ kubectl get pods
    NAME           READY   STATUS        RESTARTS   AGE
    myzlib-2m9s2   0/1     Completed     0          105s
    myzlib-9l4jz   0/1     Completed     0          105s
    myzlib-p2grp   0/1     Completed     0          105s
    myzlib-rklm4   0/1     Completed     0          105s
    myzlib-wv4s4   1/1     Running       0          105s
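
To summarize such a listing in a script, the pods can be counted per status. The following Python sketch does this for the table above; the function name `status_counts` is hypothetical, and it assumes the default column layout in which no value contains spaces.

```python
from collections import Counter


def status_counts(get_pods_output: str) -> Counter:
    """Count pods per STATUS value in `kubectl get pods` table output."""
    lines = [line for line in get_pods_output.splitlines() if line.strip()]
    # Locate the STATUS column from the header row.
    status_idx = lines[0].split().index("STATUS")
    return Counter(line.split()[status_idx] for line in lines[1:])


OUTPUT = """\
NAME           READY   STATUS        RESTARTS   AGE
myzlib-2m9s2   0/1     Completed     0          105s
myzlib-9l4jz   0/1     Completed     0          105s
myzlib-p2grp   0/1     Completed     0          105s
myzlib-rklm4   0/1     Completed     0          105s
myzlib-wv4s4   1/1     Running       0          105s
"""

counts = status_counts(OUTPUT)
print(dict(counts))
```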

Once all pods have finished, the job can be deleted:

    kubectl delete jobs/myzlib

Conclusion

In this article, we showed how Alveo cards can be used within a Kubernetes environment. They are managed by Kubernetes as a resource and can be requested by pods just like any other resource.


About Kester Aernoudt

Kester Aernoudt received his master's degree in Computer Science from the University of Ghent in 2002. That same year he started as a Research Engineer in the Technology Center of Barco, where he worked on a wide range of processing platforms such as microcontrollers, DSPs, embedded processors, FPGAs, multi-core CPUs, and GPUs. In 2011 he joined Xilinx, where he now works as a Processor Specialist covering Europe and Israel, supporting customers and colleagues on embedded processors and x86 acceleration, specifically targeting Zynq devices and Alveo.