Lab: Axistream Single DMA (axis)

Simple streaming example using AXI

In this example we learn how to use Xilinx AXI_DMA to create streaming interfaces for and IP.

This class will not go too deep into AXI protocols and Vivado, but a nice tutorial of the AXI Direct Memory Access (DMA) exists here.

1) Vivado HLS: Generating RTL code from C/C++ code

In this section you learn how to create a project in Vivado HLS, synthesize your code, and generate RTL.

1.1) Download code and create a Vivado HLS project

Download and unzip axi_single_dma_files.zip. Generate your project using the provided script.tcl file:

Linux: open a terminal, make sure your environment is set, navigate to streamMul folder, and run the following

$ vitis_hls script.tcl

Windows: Open Vitis and create a New Project and import streamMul.cpp and streamMul.hpp and set smul as the top function.

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/manual_add.png

Your code is not complete!, modify your code to become same as the following:

#include "ap_axi_sdata.h"
#include "hls_stream.h"

typedef ap_axiu<32, 0, 0, 0> trans_pkt;

void smul(hls::stream< trans_pkt > &INPUT, hls::stream< trans_pkt > &OUTPUT)
{
#pragma HLS INTERFACE s_axilite port = return bundle = CTRL

                #pragma HLS INTERFACE axis port=INPUT
                #pragma HLS INTERFACE axis port=OUTPUT
                trans_pkt data_p;

                INPUT.read(data_p);
                data_p.data *= 2;
                OUTPUT.write(data_p);
}

In this lab, since we are using an ap_axiu struct for out I/O variables, the last bit is handled for us. We must interact with them this way because we are dealing with an AXI stream, not an array.

1.2) Generate RTL code and export it

Click on Run C Synthesis to generate RTL code. After it is done, you can check your resource utilization and timing report.

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/hls1.png

Check your hardware interface as follows:

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/hls2.png

Now you can export your RTL code by clicking on Export RTL:

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/hls3.png

After exporting is done, you can close and exit from Vitis.

2) Vivado: Generating bitstream from RTL code

In this section we import our RTL code from last section, add some required IPs, and generate our bitstream

2.1) Create a new Vivado project

Open your Vivado tool and create a new project. Select an appropriate location for your project and leave the default project name as is (project_1).

Select RTL Project and check Do specify not sources at this time.

Select xc7z020clg400-1 for your part:

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/vivado2.png

2.2) Import RTL code

Under Flow Navigator, click on IP Catalog. Right click on the opened window and select Add Repository. Navigate to your Vivado HLS project > solution1 > impl > ip and select it:

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/vivado3.png

2.3) Add IPs to your design

Under Flow Navigator, click on Create Block Design. Leave the design name as is (design_1). In the newly opened window you can add IPs by clicking on the plus sign.

Add ZYNQ7 Processing System to your design:

https://bitbucket.org/repo/x8q9Ed8/images/3814633603-pynq6.png

Double click on ZYNQ7 IP to customize it. In the opened window, double click on High Performance AXI 32b/64b Slave Parts:

https://bitbucket.org/repo/x8q9Ed8/images/148617913-pynq7.png

Select and check S AXI HP0 interface:

https://bitbucket.org/repo/x8q9Ed8/images/3126944786-pynq8.png

Add a Smul to your design and rename it to smul:

Add a AXI Direct Memory Access to your design and rename it to smul_dma.

https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/dma5.png

Double click on your AXI DMA and change the following parameters: 1) uncheck Enable Scatter Gather Engine. 2) Change Width of Buffer Length Register to 23:

https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/dma6.png

2.4) Manual connections

Connect the following ports:

smul::OUTPUT_r to smul_dma::S_AXIS_S2MM

smul_dma::M_AXIS_MM2S to smul::INPUT_r

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/vivado5.png

2.5) Automatic connections

Now you can leave the rest of the connections to the tool. There should be a highlighted strip on top of your diagram window.

Click on Run Block Automation
Click on Run Connection Automation and select all

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/vivado6.png

IMPORTANT! you have to click again on Run Connection Automation

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/vivado7.png

At this point your design should look like this:

https://github.com/KastnerRG/Read_the_docs/raw/master/docs/image/axi_single_dma/vivado8.png

2.6) Generate bitstream

Save your design CTRL+S or File > Save Block Design.
Validate your design: Tools > Validate Design
In Sources, right click on design_1, and Create HDL Wrapper. (use default options) Now you should have design_1_wrapper.
Generate bitstream by clicking on Generate Bitstream in Flow Navigator

Copy your project directory > project_1 > project_1.runs > impl_1 > design_1_wrapper to your project directory > project_1 and rename it to smul.bit. In my case, axi_single_dma > axi_single_dma.runs >impl_1 > design_1_wrapper.bit**

Copy your project directory > project_1 > project_1.srcs > sources_1 > bd > design_1 > hw_handoff > design_1.hwh to your project directory > project_1 and rename it to smul.hwh.

You should have both smul.bit and smul.hwh.

You can close and exit from Vivado tool.

3) Host program

In this section we use Python to test our design.

3.1) Move your files

Create a new folder (lab2_single_dma) in your PYNQ board and move both smul.bit and smul.hwh into it. Also create a notebook named smul_axi_single_dma.ipynb.

3.2) Python code

Create a new Jupyter notebook and run the following code to test your design:

import time
    from pynq import Overlay
    import pynq.lib.dma
    import numpy as np
    from pynq import MMIO
    import random

    print("Programming the FPGA")
    ol = Overlay('./smul.bit') # check your path
    ol.download() # it downloads your bit to FPGA

    print("Inspect all the IP names")
    ol.ip_dict.keys()

print("Inspect the HLS IP registers")
hls_ip = ol.smul
print(hls_ip)
# hls_ip.register_map


# Setup recv/send DMA
dma = ol.smul_dma
dma_send = ol.smul_dma.sendchannel
dma_recv = ol.smul_dma.recvchannel

#dma = ol.axi_dma_0
#dma_send = ol.axi_dma_0.sendchannel
#dma_recv = ol.axi_dma_0.recvchannel

print("Starting HLS IP")
hls_ip.register_map
CONTROL_REGISTER = 0x0
hls_ip.write(CONTROL_REGISTER, 0x81) # 0x81 will set bit 0
# hls_ip.register_map

# Prepare data to send
from pynq import allocate
import numpy as np

data_size = 20
input_buffer = allocate(shape=(data_size,), dtype=np.uint32)
output_buffer = allocate(shape=(data_size,), dtype=np.uint32)

for i in range(data_size):
        input_buffer[i] = i

print("Starting DMA transfer")
dma_recv.transfer(output_buffer)
dma_send.transfer(input_buffer)

dma_send.wait()
dma_recv.wait()

# Print the data
for i in range(data_size):
        #print('0x' + format(output_buffer[i], '02x'))
        print(i, input_buffer[i], output_buffer[i])

del input_buffer
del output_buffer
del ol
print("Finished!")