Lab: AXI4-Burst Mode (m_axi) ============================ Simple example of AXI4-Burst Mode *********************************** This lab is an example of AXI4 data transfer in burst mode. It takes in a given sample of values and provides the square root. 1) Vitis HLS: Generating RTL code from C/C++ code -------------------------------------------------- In this section you learn how to create a project in Vitis HLS, synthesis your code, and generate RTL. 1.1) Download code and create a Vitis HLS project ################################################## This is the code we will be using: .. code-block :: c++ #include "axi4_sqrt.hpp" #include #include void axi4_sqrt(float *in, float *out, int len) { #pragma HLS INTERFACE s_axilite port=return bundle=sqrt #pragma HLS INTERFACE s_axilite port=len bundle=sqrt #pragma HLS INTERFACE m_axi depth=50 port=out offset=slave bundle=output #pragma HLS INTERFACE m_axi depth=50 port=in offset=slave bundle=input #pragma HLS INTERFACE s_axilite port=in bundle=sqrt #pragma HLS INTERFACE s_axilite port=out bundle=sqrt float buff[100]; memcpy(buff, (const float*) in, len * sizeof(float)); for(int i = 0; i < len; i++) buff[i] = sqrt(buff[i]); memcpy(out, (const float*) buff, len * sizeof(float)); } Note that we had to include `string.h` to be able to use `memcpy`. Additionally, we use `memcpy` instead of a for-loop (as used in AXI-streaming) to force Vitis HLS to infer an AXI4-Burst. Sometimes Vitis HLS will not infer this from a for-loop, but will for `memcpy`. Also note that we had to set the bundles of `s_axilite` ports for `in` and `out` to the same bundle as the `return` port. This is required since Vitis HLS will otherwise assign the `m_axi` ports to a separate interface (instead of a single interface for all variables, which is easier to handle in Jupyter). Please see `this Xilinx docs `_ page for more information. Download and unzip `axi4_burst.zip `_ that contains the above code. The `.cpp` file in the zip file actually does not match with above code, so please update it with the above code. Generate your project using the provided script.tcl file: Linux: open a terminal, make sure your environment is set, navigate to streamMul folder, and run the following :: $ vitis_hls -f script.tcl Windows: open vitis_hls command line and run the following :: $ vitis_hls -f script.tcl Now you can open your project in Vitis HLS. It should look like this: .. image :: https://i.imgur.com/iHkVmWE.png 1.2) Generate RTL code and export it #################################### Before running C Synthesis, in Vitis HLS, go to the **Solution > Solution Settings**, then under the "General" tab, in `config_interface`, de-select the `m_axi_addr64` option (see screenshot below, specifically the row with the red cross). This is required because 64-bit addresses are used by default when `m_axi`` is used in the pragmas. And in Jupyter with Pynq, it is not clear how to write the values to 64 bit addresses. So de-selecting this option will ensure that 32-bit addresses are used. .. image :: https://i.imgur.com/o6gpaSv.png Click on Run **C Synthesis** to generate RTL code. After it is done, you can check your resource utilization and timing report. Your latency is unknown (?) because your loop size (*len*) is a variable. .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq19.png Note: you can also check the addresses of variables (that you need to write to in Jupyter) in the Synthesis summary under "S_AXILITE Registers". Now you can export your RTL code by clicking on **Export RTL**: .. image :: https://bitbucket.org/repo/x8q9Ed8/images/582121524-pynq3.png After exporting is done, you can close and exit from Vitis HLS. 2) Vivado: Generating bitstream from RTL code --------------------------------------------- In this section we import our RTL code from last section, add some required IPs, and generate our bitstream. 2.1) Create a new Vivado project ################################ Open your Vivado tool and create a new project. Select an appropriate location for your project and leave the default project name as is (**project_1**). Select **RTL Project** and check **Do not specify sources** at this time. Select **xc7z020clg400-1** for your part: .. image :: https://bitbucket.org/repo/x8q9Ed8/images/3090594305-pynq4.png 2.2) Import RTL code #################### Under **Flow Navigator**, click on **IP Catalog**. Right-click on the opened window and select **Add Repository**. Navigate to your **Vitis HLS project > solution1 > impl > ip** and select it: .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq20.png 2.3) Add IPs to your design ########################### Under **Flow Navigator**, click on **Create Block Design**. Leave the design name as is (*design_1*). In the newly opened window, you can add IPs by clicking on the plus sign. Add **ZYNQ7 Processing System** to your design: .. image :: https://bitbucket.org/repo/x8q9Ed8/images/3814633603-pynq6.png Double click on **ZYNQ7 IP** to customize it. In the opened window, double click on **High Performance AXI 32b/64b Slave Parts**: .. image :: https://bitbucket.org/repo/x8q9Ed8/images/148617913-pynq7.png Select and check **S AXI HP0 interface**: .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq21.png Add the **axi4_sqrt** IP to the design. .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq22.png 2.4) Automatic connections ########################## 1. Click on **Run Block Automation** 2. Click on **Run Connection Automation** and select all. Click OK. .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq23.png 3. Click on **Run Connection Automation** again and select all. Click OK. .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq24.png This is how the final design should look. Note: in Vivado 2023, it uses AXI Interconnect block instead of AXI SmartConnect, but it works. .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq25.png 2.5) Generate bitstream ####################### 1. Save your design **CTRL+S** or **File > Save Block Design**. 2. Validate your design: **Tools > Validate Design**. 3. In Sources, right click on **design_1**, and **Create HDL Wrapper**. Now you should have **design_1_wrapper**. 4. Generate bitstream by clicking on **Generate Bitstream** in **Program and Debug**. 2.6) Post bitstream Generation ############################## In sources, expand **design_1_wrapper::design_1_i::design_1::axi4_sqrt_0::design_1_axi4_sqrt_0_0::inst : axi4_sqrt**, double click on **axi4_sqrt_sqrt_s_axi_U** , and note the address for **in_r** , **out_r** , **len** as **0x10** , **0x18** and **0x20** respectively. We need this addresses in our host program. These addresses can also be found in the Vitis HLS C Synthesis summary. .. image :: https://github.com/KastnerRG/pp4fpgas/raw/master/labs/images/pynq26.png You can close and exit the Vivado tool. Copy your **project directory > project_1 > project_1.runs > impl_1 > design_1_wrapper.bit** to your **project directory > project_1** and rename it to **axi4_sqrt.bit** Copy your **project directory > project_1 > project_1.gen > sources_1 > bd > design_1 > hw_handoff > design_1.hwh** to your **project directory > project_1** and rename it to **axi4_sqrt.hwh** These files need to have the same name (except for their file extension). 3) Host program --------------- In this section, we use Python to test our design. 3.1) Move your files #################### Create a new folder in your PYNQ board and move both **axi4_sqrt.hwh** and **axi4_sqrt.bit** into it. 3.2) Python code ################ Create a new Jupyter notebook and run the following code to test your design: .. code-block :: python3 from pynq import Overlay from pynq import Xlnk # replace with allocate for Pynq >= 2.7 import numpy as np ol = Overlay('axi4_sqrt.bit') sqrt_ip = ol.axi4_sqrt_0 # if you can't find the IP, type 'ol.' and hit Tab to see what options are available .. code-block :: python3 length = 40 inpt = Xlnk().cma_array(shape=(length,), dtype=np.float32) outpt = Xlnk().cma_array(shape=(length,), dtype=np.float32) a = [i*i for i in range(length)] np.copyto(inpt, a) soft_op = np.sqrt(inpt) sqrt_ip.write(0x20, length) sqrt_ip.write(0x10, inpt.physical_address) sqrt_ip.write(0x18, outpt.physical_address) sqrt_ip.write(0x00, 1) print("Hardware Output", "Software Output \n") for i in range(length): print(outpt[i], "\t\t ", soft_op[i])