Vitis Notes (Part 1)

Albresky, filed under Vitis AIE

2024-07-17 2025-07-25

A collection of small tricks picked up during Vitis development.

1. Compiler Options

1.1 AIE Compiler FLAGS

Enable AIE backend code

--aie.Xchess=main:backend.mist2.xargs=-ggraph

1.2 HW-EMU FLAGS

Enable AIE profiling

-aie-sim-options $PRJ_DIR/aiesimulator_output/aiesim_options.txt

Enable AIE trace

--aie.event-trace=runtime

Enable the XSIM GUI

-g

Make the AIE simulator write output files without timestamps

--output-time-stamp=no

2. AIE Tools

2.1 Viewing Waveforms

Convert .vcd to .wdb

vcdanalyze -vcd=foo.vcd -wdb --pkg-dir=../prj_root/Work

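3. AIE Simulation

3.1 PLIO Data Alignment

PLIO ports support widths of 32, 64, and 128 bits. If the data width is 32 bits, the simulation data file must hold 1, 2, or 4 samples per line for those port widths respectively.

For example (HEX is used below for readability; an actual simulation must use DEC-encoded INTEGER values):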

// 32 bit
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
0x00000006

// 64 bit
0x00000001 0x00000002
0x00000003 0x00000004
0x00000005 0x00000006

// 128 bit
0x00000001 0x00000002 0x00000003 0x00000004
0x00000005 0x00000006

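When a PLIO port is connected to an AIE packet split, the last beat of each packet must be marked by driving the TLAST signal high (in hardware). In simulation, TLAST is represented by the literal string TLAST and is placed in the simulation data file together with the packet header.

Suppose the packet header (32 bit) is 0x80000001 and the packet size is 4 samples. The simulation data file for each PLIO width is then organized as follows: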

// Separator: a single space
// 32 bit
0x80000001
0x00000001
0x00000002
0x00000003
TLAST
0x00000004
0x80000001
0x00000005
TLAST
0x00000006

// 64 bit
0x80000001 0x00000001
0x00000002 0x00000003
TLAST
0x00000004
0x80000001 0x00000005
TLAST
0x00000006

// 128 bit
0x80000001 0x00000001 0x00000002 0x00000003
TLAST
0x00000004
0x80000001 0x00000005
TLAST
0x00000006
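
The packet layout above can be reproduced with a small generator. The following is a minimal sketch, not part of any AMD tool: `format_packet` is a hypothetical helper name, it handles 32-bit samples only, and it follows the layout of the listings above (header and all samples but the last grouped per line, then TLAST on its own line, then the last sample). It prints HEX so the result can be compared with the listings; as noted earlier, a real simulation input file needs decimal integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: renders one packet of 32-bit samples.
// The header and every sample except the last are grouped words_per_line to
// a line; TLAST then gets its own line, followed by the last sample.
std::string format_packet(std::uint32_t header,
                          const std::vector<std::uint32_t>& samples,
                          unsigned plio_bits) {
  assert(!samples.empty());
  assert(plio_bits == 32 || plio_bits == 64 || plio_bits == 128);
  const std::size_t words_per_line = plio_bits / 32;

  // Words that precede the TLAST marker.
  std::vector<std::uint32_t> words;
  words.push_back(header);
  words.insert(words.end(), samples.begin(), samples.end() - 1);

  std::string out;
  char buf[16];
  for (std::size_t i = 0; i < words.size(); ++i) {
    std::snprintf(buf, sizeof buf, "0x%08x", static_cast<unsigned>(words[i]));
    out += buf;
    const bool end_of_line =
        (i + 1) % words_per_line == 0 || i + 1 == words.size();
    out += end_of_line ? '\n' : ' ';
  }

  out += "TLAST\n";
  std::snprintf(buf, sizeof buf, "0x%08x",
                static_cast<unsigned>(samples.back()));
  out += buf;
  out += '\n';
  return out;
}
```

Calling `format_packet(0x80000001u, {1, 2, 3, 4}, 64)` yields the first packet of the 64 bit listing above.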

If the input data is of type int16, the simulation data file for each PLIO port width should be organized as follows:

// Separator: a single space
// 32 bit
0x80000001
0x0001 0x0002
0x0003
TLAST
0x0004
0x80000001
0x0005
TLAST
0x0006 0x0007

// 64 bit
0x80000001 0x0001 0x0002
0x0003
TLAST
0x0004
0x80000001 0x0005
TLAST
0x0006 0x0007

// 128 bit
0x80000001 0x0001 0x0002 0x0003
TLAST
0x0004
0x80000001 0x0005
TLAST
0x0006 0x0007 0x0008 0x0009 0x000a 0x000b 0x000c 0x000d

Summary: each line holds (PLIO port width // data width) samples, and the TLAST marker always goes on a line of its own.
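
The samples-per-line rule can be sketched as a short formatter. This is an illustration only: `format_plio_lines` is a hypothetical helper, it ignores packet headers and TLAST, and it writes plain decimal integers separated by single spaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lays out samples so that each line carries
// plio_bits / sample_bits values, separated by single spaces.
std::string format_plio_lines(const std::vector<std::int64_t>& samples,
                              unsigned plio_bits, unsigned sample_bits) {
  assert(sample_bits > 0 && sample_bits <= plio_bits);
  const std::size_t per_line = plio_bits / sample_bits;

  std::ostringstream out;
  for (std::size_t i = 0; i < samples.size(); ++i) {
    out << samples[i];
    // Break the line when it is full or when the data runs out.
    const bool end_of_line =
        (i + 1) % per_line == 0 || i + 1 == samples.size();
    out << (end_of_line ? '\n' : ' ');
  }
  return out.str();
}
```

For six 32-bit samples on a 128-bit port, this gives one line of four values followed by one line of two.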

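Likewise, when an AIE kernel outputs int16 data (int8 is analogous) and pktmerge is used for packet streaming, the data sent to the PL has consecutive int16 values packed pairwise, in order, into int32 words. For int8 data, every 4 consecutive values are packed in order into one int32 word. Note: pktstream only supports int32 data.

The following example (PLIO port width 128 bit) illustrates this: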

// Suppose an AIE kernel outputs two aie::vector<int16, 8> values:
0x0001 0x0002 0x0003 0x0004 0x0005 0x0006 0x0007 0x0008
0x0009 0x000a 0x000b 0x000c 0x000d 0x000e 0x000f 0x0010

// After pktmerge, the data this kernel sends through the PLIO to the PL is organized as follows:
0x8f000000 0x00010002 0x00030004 0x00050006
0x00070008 0x0009000a 0x000b000c 0x000d000e
TLAST
0x000f0010

0x8f000000 is the packet header ID that pktmerge adds automatically.
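
To build reference data for such an output, the pairwise packing can be sketched on the host side. This is a minimal sketch under one assumption taken from the listing above: the earlier int16 value lands in the upper half of the 32-bit word. `pack_int16_pairs` is a hypothetical helper name, and the ordering is worth checking against real simulator output before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: packs consecutive int16 values pairwise into 32-bit
// words, earlier value in the upper 16 bits (ordering taken from the listing).
std::vector<std::uint32_t> pack_int16_pairs(
    const std::vector<std::int16_t>& values) {
  assert(values.size() % 2 == 0);
  std::vector<std::uint32_t> words;
  words.reserve(values.size() / 2);
  for (std::size_t i = 0; i < values.size(); i += 2) {
    // Go through uint16_t first so negative values are not sign-extended.
    const std::uint32_t hi = static_cast<std::uint16_t>(values[i]);
    const std::uint32_t lo = static_cast<std::uint16_t>(values[i + 1]);
    words.push_back((hi << 16) | lo);
  }
  return words;
}
```

Packing `{0x0001, 0x0002, 0x0003, 0x0004}` gives `0x00010002` and `0x00030004`, matching the first two data words after the header in the listing.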

Reference: AI Engine Development / Buffer-based AI Engine Kernels
A FIFO depth cannot be specified for a cascade_stream.

'adf::address': The third parameter (offset value) should be 32-byte aligned.
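
One way to avoid this error is to round the offset to a 32-byte boundary before passing it on. The helpers below are a hypothetical sketch in plain C++ and do not call the ADF API; the only fact taken from the message above is the 32-byte alignment requirement.

```cpp
#include <cstddef>

// Alignment required for the offset, per the error message above.
constexpr std::size_t kAddressAlignment = 32;

// Hypothetical helper: true if the byte offset is a multiple of 32.
constexpr bool is_offset_aligned(std::size_t offset_bytes) {
  return offset_bytes % kAddressAlignment == 0;
}

// Hypothetical helper: rounds a byte offset up to the next multiple of 32.
constexpr std::size_t align_offset_up(std::size_t offset_bytes) {
  return (offset_bytes + kAddressAlignment - 1) / kAddressAlignment *
         kAddressAlignment;
}
```

For example, an offset of 40 bytes would be rounded up to 64, while 64 is left unchanged.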

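4. ADF | Graph

4.1 ADF Port Connection Rules

A pktstream port can only be connected to a buffer (window) or to a graph port.

4.2 ADF Port Access Rules

Array-subscript access for input_plio/output_plio, adf::port, and adf::pktsplit/adf::pktmerge:

input_plio (taking input_plio in[2] as an example)
- Its outputs are accessed as in[0].out[0] and in[1].out[0].
output_plio (taking output_plio out[2] as an example)
- Its inputs are accessed as out[0].in[0] and out[1].in[0].
adf::port (taking adf::port<input> in[2] as an example; output ports work the same way)
- Its inputs/outputs (relative to the graph in question) are accessed directly as in[0] and in[1].
adf::pktsplit (taking adf::pktsplit<2> sp as an example)
- Its input is accessed as sp.in[0]; its outputs are accessed as sp.out[0] and sp.out[1].
adf::pktmerge (taking adf::pktmerge<2> mg as an example)
- Its output is accessed as mg.out[0]; its inputs are accessed as mg.in[0] and mg.in[1].

The following example illustrates these rules: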

// kernel graph with adf::port (Base Class) only
class FooKnlGraph : public adf::graph {
private:
  adf::kernel knl;

public:
  adf::port<input> in[2];
  adf::port<output> out[2];

  FooKnlGraph() {
    // initialize kernel and other stuff
    ...

    // ADF connections
    adf::connect<>(in[0], knl.in[0]);   // an input port has no sub-element index
    adf::connect<>(in[1], knl.in[1]);
    adf::connect<>(knl.out[0], out[0]); // likewise, an output port has no sub-element index
    adf::connect<>(knl.out[1], out[1]);
  }
};

// graph with pktsplit and pktmerge
class WrapperKnlGraph : public adf::graph {
private:
  FooKnlGraph foo[2];

  adf::pktsplit<2> sp[2];
  adf::pktmerge<2> mg[2];

public:
  adf::port<input> in[2];
  adf::port<output> out[2];

  WrapperKnlGraph() {
    // initialize FooKnlGraph
    ...

    // ADF connections
    adf::connect<>(in[0], sp[0].in[0]);
    adf::connect<>(in[1], sp[1].in[0]);
    adf::connect<>(mg[0].out[0], out[0]);
    adf::connect<>(mg[1].out[0], out[1]);

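    // connect the pktsplit outputs to the FooKnlGraph inputs
    adf::connect<>(sp[0].out[0], foo[0].in[0]);
    adf::connect<>(sp[0].out[1], foo[0].in[1]);
    adf::connect<>(sp[1].out[0], foo[1].in[0]);
    adf::connect<>(sp[1].out[1], foo[1].in[1]);

    // connect the FooKnlGraph outputs to the pktmerge inputs
    adf::connect<>(foo[0].out[0], mg[0].in[0]);
    adf::connect<>(foo[0].out[1], mg[0].in[1]);
    adf::connect<>(foo[1].out[0], mg[1].in[0]);
    adf::connect<>(foo[1].out[1], mg[1].in[1]);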
  }
};


// Top Graph with input_plio and output_plio
class TopGraph : public adf::graph {
private:
  WrapperKnlGraph wrapper;

public:
  input_plio in[2];
  output_plio out[2];

  TopGraph() {
    // initialize WrapperKnlGraph
    ...

    // initialize PLIOs and other stuff
    ...

    // adf connections
    // input_plio/output_plio have member arrays in[] and out[], so their elements must be indexed
    adf::connect<>(in[0].out[0], wrapper.in[0]);
    adf::connect<>(in[1].out[0], wrapper.in[1]);
    adf::connect<>(wrapper.out[0], out[0].in[0]);
    adf::connect<>(wrapper.out[1], out[1].in[0]);
  }

};

5. Performance Analysis | Tool Guide

5.1 PDM

5.2 BEAM

5.3 (Onboard) Jupyter Notebook
