Prerequisites
- perftest: RDMA performance testing tool
Software installation
yum install perftest
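After installing, it can help to confirm that the perftest binaries are actually on PATH. The following is a minimal sketch; the helper name `missing_tools` is hypothetical and not part of perftest.

```shell
#!/usr/bin/env bash
# Print the name of every requested tool that is NOT found on PATH.
# (missing_tools is an illustrative helper, not a perftest command.)
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For example, `missing_tools ib_send_bw ib_write_bw ib_send_lat ib_write_lat` prints nothing when all four tools are installed.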
GPU P2P connectivity
Use the nvidia-smi topo -m command to view the P2P connectivity between GPUs:
Apptainer> nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    CPU Affinity      NUMA Affinity   GPU NUMA ID
GPU0     X      NODE    SYS     SYS     0-37,76-113       0               N/A
GPU1    NODE     X      SYS     SYS     0-37,76-113       0               N/A
GPU2    SYS     SYS      X      NODE    38-75,114-151     1               N/A
GPU3    SYS     SYS     NODE     X      38-75,114-151     1               N/A
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
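# From the output: GPU0 and GPU1 are connected at the NODE level (both on NUMA node 0), as are GPU2 and GPU3 (both on NUMA node 1). Traffic between the two pairs crosses the inter-socket SYS link.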
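Reading the matrix by eye gets harder with more GPUs. As a rough sketch, the function below reads the `nvidia-smi topo -m` text on stdin and prints one line per GPU pair with its link type. The name `gpu_pairs` is hypothetical, and the parser assumes at least two GPUs and a layout like the one shown above (a header row of GPU names followed by one row per GPU); other layouts may need adjustments.

```shell
#!/usr/bin/env bash
# Print "<GPU a> <GPU b> <link type>" for every GPU pair in a topo matrix.
# (gpu_pairs is an illustrative helper; it reads the matrix text on stdin.)
gpu_pairs() {
    awk '
        # Header row: starts with two GPU names; remember every GPU column.
        ngpu == 0 && $1 ~ /^GPU[0-9]+$/ && $2 ~ /^GPU[0-9]+$/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^GPU[0-9]+$/) name[++ngpu] = $i
            next
        }
        # Data row: GPU name, then one link type per GPU column.
        ngpu > 0 && $1 ~ /^GPU[0-9]+$/ {
            row++
            # Only print the upper triangle so each pair appears once.
            for (j = row + 1; j <= ngpu; j++)
                print $1, name[j], $(j + 1)
        }
    '
}
```

Typical use would be `nvidia-smi topo -m | gpu_pairs`; for the matrix above this lists GPU0/GPU1 and GPU2/GPU3 as NODE and the four cross pairs as SYS.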
RDMA performance testing
The perftest RDMA performance testing tool set
ib_send_lat      latency test with send transactions
ib_send_bw       bandwidth test with send transactions
ib_write_lat     latency test with RDMA write transactions
ib_write_bw      bandwidth test with RDMA write transactions
ib_read_lat      latency test with RDMA read transactions
ib_read_bw       bandwidth test with RDMA read transactions
ib_atomic_lat    latency test with atomic transactions
ib_atomic_bw     bandwidth test with atomic transactions
Note: when running performance tests, check whether the CPU, memory, or other host resources are becoming the bottleneck.
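One common way to keep the CPU and memory from becoming the bottleneck is to run the benchmark on the NUMA node closest to the RDMA device. The sketch below is an assumption layered on top of the original notes: it uses `numactl`, which the notes do not mention, and reads the device's NUMA node from sysfs. The function name `numa_prefix` and the `SYSFS_ROOT` override are hypothetical.

```shell
#!/usr/bin/env bash
# Print a numactl prefix that pins a command to the NUMA node of an RDMA device.
# (numa_prefix and SYSFS_ROOT are illustrative; SYSFS_ROOT defaults to /sys.)
numa_prefix() {
    local dev=$1 root=${SYSFS_ROOT:-/sys}
    local file="$root/class/infiniband/$dev/device/numa_node"
    local node
    # Print nothing if the device or the attribute does not exist.
    [ -r "$file" ] || return 0
    node=$(cat "$file")
    # The kernel reports -1 when the device has no NUMA affinity.
    case $node in
        ''|-1) ;;
        *) printf 'numactl --cpunodebind=%s --membind=%s\n' "$node" "$node" ;;
    esac
}
```

A possible use is `$(numa_prefix hfi1_0) ib_send_bw -a -F -d hfi1_0`, which falls back to the plain command when no NUMA node is reported.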
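ib_send_bw/ib_write_bw (bandwidth test)
Basic usage:
Run on server A:
ib_send_bw -a -F -d hfi1_0
# -a sweeps the message size upward to find the size at which bandwidth peaks
# -F suppresses the warning about CPU frequency scaling
# hfi1_0 is the device on server A; it can be looked up with the ibstatus command
Run on server B:
ib_send_bw -a -F -d hfi1_0 192.168.112.22
# hfi1_0 is the device on server B whose IP is in the 192.168.112.xxx subnet; it can be looked up with the ibstatus command
# the trailing IP is server A's address in the same subnet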
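The only difference between the two sides is the trailing IP address: without it the tool waits as the server, with it the tool connects as the client. A small sketch that builds either command line from the same inputs (the helper name `perftest_cmd` is hypothetical):

```shell
#!/usr/bin/env bash
# Build the perftest command line for one side of the test.
# Usage: perftest_cmd TOOL DEVICE [SERVER_IP]
# (perftest_cmd is an illustrative helper, not part of perftest.)
perftest_cmd() {
    local tool=$1 dev=$2 server_ip=${3:-}
    if [ -n "$server_ip" ]; then
        # Client side (server B above): connect to server A's IP.
        printf '%s -a -F -d %s %s\n' "$tool" "$dev" "$server_ip"
    else
        # Server side (server A above): wait for the client.
        printf '%s -a -F -d %s\n' "$tool" "$dev"
    fi
}
```

For instance, `perftest_cmd ib_send_bw hfi1_0` prints the server command and `perftest_cmd ib_send_bw hfi1_0 192.168.112.22` prints the client command, which makes it easier to keep both sides in sync when switching tools or devices.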
Example:
[root@c6 ~]# ib_write_bw -a -F -d hfi1_0
......
[root@c5 ~]# ib_write_bw -a -F -d hfi1_0 192.168.112.22
---------------------------------------------------------------------------------------
RDMA_Write BW Test
 Dual-port       : OFF          Device         : hfi1_0
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
local address: LID 0x0c QPN 0x003e PSN 0xb31ee0 RKey 0x8c8d00 VAddr 0x0014a81bb87000
remote address: LID 0x0e QPN 0x003e PSN 0xc97b86 RKey 0x8c8d00 VAddr 0x0014e94b2ff000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 2          5000           1.18               1.11                 0.579452
 4          5000           2.05               1.97                 0.517667
 8          5000           4.02               3.98                 0.522262
 16         5000           8.14               7.96                 0.521719
 32         5000           19.36              17.53                0.574268
 64         5000           32.15              31.80                0.521064
 128        5000           63.47              62.68                0.513446
 256        5000           124.11             120.75               0.494573
 512        5000           408.21             377.21               0.772536
 1024       5000           811.05             774.87               0.793469
 2048       5000           1470.13            1445.19              0.739936
 4096       5000           2803.63            2689.07              0.688403
 8192       5000           3280.03            3258.14              0.417042
 16384      5000           3277.16            3276.55              0.209699
 32768      5000           3291.12            3289.98              0.105279
 65536      5000           3299.75            3299.69              0.052795
 131072     5000           3298.96            3298.95              0.026392
 262144     5000           3306.67            3306.66              0.013227
 524288     5000           3307.30            3307.29              0.006615
 1048576    5000           3307.21            3307.21              0.003307
 2097152    5000           3305.91            3305.91              0.001653
 4194304    5000           3307.09            3307.00              0.000827
 8388608    5000           3307.72            3307.08              0.000413
---------------------------------------------------------------------------------------
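To pull the headline number out of a saved run, the result rows can be filtered with awk. The sketch below assumes the five-column layout shown above (bytes, iterations, peak, average, message rate) and reports the message size with the highest average bandwidth; the function name `bw_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Print "<message size> <best average bandwidth>" from perftest output on stdin.
# (bw_summary is an illustrative helper; it expects the 5-column result rows.)
bw_summary() {
    awk '
        # Result rows: five fields, the first two being plain integers.
        NF == 5 && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
            if ($4 + 0 > best) { best = $4 + 0; value = $4; size = $1 }
        }
        END { if (size != "") print size, value }
    '
}
```

A possible use is `ib_write_bw -a -F -d hfi1_0 192.168.112.22 | bw_summary`. Applied to the run above, it reports 524288 bytes at 3307.29 MB/sec, although the average is essentially flat from 262144 bytes upward.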
ib_send_lat/ib_write_lat (latency test)
Usage is the same as for ib_send_bw/ib_write_bw.
[root@c6 ~]# ib_write_lat -a -F -d hfi1_0
[root@c5 ~]# ib_write_lat -a -F -d hfi1_0 192.168.112.22