Honors Computer Science Bachelor's Thesis by Emerson Ford at the University of Utah.

- `ansible`: an Ansible playbook to set up hosts for testing the various RDMA-in-container solutions
- `bg_presentation`: background presentation on the topic of this thesis
- `data`: raw data gathered for each RDMA-in-container solution test
- `Freeflow`: submodule pointing to a fork of Freeflow, which required a few alterations to get working and includes some quality-of-life changes to make testing faster
- `paper`: the actual LaTeX document for this thesis
## Reproducing Results

This assumes you're using Cloudlab and have SSH keys configured on your Cloudlab account. You should also have `ansible-playbook` and Python 3.10 installed locally.
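A quick sanity check of the local tooling before starting (the Python binary can live anywhere; `python3.10` below is just an example):

```sh
ansible-playbook --version   # Ansible is needed to run the playbooks in ansible/
python3.10 --version         # the test scripts are invoked with a Python 3.10 binary
ssh-add -l                   # the SSH key registered with your Cloudlab account should be loaded
```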
### SoftRoCE

- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `d6515`.
- After the hosts have booted, upgrade them to Ubuntu 21.10. This is essentially just running `sudo do-release-upgrade` on the hosts.
- Change the two hostnames of the `no_mlnx_ofed` group in `ansible/hosts.yaml` to your Cloudlab hostnames. Change `ansible_user` under `vars` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit no_mlnx_ofed site.yml` while in the `ansible` directory.
- Run the commands listed in `data/softroce_*/metadata*` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary. A sketch of the substitution is shown after this list.
  - Run the host (i.e. non-SoftRoCE) versions of the tests first.
  - Then, before running the SoftRoCE versions of `run_basic_tests` and `run_cpu_tests`, a SoftRoCE NIC must be manually configured. SSH into both hosts and run `sudo modprobe -rv mlx5_ib && sudo reboot now`. After they reboot, run `sudo rdma link | grep rxe0 || sudo rdma link add rxe0 type rxe netdev enp65s0f0np0 && sudo devlink dev param set pci/0000:41:00.0 name enable_roce value false cmode driverinit && sudo devlink dev reload pci/0000:41:00.0`. Then run the test commands.
- The data should appear in `data/raw`. You can generate graphs from the data you just produced by setting `MODE = "softroce"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` directory).
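As noted in the metadata step above, the recorded commands only need the hostnames, username, and Python path substituted. A minimal sketch of the shape of one such command after substitution (the script name and any other arguments come verbatim from the metadata file; only the values shown below are placeholders):

```sh
/usr/bin/python3.10 <script-and-arguments-from-the-metadata-file> \
    --host1 node0.your-experiment.cloudlab.us \
    --host2 node1.your-experiment.cloudlab.us \
    --user your_cloudlab_username
```

The same shape applies to the metadata files used in the later sections.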
Notes:

- Ubuntu 21.10 / Linux Kernel >5.13 is required to avoid certain kernel panics when using SoftRoCE.
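If the SoftRoCE tests fail, it can help to confirm both the kernel version and that the `rxe0` device from the setup step above actually exists. A quick check on each host (device names as used above):

```sh
uname -r              # Ubuntu 21.10 ships 5.13; older kernels can panic with SoftRoCE
sudo rdma link show   # rxe0 should be listed once the setup step has been run
ibv_devices           # rxe0 should appear; the mlx5 devices should be gone after modprobe -rv mlx5_ib
```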
### Shared HCA

- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `d6515`.
- Change the two hostnames of the `connectx5` group in `ansible/hosts.yaml` to your Cloudlab hostnames. Change `ansible_user` under `vars` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 site.yml` while in the `ansible` directory.
- Run the commands listed in `data/shared_hca_*/metadata*` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
  - Before running the Shared HCA versions of `run_basic_tests` and `run_cpu_tests`, a Docker macvlan network must first be created with `docker network ls | grep mynet || docker network create -d macvlan --subnet=192.168.1.0/24 -o parent=ens3f0 -o macvlan_mode=private mynet` (a quick check of the resulting network is sketched after this list).
- The data should appear in `data/raw`. You can generate graphs from the data you just produced by setting `MODE = "shared_hca"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` directory).
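To confirm the macvlan network from the step above was created with the intended driver and subnet (names match that command):

```sh
docker network inspect mynet --format '{{ .Driver }}'   # expected: macvlan
docker network inspect mynet | grep 192.168.1.0/24      # the subnet passed to docker network create
```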
Notes:

- The RDMA GID table is namespaced inside the container, so the majority of GID entries read `0000:0000:0000:0000:0000:0000:0000:0000` and only the entries mapped into the container's namespace are populated. Despite this, `ib_[read|write|send]_[bw|lat]` do not select the proper GID and will error with `Failed to modify QP XXXX to RTR` and `Unable to Connect the HCA's through the link`. You can force `ib_[read|write|send]_[bw|lat]` to use the correct GID entry with the `-x` flag. See `/sys/class/infiniband/*/ports/*/gids` for GID entry values and `/sys/class/infiniband/*/ports/*/gid_attrs/types` for the corresponding type (RoCE v1, RoCE v2, etc.).
  - You can also use `rdma_cm` queue pairs to avoid this, via the `-R` flag. However, RDMA connection manager queue pairs result in 100% CPU utilization on the `ib_[read|write|send]_[bw|lat]` server (which should sit around 0% CPU utilization for read/write operations), so their use can produce incorrect CPU usage readings.
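For example, to pin the perftest tools to a specific GID entry inside the container (the device name `mlx5_0` and index `3` below are placeholders; read the real values from the sysfs paths in the note above):

```sh
# Inspect the populated GID entries and their RoCE type inside the container.
cat /sys/class/infiniband/mlx5_0/ports/1/gids/3
cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/3

# Then point both sides of the benchmark at that index with -x.
ib_write_bw -d mlx5_0 -x 3                  # server side
ib_write_bw -d mlx5_0 -x 3 <server-addr>    # client side
```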
### SR-IOV

- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `d6515`.
- Change the two hostnames of the `connectx5` group in `ansible/hosts.yaml` to your Cloudlab hostnames. Change `ansible_user` under `vars` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 site.yml` while in the `ansible` directory.
- Run the commands listed in `data/sriov_basic_tests/metadata_host` and `data/sriov_cpu_tests/metadata_host` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov site.yml` while in the `ansible` directory. This provisions the first SR-IOV virtual function on both hosts.
- Run the commands listed in `data/sriov_basic_tests/metadata_sriov` and `data/sriov_cpu_tests/metadata_sriov` while in the `test_scripts` directory, with the same `--host1`, `--host2`, `--user`, and Python path substitutions as above.
- Run the commands listed in `data/sriov_multi_dev/metadata` while in the `test_scripts` directory, again with the same substitutions.
- The data should appear in `data/raw`. You can generate graphs from the data you just produced by setting `MODE = "sriov"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` directory).
Notes:

- SR-IOV virtual function instantiation is finicky; sometimes it behaves and sometimes it doesn't. If your `basic_tests` or `cpu_tests` don't work, reboot the hosts, rerun `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov site.yml`, then rerun your tests.
  - The `multi_sriov_tests.py` script tries to handle this finickiness itself, but after more than 20 attempts to get the virtual functions to cooperate, it will fail the test.
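Before rerunning the tests, it can be worth confirming that the virtual function actually came up on each host (the physical-function interface name below is a placeholder; use the ConnectX-5 interface on your nodes):

```sh
cat /sys/class/net/<pf-interface>/device/sriov_numvfs   # should be >= 1 after the sriov tag has run
sudo rdma link show                                     # an additional RDMA device should appear for the VF
ibv_devices                                             # e.g. one entry for the physical function and one for the VF
```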
### Freeflow

- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `c6220`.
- Change the two hostnames of the `connectx3` group in `ansible/hosts.yaml` to your Cloudlab hostnames. Change `ansible_user` under `vars` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx3 site.yml` while in the `ansible` directory.
- Run the commands listed in `data/freeflow_*/metadata*` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- The data should appear in `data/raw`. You can generate graphs from the data you just produced by setting `MODE = "freeflow"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` directory).
Notes:

- RDMA's rkey generation is deterministic (see the ReDMArk paper), particularly on mlx4 NICs. Freeflow assumes unique rkeys per host as part of its rkey mapping scheme, which breaks with this deterministic generation. I added a patch to my fork of Freeflow to circumvent this, but if you run into `Failed status 10: wr_id 0 syndrom 0x88` errors, this is likely why.
- Freeflow expects page-aligned memory, hence the use of `LD_PRELOAD=./align_malloc.so`.
- Freeflow only supports mlx4 driver NICs, so you must use ConnectX-3 NICs.
- Freeflow provides a "no-fastpath" mode. However, this mode is prone to deadlocks at specific RDMA packet sizes and with more than 2 clients.
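To make the page-alignment note concrete, benchmarks inside a Freeflow container are launched with the shim preloaded, roughly like this (the binary, device name, and path to `align_malloc.so` are placeholders; the metadata files record the exact invocations):

```sh
# Preload the malloc-alignment shim so buffers handed to the verbs library are page-aligned.
LD_PRELOAD=./align_malloc.so ib_write_bw -d mlx4_0
```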
### ASAP2 Direct

- Provision a Cloudlab experiment with the roce-cluster profile.
  - Change "Node type to use" to `c6525-100g`.
- Change the two hostnames of the `connectx5` group in `ansible/hosts.yaml` to your Cloudlab hostnames. Change `ansible_user` under `vars` to your Cloudlab username.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov,asap2_direct site.yml` while in the `ansible` directory. Reboot both hosts through the Cloudlab UI after MLNX OFED is installed, then rerun the `ansible-playbook` command to completion.
- Run the commands listed in `data/asap2_direct_basic_tests/metadata_host` and `data/asap2_direct_cpu_tests/metadata_host` while in the `test_scripts` directory. Take care to replace the `--host1` and `--host2` flags to match your Cloudlab hostnames and `--user` to match your Cloudlab username. The first argument (`/opt/homebrew/.../Python`) should be replaced with the path to your Python 3.10 binary.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov,asap2_direct -e "configure_sriov_ifs=false" site.yml` while in the `ansible` directory. This provisions the first SR-IOV virtual function on both hosts and configures ASAP2 Direct.
- Run the commands listed in `data/asap2_direct_basic_tests/metadata_sriov_ovs` and `data/asap2_direct_cpu_tests/metadata_sriov_ovs` while in the `test_scripts` directory, with the same `--host1`, `--host2`, `--user`, and Python path substitutions as above.
- Run `ansible-playbook --ssh-common-args "-o StrictHostKeyChecking=no" --inventory-file="./hosts.yaml" --limit connectx5 --tags sriov,asap2_direct -e "NUM_OF_VFS=32" -e "configure_sriov_ifs=false" -e "cleanup_old_state=true" site.yml` while in the `ansible` directory. This provisions multiple virtual functions for the multi-device test.
- Run the commands listed in `data/asap2_direct_multi_dev/metadata` while in the `test_scripts` directory, again with the same substitutions.
- The data should appear in `data/raw`. You can generate graphs from the data you just produced by setting `MODE = "asap2_direct"` and rerunning all cells in the `*.ipynb` Jupyter notebooks (run `jupyter-notebook` while in the `data` directory).
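Before running the SR-IOV/OVS tests, it may be worth confirming that OVS hardware offload is actually enabled on both hosts. A minimal check, assuming the playbook sets up OVS in the standard ASAP2 Direct fashion:

```sh
sudo ovs-vsctl get Open_vSwitch . other_config:hw-offload   # expected: "true" once ASAP2 Direct is configured
sudo ovs-vsctl show                                         # the offload bridge and its ports should be listed
```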
Notes:

- When using `switchdev` mode, there is both an interface for the SR-IOV NIC itself and a "representor netdevice" (see these slides). Sometimes the names of these get mixed up and you have to reboot the host or fiddle with udev rules.
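A quick way to see which eswitch mode the NIC is in and which ports are the representors (the PCI address below is a placeholder; use your ConnectX-5 physical function's address):

```sh
sudo devlink dev eswitch show pci/0000:41:00.0   # should report "mode switchdev" after ASAP2 Direct is configured
sudo devlink port show                           # lists the uplink and VF representor ports with their netdev names
```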