Add Cloudlab docs (#250)

* docs: add CloudLab getting started guide
* docs: make instructions for Ansible consistent here and there
* docs: install Ansible with apt
* docs: update cloudlab.md to reflect recent changes elsewhere
xlab-uiuc · Aug 4, 2023 · ffe0371 · ffe0371 · github-actions · Aug 4, 2023
1 parent 142995f
commit ffe0371
Showing 3 changed files with 138 additions and 19 deletions.
diff --git a/docs/cloudlab.md b/docs/cloudlab.md
@@ -0,0 +1,127 @@
+# CloudLab getting started guide
+
+## 1 Prerequisites
+
+First, submit an account request to CloudLab (https://www.cloudlab.us/). Things to note:
+
+- If you're an internal member, select "Join Existing Project" and type: `Sieve-Acto`. Otherwise, join another existing project or create a new one; neither option is covered here.
+- The username and key you provide will be used for SSH login.
+
+Wait for the admin to approve your account. Once you are able to log in, familiarize yourself with the web-based dashboard, and [the concept of *profiles* and *experiments*](https://docs.cloudlab.us/basic-concepts.html).
+
+You should be able to log in to any machine instantiated by your project collaborators, because a Linux user with `authorized_keys` set up is automatically created for you on every machine. Even so, the current practice in `Sieve-Acto` is for everyone to run code **on their own experiments**.
+
+Next you'll prepare the dependencies either manually ([section 2](#2-manually-set-up-the-dependencies)) or automatically ([section 3](#3-automatically-set-up-the-dependencies), recommended).
+
+## 2 Manually set up the dependencies
+
+<details><summary>Click to show details</summary>
+
+### 2.1 Create CloudLab experiments
+
+Launch an experiment via the web dashboard:
+
+1. Click "Experiments" -- "Start Experiment". The default selected profile should be `small-lan`. "Next".
+2. Enter/Choose parameters:
+    - "Select OS image": `UBUNTU 20.04`
+    - "Optional physical node type": `c6420`
+    - Leave the other parameters at their defaults, especially those for the temporary filesystem, which Ansible handles after provisioning.
+3. "Next". Give your experiment a name. "Next". "Finish".
+
+Wait for the provisioning to finish. The web dashboard will show you the server address, in the form of `<node>.<cluster>.cloudlab.us`, for example `clnode123.clemson.cloudlab.us`.
+
+### 2.2 Install the dependencies with Ansible
+
+You are going to manage CloudLab machines with Ansible from a controller node. This "controller" can be your local machine, or one of the CloudLab machines themselves.
+
+**On your controller node**:
+
+Install Ansible:
+
+```shell
+sudo apt update
+sudo apt -y install software-properties-common
+sudo add-apt-repository --yes --update ppa:ansible/ansible
+sudo apt -y install ansible
+ansible-galaxy collection install ansible.posix
+ansible-galaxy collection install community.general
+```
+
+Clone the Ansible scripts:
+
+```shell
+git clone https://github.com/xlab-uiuc/acto-cloudlab.git /tmp/acto-cloudlab
+```
+
+Set up the `ansible_hosts` file (**remember to replace the placeholders with your real domain and user name**):
+
+```shell
+domain="clnodeXXX.clemson.cloudlab.us"
+user="alice"
+
+cd /tmp/acto-cloudlab/scripts/ansible/
+echo "$domain ansible_connection=ssh ansible_user=$user ansible_port=22" > ansible_hosts
+```
+
+> If the controller is a CloudLab machine too, this step can be automated:
+>
+> ```shell
+> sudo apt -y install xmlstarlet
+>
+> component_name=$( geni-get portalmanifest | xmlstarlet sel -N x="http://www.geni.net/resources/rspec/3" -t -v "//x:node/@component_id" )
+> cluster_domain=$( echo $component_name | cut -d '+' -f 2 )
+> node_subdomain=$( echo $component_name | cut -d '+' -f 4 )
+> domain="${node_subdomain}.${cluster_domain}"
+> user=$( geni-get user_urn | rev | cut -d '+' -f -1 | rev )
+>
+> cd /tmp/acto-cloudlab/scripts/ansible/
+> echo "$domain ansible_connection=ssh ansible_user=$user ansible_port=22" > ansible_hosts
+> ```
+>
+> Or even simpler, use `127.0.0.1` directly:
+>
+> ```shell
+> cd /tmp/acto-cloudlab/scripts/ansible/
+> echo 127.0.0.1 > ansible_hosts
+> ```
+
+(Only if the controller is a CloudLab machine too) generate a key pair and add it to `authorized_keys`, so that the controller can SSH into itself:
+
+```shell
+ssh-keygen -b 2048 -t rsa -f ~/.ssh/id_rsa -q -N "" && cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
+```
+
+Finally, run the Ansible scripts to install dependencies:
+
+```shell
+ansible-playbook -i ansible_hosts configure.yaml
+```
+
+</details>
+
+(Only if the controller is a CloudLab machine too) log out and log in again before continuing to section 4.
+
+Go to [section 4](#4-run-acto).
+
+## 3 Automatically set up the dependencies
+
+Everything needed to install the dependencies (see section 2) is included in [a CloudLab profile](https://github.com/xlab-uiuc/acto-cloudlab), which sets up the same environment without any manually entered commands.
+
+Launch an experiment via the web dashboard:
+
+1. Open this link: https://www.cloudlab.us/p/Sieve-Acto/acto-cloudlab?refspec=refs/heads/main. The default selected profile should be `acto-cloudlab`. "Next".
+2. "Next". Give your experiment a name. "Next". "Finish".
+
+Wait for both provisioning and startup to finish (i.e. under the "List View" tab, "Status" is `ready` and "Startup" is `Finished`). The web dashboard will show you the server address, in the form of `<node>.<cluster>.cloudlab.us`, for example `clnode123.clemson.cloudlab.us`.
+
+## 4 Run Acto
+
+Log in to the CloudLab machine, and run:
+
+<!-- TODO this is now sosp-ae because of Ansible scripts in acto-cloudlab -->
+
+```shell
+cd ~/workdir/acto
+make
+python3 reproduce_bugs.py --bug-id rdoptwo-287
+```
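
Note on the automated `ansible_hosts` step in section 2.2 of `docs/cloudlab.md`: the pipeline there splits GENI URNs on `+` to recover the node's domain and the user name. The sketch below replays that parsing on hard-coded sample URNs, so it runs without `geni-get`. The sample values and the URN layout (`urn:publicid:IDN+<cluster>+node+<name>`) are assumptions for illustration only; on a real node the values come from `geni-get portalmanifest` and `geni-get user_urn`.

```shell
# Illustrative URNs (assumed layout; real values come from geni-get).
component_id="urn:publicid:IDN+clemson.cloudlab.us+node+clnode123"
user_urn="urn:publicid:IDN+clemson.cloudlab.us+user+alice"

# Split on '+': field 2 is the cluster domain, field 4 is the node name.
cluster_domain=$(echo "$component_id" | cut -d '+' -f 2)
node_subdomain=$(echo "$component_id" | cut -d '+' -f 4)
domain="${node_subdomain}.${cluster_domain}"

# Keep only the text after the last '+'. This parameter expansion yields the
# same field as the guide's `rev | cut -d '+' -f -1 | rev` pipeline.
user="${user_urn##*+}"

# Build the inventory line in the same shape the guide writes to ansible_hosts.
line="$domain ansible_connection=ssh ansible_user=$user ansible_port=22"
echo "$line"
# prints: clnode123.clemson.cloudlab.us ansible_connection=ssh ansible_user=alice ansible_port=22
```

Running this by hand is a quick way to see which field each `cut` call selects before trusting the automated version on a live node.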
diff --git a/scripts/ansible/README.md b/scripts/ansible/README.md
@@ -1,23 +1,28 @@
-# Ansible playbook to automatically configure environment for Acto to run a baremetal machine
+# Ansible playbook to automatically configure environment for Acto to run on a baremetal machine
+
 To run the script, you first need an `ansible_hosts` file. Each line in the file should contain
 a worker in your cluster. See https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html
 for details.
 
 An example:
+
 ```ini
 c220g5-110417.wisc.cloudlab.us ansible_connection=ssh ansible_user=tylergu ansible_port=22
 c220g5-110418.wisc.cloudlab.us ansible_connection=ssh ansible_user=tylergu ansible_port=22
 ```
 
-If you haven't installed `ansible playbook` on your control node, run
+If you haven't installed `ansible-playbook` on your control node, run
+
 ```sh
 pip3 install ansible
 ansible-galaxy collection install ansible.posix
 ansible-galaxy collection install community.general
-``` 
-
-Then just run 
 ```
-bash configure.sh
+
+Then just run
+
+```sh
+ansible-playbook -i ansible_hosts configure.yaml
 ```
+
 and the proper environment will be set up on the workers.
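
Since the playbook depends entirely on the `ansible_hosts` file, a small pre-flight check can catch an inventory that is missing, empty, or still contains the `clnodeXXX` placeholder from `docs/cloudlab.md`. The following is a minimal sketch; `check_inventory` is a hypothetical helper and is not part of the acto-cloudlab scripts.

```shell
# Hypothetical helper (not part of acto-cloudlab): sanity-check an inventory file.
check_inventory() {
    inventory="$1"
    # The file must exist and be non-empty.
    if [ ! -s "$inventory" ]; then
        echo "inventory '$inventory' is missing or empty" >&2
        return 1
    fi
    # Reject the unedited placeholder host name from the getting started guide.
    if grep -q 'clnodeXXX' "$inventory"; then
        echo "inventory '$inventory' still contains the clnodeXXX placeholder" >&2
        return 1
    fi
    return 0
}

# Usage: run the playbook only when the inventory passes the check.
# check_inventory ansible_hosts && ansible-playbook -i ansible_hosts configure.yaml
```

The check only inspects the file's text; it does not verify that the hosts are reachable over SSH.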
diff --git a/scripts/ansible/configure.sh b/scripts/ansible/configure.sh
File	Stmts	Miss	Cover	Missing
acto
__main__.py	86	86	0%	1–173
common.py	354	96	73%	97–98, 102–116, 120, 122, 124, 129, 132, 136, 138, 140, 142, 144, 148, 153–162, 203, 236, 249, 291, 315–316, 325, 327, 330, 334–343, 363–366, 415, 418, 432, 460–461, 498–503, 505–507, 514–517, 521, 537–538, 544–555, 614–623, 627, 634–639
deploy.py	152	121	20%	36–39, 42–58, 62, 65–75, 83–113, 116–123, 129–151, 154–167, 179–202, 209–218, 226–244
engine.py	526	526	0%	1–906
oracle_handle.py	24	13	46%	15–18, 26–27, 39–46, 54
reproduce.py	110	110	0%	1–210
serialization.py	30	16	47%	19–27, 33–39
snapshot.py	25	2	92%	17, 34
acto/checker
checker.py	13	2	85%	12, 19
checker_set.py	57	12	79%	41–42, 67–78
test_checker.py	162	152	6%	13–246
acto/checker/impl
crash.py	30	2	93%	14, 16
health.py	49	3	94%	36, 65, 81
kubectl_cli.py	29	4	86%	32–33, 44–46
operator_log.py	24	2	92%	24, 32
state.py	227	52	77%	55, 63–64, 69–93, 104–105, 133, 140, 147–149, 166, 265, 288, 317, 319–323, 330–338
state_compare.py	76	3	96%	62, 73, 80
state_condition.py	39	13	67%	18, 24, 29–37, 47, 53
acto/input
get_matched_schemas.py	54	22	59%	12, 47–51, 55–74
input.py	588	469	20%	29–36, 73, 91, 111, 122–125, 128, 158–170, 173–184, 188, 192, 199, 205, 209–388, 409–423, 431–436, 440–453, 462, 478, 481–488, 500, 502, 504–506, 509–522, 527, 535–546, 549–555, 563–590, 594–874, 897–910, 914–961
testcase.py	55	20	64%	40–50, 53, 56, 59, 62–66, 95–96, 100, 103, 109, 116, 119
testplan.py	183	137	25%	14–24, 27–29, 32, 35–42, 45, 48–67, 70–74, 78, 82, 85–91, 99–107, 110–121, 124, 127, 130, 133, 136–138, 141–151, 154–164, 167, 174–185, 188–194, 200, 203–219, 222, 225, 231, 239–244, 247, 250, 253, 259–260, 263–269, 272, 275, 278
value_with_schema.py	337	218	35%	16, 20, 24, 28, 32, 36, 40, 54, 58, 61–67, 70–76, 86–114, 117–124, 128–137, 141–149, 152–156, 159, 162, 166, 180, 183, 186–192, 195–201, 211–230, 233–240, 243, 247–256, 260–272, 275–279, 282, 285, 289, 301, 307, 312, 314, 316, 320, 324, 329–333, 337–340, 349–359, 362–369, 373–376, 380–385, 388–391, 405, 408–412, 416, 421–428, 431–434, 437–439, 442–445, 448–451, 462, 465, 468, 485–498
valuegenerator.py	620	389	37%	20, 24, 28, 32, 43, 47, 51, 55, 76–86, 90–104, 107, 110, 114, 117, 121–130, 133–139, 142, 145, 148, 151, 154, 157–167, 194–199, 202–213, 216, 219, 223, 226–231, 234–237, 240, 243–248, 251–254, 257, 260, 263, 266, 269, 273–282, 285, 288, 291, 294–304, 331–343, 346–347, 350, 353, 357, 360–365, 368–371, 374, 377–382, 385–388, 391, 394, 397, 400, 403, 406, 409–419, 450–482, 485–496, 499–502, 505–508, 512–520, 523, 526, 529, 532, 535, 538–548, 573–589, 592–609, 612, 615, 619–621, 624–628, 631–632, 635–644, 647–653, 656–657, 660–670, 673, 676, 679, 682, 685, 688–698, 710–711, 714–724, 727–730, 733–736, 740, 748, 751–752, 755–765, 768–771, 774–777, 781, 794–797, 800–814, 817, 820, 824, 827–832, 835, 838, 841–846, 849, 852, 855, 858, 861–871, 881, 884, 887, 890, 894, 913, 919, 931, 933, 936–940, 943–946, 951, 955, 957, 961–962
acto/input/known_schemas
base.py	53	13	75%	17–18, 28, 37, 46–47, 56–57, 66, 75, 84–85, 93
cronjob_schemas.py	76	33	57%	13, 16–19, 22, 25, 36–39, 42–47, 50, 53, 59, 62–65, 68, 71, 82, 85–90, 93, 96, 113, 117–119, 131, 137, 140
deployment_schemas.py	59	25	58%	16, 22–27, 30–32, 35, 38, 54–57, 65–67, 70, 78–81, 91, 94
known_schema.py	75	39	48%	28, 31–34, 37, 43, 46–48, 51, 54, 81–84, 102–113, 117–135
pod_disruption_budget_schemas.py	56	22	61%	14–17, 21, 25–27, 30, 41–44, 48, 54, 57, 68–71, 81, 84
pod_schemas.py	797	271	66%	16–19, 23, 28, 32, 40–43, 47, 51–53, 61–64, 68, 73, 83, 92, 151, 156, 160, 167, 171, 178, 182, 238, 242, 247, 251, 294, 298, 303, 307, 335, 338, 341, 344, 347, 350, 353, 356, 359, 362, 365, 368, 393, 397, 400–405, 408, 414, 417, 420, 428, 431, 434, 437, 445, 453, 481, 488–494, 503, 507, 539, 543, 572, 575, 578, 581, 584, 587, 590, 619, 623, 626–631, 634, 642, 645, 648, 664–667, 670–672, 675, 684, 690, 693–696, 699, 702, 713–714, 717–719, 722, 725, 736, 739, 742, 745–748, 752, 756–758, 761–765, 768, 774, 777, 780, 783, 786, 789, 792, 795, 798, 801, 804, 807, 810, 813, 816, 840, 845, 849–857, 860, 866, 869, 872, 875, 878, 881, 884, 887, 890, 893, 896, 899, 902, 905, 908, 929, 933–935, 938–943, 946, 959, 964, 974, 980, 986, 989, 992, 1032, 1036–1038, 1041, 1054, 1059, 1065, 1068–1071, 1074, 1077, 1088–1091, 1094–1096, 1102, 1108, 1111–1114, 1117, 1120, 1133–1136, 1139–1141, 1147, 1153, 1156–1159, 1162, 1165, 1176–1179, 1182–1184, 1190, 1196, 1199–1202, 1205, 1212, 1215–1217, 1223, 1238, 1243, 1247, 1274, 1278, 1291, 1296, 1302, 1305, 1308, 1314, 1317, 1320–1322, 1325, 1342, 1345, 1348, 1374, 1378–1381, 1384, 1402, 1408, 1411, 1414, 1421, 1427, 1469, 1473
resource_schemas.py	149	50	66%	15–19, 22, 25, 28–32, 35, 38, 45, 49, 54, 57–59, 62, 76, 78, 82, 96, 102–105, 108, 111, 121, 134, 147, 154, 159–162, 165, 173–176, 179, 187, 194, 199, 213, 231, 236
service_schemas.py	178	67	62%	13, 16, 19, 25–30, 33, 36, 42, 45–48, 51, 54, 64–67, 70–73, 79, 85, 88–91, 94, 101–102, 105–108, 111, 114, 127, 142, 180, 184, 208, 214, 217, 220, 228–231, 235, 240, 244–246, 249, 257–258, 261, 264, 277–280, 284, 288–290, 293
statefulset_schemas.py	186	61	67%	15–18, 21, 31–36, 42, 49–53, 56–58, 61–63, 66–70, 73–75, 78–80, 83, 90–93, 99–102, 105, 132, 149, 154, 158, 164, 167–170, 173, 176, 190–193, 196–201, 207, 225, 230, 234, 262, 266, 290
storage_schemas.py	179	79	56%	13, 16, 25–30, 33, 36, 42, 48–53, 59, 67–70, 74, 79–80, 83, 89, 92–95, 98, 104–105, 108–111, 114–116, 122, 130–131, 135, 140, 145–148, 154–155, 158, 164, 181–184, 193, 197, 203–205, 211, 214, 228–231, 244, 250–252, 255, 258, 266–269, 273, 277–279, 282
acto/kubectl_client
kubectl.py	23	18	22%	8–14, 23–29, 37–44
acto/kubernetes_engine
base.py	48	30	38%	12, 16, 20, 24, 28, 31–49, 56–70
k3d.py	85	85	0%	1–139
kind.py	88	71	19%	18, 22–51, 57, 66–99, 102–114, 117–130, 137–151
minikube.py	3	3	0%	1–5
acto/monkey_patch
monkey_patch.py	79	24	70%	9, 18, 36, 39, 41, 45–56, 76, 90–95, 106
acto/parse_log
parse_log.py	78	19	76%	71–74, 84–86, 89–91, 96–98, 116–124
acto/post_process
post_diff_test.py	380	223	41%	39, 53–54, 159, 169, 173, 178–187, 222, 224, 226–230, 236–253, 261–269, 272–294, 301–310, 313–360, 371–375, 395, 403–433, 436–453, 457–505, 518, 522, 528–529, 531–552, 562–598
post_process.py	104	22	79%	51, 55, 67, 77–88, 100, 103, 138–142, 146, 150, 158, 162
test_post_process.py	28	1	96%	53
acto/runner
runner.py	286	258	10%	24–44, 72–117, 120–129, 132–163, 170–197, 204–227, 230–236, 239–258, 261–266, 277–282, 296–301, 314–390, 395–399, 405–415, 426–449, 454–487
acto/schema
anyof.py	44	20	55%	26, 30–38, 41, 44, 47–49, 52, 55–60
array.py	70	29	59%	31, 43–52, 57, 59–62, 71–78, 81–83, 86–88, 91, 94, 97, 103
base.py	102	57	44%	13–15, 18–20, 23, 26–37, 40, 43, 46–48, 51–61, 64–74, 77, 80–86, 95, 100, 105, 110, 114, 118, 142, 145–149
boolean.py	27	10	63%	16, 20, 23, 26, 29–32, 35, 38
integer.py	26	9	65%	17, 21–23, 26, 29, 32, 35, 38
number.py	30	11	63%	30–32, 35–37, 40, 43, 46, 49, 52
object.py	117	48	59%	44, 46, 51, 66–75, 80, 83, 94–109, 112–120, 123–126, 129, 132, 148, 151–158, 168
oneof.py	44	30	32%	13–19, 22, 25–27, 30–38, 41, 44, 47–49, 52, 55–60
opaque.py	17	6	65%	13, 16, 19, 22, 25, 28
schema.py	42	8	81%	21, 25, 31–34, 49–50
string.py	27	9	67%	25, 29–31, 34, 37, 40, 43, 46
acto/utils
__init__.py	14	1	93%	12
acto_timer.py	31	22	29%	10–15, 19, 22–33, 38–40, 44–47
config.py	34	1	97%	63
error_handler.py	43	33	23%	13–30, 38–52, 56–74
k8s_helper.py	64	51	20%	21–27, 39–45, 57–61, 73–79, 83–91, 95–102, 106–127
preprocess.py	69	59	14%	16–62, 74–123, 129–174
process_with_except.py	9	9	0%	1–13
thread_logger.py	15	3	80%	9, 18, 28
TOTAL	8010	4300	46%