Board Management Layer Notes

Here are notes about the different ways that various systems do board management:

= different systems =

SLAV
Uses the following verbs or functions:
 * dut_boot
 * dut_login
 * dut_exec
 * dut_copyfrom
 * dut_copyto
 * fota = "flash-over-the-air" for flashing a board

Questions:
 * what layer provides these?
 * what layer calls these?

ttc (Tiny Target Control)
Uses the following verbs or functions:
 * ttc kinstall - install kernel (either to target or tftp area on server)
 * ttc fsinstall - install root filesystem (either to target or nfs area on server)
 * ttc reboot - reboot the target
 * ttc console - get access to console on the target
 * ttc login - get to login shell on the target
 * ttc run - execute a command on the target
 * ttc cp - copy files to or from the target
   * example: ttc [ ] cp file target:/tmp
   * example: ttc [ ] cp target:/some/dir/file hostfile
 * ttc rm - remove files from target
 * ttc waitfor - wait for a command to complete successfully

reservations

 * ttc status - show status of a board (including reservation)
 * ttc reserve - reserve a target
 * ttc release - release a target
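
The reserve/release pair naturally brackets a test session. Here is a minimal Python sketch of that pattern. It assumes a caller-supplied `run_verb` function (hypothetical, not part of ttc) that knows how to invoke the corresponding ttc command; the actual command-line syntax is deliberately left out.

```python
from contextlib import contextmanager


@contextmanager
def reserved(target, run_verb):
    """Hold a reservation on `target` for the duration of a with-block.

    `run_verb(target, verb)` is a hypothetical callable supplied by the
    caller; it is expected to run the matching ttc verb and raise on failure.
    """
    run_verb(target, "reserve")
    try:
        yield target
    finally:
        # Always release, even if the body of the with-block raised.
        run_verb(target, "release")


def demo():
    """Record the verbs issued when the test body fails."""
    calls = []

    def fake_run_verb(target, verb):
        calls.append((target, verb))

    try:
        with reserved("board1", fake_run_verb):
            raise RuntimeError("test failed")
    except RuntimeError:
        pass
    return calls
```

The point of the `finally` clause is that a failing test does not leave the board reserved.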

labgrid

 * Resource - Simple Resources like USB Serial Ports, Power Switch Ports (with availability annotations in the remote infrastructure)
 * Driver - Drivers which bind to a target and use resources, e.g. SerialDriver to use a SerialPort
 * Protocol - Abstract description of a driver interface, e.g. ConsoleProtocol for a driver which provides `read`, `write`, `sendline`, `sendcontrol` and `expect` functions
 * Place - Collection of Resources to describe a board in the remote infrastructure
 * Target - description of a device under test with resources and drivers
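
To make the relationship between these concepts concrete, here is a toy Python model. It is not labgrid's actual API: the class bodies, the `LoopbackSerialDriver` name, and the `get_driver` lookup shown here are illustrative assumptions that only mirror the roles described above (a resource, a driver bound to it, a protocol the driver implements, and a target holding both).

```python
from abc import ABC, abstractmethod


class ConsoleProtocol(ABC):
    """Protocol: abstract description of what a console driver offers."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read(self) -> bytes: ...


class SerialPort:
    """Resource: a plain description of a piece of hardware."""

    def __init__(self, device: str):
        self.device = device


class LoopbackSerialDriver(ConsoleProtocol):
    """Driver: uses a SerialPort resource and implements ConsoleProtocol.

    For illustration it simply echoes back whatever was written.
    """

    def __init__(self, port: SerialPort):
        self.port = port
        self._buffer = b""

    def write(self, data: bytes) -> None:
        self._buffer += data

    def read(self) -> bytes:
        data, self._buffer = self._buffer, b""
        return data


class Target:
    """Target: a device under test with its resources and drivers."""

    def __init__(self, name: str):
        self.name = name
        self.resources = []
        self.drivers = []

    def get_driver(self, protocol):
        # Test code asks for a protocol, not for a concrete driver class.
        for driver in self.drivers:
            if isinstance(driver, protocol):
                return driver
        raise LookupError(f"no driver for {protocol.__name__} on {self.name}")


target = Target("example-board")
port = SerialPort("/dev/ttyUSB0")
target.resources.append(port)
target.drivers.append(LoopbackSerialDriver(port))

console = target.get_driver(ConsoleProtocol)
console.write(b"hello\n")
```

The design choice illustrated is that a test only depends on the protocol, so the same test can run against any board whose target provides some driver for it.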

Remote infrastructure verbs:
 * lock/acquire - acquire exclusive access
 * unlock/release - release exclusive access
 * add-match/del-match - add or remove regex matches to exported resources
 * create/delete - create or delete a place

r4d
See https://github.com/ci-rt/r4d for an overview.

R4d focuses on a rack-based lab, with power controllers and serial device servers that are kept in sync with each other.

"r4d means Remote For Device-under-test and is an infrastructure for power-control and console access for multiple Linux Boards that should be controlled by a test-infrastructure like jenkins."

Functions for lab management and rack/slot/controller assignment:
 * r4dcfg --add-rack  - add a rack to the lab
 * r4dcfg --add-power  - add a power switch to the lab
 * r4dcfg --add-serial  - add a serial device server to the lab
 * r4dcfg --add-board  - add a board to the lab
   * the board should be connected to the same port (the one indicated) on both the power controller and the serial device server (e.g., a board on port 5 must be configured on port 5 of the power controller and port 5 of the serial device server)
 * r4dcfg --move-board - move a board to a different rack and/or port
 * r4dcfg --delete-board - remove a board from the lab
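
The "same port on both devices" convention means a single port number per board is enough to describe its wiring. A small Python sketch of that bookkeeping follows. The `Rack` class and its methods are hypothetical and are not taken from r4d; they only illustrate the rule stated above.

```python
class Rack:
    """Toy model of one rack with a power controller and a serial server."""

    def __init__(self, name, num_ports):
        self.name = name
        self.num_ports = num_ports
        self.boards = {}  # port number -> board name

    def add_board(self, board, port):
        if not 1 <= port <= self.num_ports:
            raise ValueError(f"port {port} is out of range for rack {self.name}")
        if port in self.boards:
            raise ValueError(f"port {port} is already used by {self.boards[port]}")
        self.boards[port] = board

    def wiring(self, board):
        # One stored number yields both connections, because the board uses
        # the same port on the power controller and on the serial server.
        for port, name in self.boards.items():
            if name == board:
                return {"power_port": port, "serial_port": port}
        raise KeyError(board)


rack = Rack("rack1", num_ports=8)
rack.add_board("board-a", 5)
```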


 * r4dcfg --show-db - inspect lab configuration
 * r4dcfg --list-boards - show boards and board configuration


 * r4dcfg --poweron
 * r4dcfg --poweroff
 * r4dcfg --powercycle

The following power control modules are supported:
 * net8x (Gude Expert Power Control NET 8x)
 * pc8210 (Gude Expert Power Control 8210 / 8211)

The following serial device servers are supported:
 * ps810 (Sena Pro Series PS810)

Main access to and control of the boards is provided by a libvirt API.

The virsh command set is documented here: https://libvirt.org/sources/virshcmdref/html-single/

Fuego
These are considered the "transport" APIs:
 * cmd - execute a command on the device under test
 * report - execute a command, and log its output (used to execute the actual test program)
 * put - copy files and/or directories to the device under test
 * get - copy files and/or directories from the device under test

And a board management API:
 * reboot - reboot the device under test

details
The APIs provided by the plugin-class for this are:
 * ov_transport_connect - establish communication channel with a board
 * ov_transport_disconnect - disconnect communication channel with a board
 * ov_transport_get - get files from a board
 * ov_transport_put - put files to a board
 * ov_transport_cmd - execute a command on the board
 * ov_board_setup - provision and reserve a board or instantiate a vm
 * ov_board_teardown - destroy a vm instance, or release a board
 * ov_board_control_reboot - reboot a board

Currently in Fuego, the setup, teardown, connect and disconnect functions are often empty for a board, and provisioning is left as an exercise for a different element of the CI loop.
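
One way to picture how these hooks fit together is the following Python sketch. It is an illustration only, not Fuego code: the `BoardPlugin` class, the `run_session` helper, and the call ordering are assumptions. The hook names are the ones listed above, and the defaults are no-ops to reflect that they are often empty for a board.

```python
class BoardPlugin:
    """Hypothetical Python stand-in for the ov_* hooks; all default to no-ops."""

    def ov_board_setup(self):
        pass

    def ov_board_teardown(self):
        pass

    def ov_transport_connect(self):
        pass

    def ov_transport_disconnect(self):
        pass

    def ov_transport_put(self, local_path, remote_path):
        pass

    def ov_transport_get(self, remote_path, local_path):
        pass

    def ov_transport_cmd(self, command):
        return ""


def run_session(board, command, upload=None, download=None):
    """Assumed ordering: setup, connect, put, cmd, get, disconnect, teardown."""
    board.ov_board_setup()
    try:
        board.ov_transport_connect()
        try:
            if upload:
                board.ov_transport_put(*upload)
            output = board.ov_transport_cmd(command)
            if download:
                board.ov_transport_get(*download)
            return output
        finally:
            board.ov_transport_disconnect()
    finally:
        # Teardown runs even if the command or a file transfer failed.
        board.ov_board_teardown()


class RecordingBoard(BoardPlugin):
    """Example plugin that only records which hooks were called."""

    def __init__(self):
        self.calls = []

    def ov_board_setup(self):
        self.calls.append("setup")

    def ov_board_teardown(self):
        self.calls.append("teardown")

    def ov_transport_connect(self):
        self.calls.append("connect")

    def ov_transport_disconnect(self):
        self.calls.append("disconnect")

    def ov_transport_put(self, local_path, remote_path):
        self.calls.append("put")

    def ov_transport_cmd(self, command):
        self.calls.append("cmd")
        return "output of " + command
```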

syzbot
Uses these functions:
 * Copy - copy file from host into VM
 * Forward - sets up forwarding (communications channel) from VM to host
 * Run - execute a command in the VM
 * Diagnose - returns diagnostic or debugging information from the VM
 * Close - stops and destroys the VM

Here is the interface: https://github.com/google/syzkaller/blob/28ac6e6496673327d3319bab81c57a0f7366fb45/vm/vmimpl/vmimpl.go#L32-L57

Some comments:
 * when an Instance is created, it's supposed to be in a "good" state (e.g. rebooted)
 * Close ("destructor") should take care of doing teardown, returning the machine to the pool, etc.
 * for Copy operation we don't specify destination path, it's supposed to be chosen by the impl (different machines can have writable storage at different paths); this is fine for our use case of copying a few files into a single dir; a more flexible interface would allow choosing a suffix of the path on the target machine (impl will choose a path prefix, but you can still re-create a particular dir layout on the target)
 * port forwarding may look a bit specialized for our use case, but we want to connect back to the host over tcp (for a richer rpc protocol); maybe we could limit this to just 1 port which is specified during construction (not dynamically, e.g. qemu port forwarding can be set up only when you start the instance, not later)
 * the Run method is designed for long-running processes, so it streams output and can be aborted; also the console output is included in the command output, which may not be the best decision (it's always possible to merge it later, but not possible to unmerge)
 * Diagnose is a newer addition and can be used to provoke machine/OS-dependent diagnostic output on the console (because we care a lot about kernel crashes/hangs and the ability to understand what happened later based on console output)
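
As a rough illustration of the shape of this interface, here is a Python sketch. It loosely follows the descriptions above rather than the Go signatures in the linked file; the method signatures, the `FakeInstance` class, and its `/tmp/work` directory are all assumptions made for the example.

```python
import posixpath
from abc import ABC, abstractmethod
from typing import Iterator


class Instance(ABC):
    """A machine that is expected to be in a "good" state once created."""

    @abstractmethod
    def copy(self, host_src: str) -> str:
        """Copy a host file into the VM; the implementation picks the path."""

    @abstractmethod
    def forward(self, port: int) -> str:
        """Set up forwarding from the VM to a host port; return the address."""

    @abstractmethod
    def run(self, command: str) -> Iterator[str]:
        """Run a command and stream its output as it is produced."""

    @abstractmethod
    def diagnose(self) -> str:
        """Return diagnostic or debugging information from the VM."""

    @abstractmethod
    def close(self) -> None:
        """Stop and destroy the VM (or return the machine to a pool)."""


class FakeInstance(Instance):
    """In-memory stand-in used only to show how the calls fit together."""

    def __init__(self, workdir: str = "/tmp/work"):
        self.workdir = workdir
        self.files = []
        self.closed = False

    def copy(self, host_src: str) -> str:
        # The caller does not choose the destination; the implementation does.
        dst = posixpath.join(self.workdir, posixpath.basename(host_src))
        self.files.append(dst)
        return dst

    def forward(self, port: int) -> str:
        return f"127.0.0.1:{port}"

    def run(self, command: str) -> Iterator[str]:
        # A generator lets the caller consume output incrementally and stop
        # iterating early, which stands in for aborting a long-running command.
        yield f"$ {command}"
        yield "done"

    def diagnose(self) -> str:
        return "no diagnostics available"

    def close(self) -> None:
        self.closed = True
```

Returning the destination path from `copy` reflects the comment above that different machines may have writable storage at different paths.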