Test Stack Survey

= Responses =
Please be patient while survey results are posted and the format is regularized.


 * 0-day survey response
 * buildbot survey response
 * ci-rt survey response (uses r4d) - realtime test system used by Linutronix
 * CKI survey response - used by RedHat
 * Fuego survey response
 * Gentoo Kernel CI survey response
 * hottest notes
 * Jenkins survey response ??? (no one is specifically representing Jenkins at the summit)
 * KernelCI survey response
 * kerneltests survey response
 * Krzk Samsung-SoC survey response
 * Kselftest survey response
 * ktest survey response
 * LAVA survey response
 * Labgrid survey response
 * LKFT survey response
 * LTP survey response
 * Opentest survey response
 * Phoronix survey response
 * SLAV survey response
 * syzbot survey response (& syzkaller)
 * tbot survey response
 * TCF survey response
 * Xilinx test survey response (aka regression_xlnx)
 * Yocto project survey response

= Survey =
This survey of leading Linux-related tests and test frameworks was conducted in the fall of 2018, in preparation for the Automated Testing Summit. Responses were received by e-mail and reformatted as wiki pages.

Diagrams
Below is a diagram of the high-level CI loop:

The boxes represent different processes, hardware, or storage locations. Lines between boxes indicate APIs or control flow, and are labeled with letters. The diagram is intended to facilitate discussion at the summit.



If you have an element that is not featured in this diagram, please let us know.


 * Fuego: one missing box is the 'Test runner', which executes parts of the test off the DUT

Cover text
Hello Test Framework developer or user,

The purpose of this survey is to try to understand how different Test Frameworks and Automated Test components in the Linux Test ecosystem work - what features they have, what terminology they use, and so forth. The reason to characterize these different pieces of software (and hardware) is to try to come up with definitions for a Test Stack, and possibly API definitions, that will allow different elements to communicate and interact. We are interested in seeing the commonalities and differences between stack elements.

This information will be used initially to prepare for discussions about test stack standards at the Automated Testing Summit 2018.

Please see the Glossary below for the meaning of words used in this survey. If you use different words in your framework for the same concept, please let us know. If you think there are other words that should be in the Glossary, please let us know.

Survey Questions

 * What is the name of your test framework?

Which of the aspects of the CI loop does your test framework perform?

The answers can be: "yes", "no", or "provided by the user". Where the answer is not simply yes or no, an explanation is appreciated.

For example, in Fuego, Jenkins is used for trigger detection (that is, to detect new SUT versions), but the user must install the Jenkins module and configure this themselves. So Fuego supports triggers, but does not provide them pre-configured for the user.

If the feature is provided by a named component in your system (or by an external module), please provide the name of that module.

Does your test framework:

source code access

 * access source code repositories for the software under test?
 * access source code repositories for the test software?
 * include the source for the test software?
 * provide interfaces for developers to perform code reviews?
 * detect that the software under test has a new version?
 * if so, how? (e.g. polling a repository, a git hook, scanning a mail list, etc.)
 * detect that the test software has a new version?
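
As an illustration of the "polling a repository" option mentioned above, here is a minimal sketch of a polling trigger. The class name `PollingTrigger` and the `get_revision` callable are hypothetical; in practice the callable might wrap something like `git ls-remote`, but that is an assumption and not something any surveyed framework is known to do in this form.

```python
from typing import Callable, Optional


class PollingTrigger:
    """Hypothetical trigger that reports when a polled revision changes."""

    def __init__(self, get_revision: Callable[[], str]) -> None:
        # get_revision is supplied by the caller and returns an opaque
        # revision identifier (for example a commit hash).
        self._get_revision = get_revision
        self._last_seen: Optional[str] = None

    def poll(self) -> bool:
        """Return True when the revision differs from the last one seen.

        The first call only records a baseline and returns False.
        """
        current = self._get_revision()
        changed = self._last_seen is not None and current != self._last_seen
        self._last_seen = current
        return changed


# Usage with a canned sequence of revisions instead of a real repository.
revisions = iter(["abc123", "abc123", "def456"])
trigger = PollingTrigger(lambda: next(revisions))
poll_results = [trigger.poll(), trigger.poll(), trigger.poll()]
```

Treating the first poll as a baseline avoids firing a trigger simply because the framework was restarted. A git hook or mail-list scanner would replace the polling loop with an event pushed from outside.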

test definitions
Does your test system:
 * have a test definition repository?
 * if so, what data format or language is used (e.g. yaml, json, shell script)?

Does your test definition include:
 * source code (or source code location)?
 * dependency information?
 * execution instructions?
 * command line variants?
 * environment variants?
 * setup instructions?
 * cleanup instructions?
 * if anything else, please describe:
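
To make the items in this list concrete, the following is a rough sketch of what a test definition holding all of them might look like. Every name here (`SurveyTestDefinition`, its fields, the example test and its URL) is hypothetical and chosen only for illustration; real frameworks store this information in their own formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SurveyTestDefinition:
    """Hypothetical container for the items the survey asks about."""
    name: str
    source_location: str                       # source code, or where to fetch it
    dependencies: List[str] = field(default_factory=list)
    run_command: str = ""                      # execution instructions
    # variant name -> extra command line arguments
    command_line_variants: Dict[str, List[str]] = field(default_factory=dict)
    # variant name -> environment variables
    environment_variants: Dict[str, Dict[str, str]] = field(default_factory=dict)
    setup_commands: List[str] = field(default_factory=list)
    cleanup_commands: List[str] = field(default_factory=list)

    def command_for(self, variant: str) -> List[str]:
        """Return the full command line for one named variant.

        An unknown variant name falls back to the base command.
        """
        extra = self.command_line_variants.get(variant, [])
        return self.run_command.split() + extra


# Made-up example; the URL and program name are placeholders.
example = SurveyTestDefinition(
    name="example_benchmark",
    source_location="https://example.com/example_benchmark.git",
    dependencies=["root access", "POSIX shell"],
    run_command="./example_benchmark --iterations 10",
    command_line_variants={"default": [], "long": ["--duration", "60"]},
    environment_variants={"default": {}, "debug": {"EXAMPLE_DEBUG": "1"}},
    setup_commands=["mkdir -p /tmp/example"],
    cleanup_commands=["rm -rf /tmp/example"],
)
```

A structure like this maps directly onto yaml or json, which is one reason those formats are offered as examples in the question about test definition repositories.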

Does your test system:
 * provide a set of existing tests?
 * if so, how many?

build management
Does your test system:
 * build the software under test (e.g. the kernel)?
 * build the test software?
 * build other software (such as the distro, libraries, firmware)?
 * support cross-compilation?
 * require a toolchain or build system for the SUT?
 * require a toolchain or build system for the test software?
 * come with pre-built toolchains?
 * store the build artifacts for generated software?
 * in what format is the build metadata stored (e.g. json)?
 * are the build artifacts stored as raw files or in a database?
 * if a database, what database?

Test scheduling/management
Does your test system:
 * check that dependencies are met before a test is run?
 * schedule the test for the DUT?
 * select an appropriate individual DUT based on SUT or test attributes?
 * reserve the DUT?
 * release the DUT?
 * install the software under test to the DUT?
 * install required packages before a test is run?
 * require a particular bootloader on the DUT? (e.g. GRUB, U-Boot, etc.)
 * deploy the test program to the DUT?
 * prepare the test environment on the DUT?
 * start a monitor (another process to collect data) on the DUT?
 * start a monitor on external equipment?
 * initiate the test on the DUT?
 * clean up the test environment on the DUT?
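
The scheduling steps listed above can be sketched as a small in-memory scheduler. This is only an illustration under stated assumptions: the `Dut` and `Scheduler` classes, the feature-set matching, and the step strings are all hypothetical, and a real system would replace each step with actual work on lab hardware.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class Dut:
    name: str
    features: Set[str] = field(default_factory=set)
    reserved: bool = False


class Scheduler:
    """Toy in-memory scheduler; a real one would talk to lab hardware."""

    def __init__(self, duts: List[Dut]) -> None:
        self.duts = duts

    def select(self, required: Set[str]) -> Optional[Dut]:
        """Pick the first free DUT that has every required feature."""
        for dut in self.duts:
            if not dut.reserved and required <= dut.features:
                return dut
        return None

    @contextmanager
    def reserve(self, required: Set[str]) -> Iterator[Dut]:
        """Reserve a matching DUT and always release it afterwards."""
        dut = self.select(required)
        if dut is None:
            raise RuntimeError("no free DUT satisfies the test dependencies")
        dut.reserved = True
        try:
            yield dut
        finally:
            dut.reserved = False


def run_test(scheduler: Scheduler, required: Set[str]) -> List[str]:
    """Record the order of steps; each string stands in for real work."""
    steps: List[str] = []
    with scheduler.reserve(required) as dut:
        steps.append(f"install SUT on {dut.name}")
        steps.append(f"deploy test program to {dut.name}")
        steps.append(f"prepare environment on {dut.name}")
        steps.append(f"initiate test on {dut.name}")
        steps.append(f"clean up {dut.name}")
    return steps


lab = Scheduler([Dut("board-a", {"usb"}), Dut("board-b", {"usb", "wifi"})])
steps = run_test(lab, {"wifi"})
```

Using a context manager for the reservation means the DUT is released even if a step raises an exception, which keeps a failed test from leaving a board stuck in the reserved state.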

DUT control
Does your test system:
 * store board configuration data?
 * in what format?
 * store external equipment configuration data?
 * in what format?
 * power cycle the DUT?
 * monitor the power usage during a run?
 * gather a kernel trace during a run?
 * claim other hardware resources or machines (other than the DUT) for use during a test?
 * reserve a board for interactive use (i.e. remove it from automated testing)?
 * provide a web-based control interface for the lab?
 * provide a CLI control interface for the lab?

Run artifact handling
Does your test system:
 * store run artifacts?
 * in what format?
 * put the run meta-data in a database?
 * if so, which database?
 * parse the test logs for results?
 * convert data from test logs into a unified format?
 * if so, what is the format?
 * evaluate pass criteria for a test (e.g. ignored results, counts or thresholds)?
 * do you have a common set of result names (e.g. pass, fail, skip, etc.)?
 * if so, what are they?


 * How is run data collected from the DUT?
 * e.g. by pushing from the DUT, or pulling from a server?
 * How is run data collected from external equipment?
 * Is external equipment data parsed?
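
As a rough illustration of log parsing, a unified result format, and pass criteria, consider the sketch below. The log line shape (`<testcase>: PASS|FAIL|SKIP`), the function names, and the lowercase result names are assumptions made for the example and do not describe any particular framework.

```python
import re
from typing import Dict, Iterable

# Assumed log line shape: "<testcase name>: <PASS|FAIL|SKIP>"
RESULT_LINE = re.compile(r"^(?P<name>\S+):\s+(?P<status>PASS|FAIL|SKIP)\s*$")


def parse_log(lines: Iterable[str]) -> Dict[str, str]:
    """Convert raw log lines into a {testcase: result} mapping.

    Lines that do not look like results are skipped.
    """
    results: Dict[str, str] = {}
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match:
            results[match.group("name")] = match.group("status").lower()
    return results


def evaluate(results: Dict[str, str],
             ignore: Iterable[str] = (),
             max_failures: int = 0) -> str:
    """Apply pass criteria: ignored testcases and a failure-count threshold."""
    ignored = set(ignore)
    failures = [name for name, status in results.items()
                if status == "fail" and name not in ignored]
    return "pass" if len(failures) <= max_failures else "fail"


log = ["booting...", "case_a: PASS", "case_b: FAIL", "case_c: SKIP"]
results = parse_log(log)
strict = evaluate(results)
lenient = evaluate(results, ignore=["case_b"])
```

Keeping the parsing step separate from the evaluation step lets the same unified results be judged against different pass criteria, for example a strict set for release testing and a lenient set for known-flaky boards.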

User interface
Does your test system:
 * have a visualization system?
 * show build artifacts to users?
 * show run artifacts to users?
 * do you have a common set of result colors?
 * if so, what are they?
 * generate reports for test runs?
 * notify users of test results by e-mail?


 * can you query (aggregate and filter) the build meta-data?
 * can you query (aggregate and filter) the run meta-data?


 * what language or data format is used for online results presentation? (e.g. HTML, JavaScript, XML, etc.)
 * what language or data format is used for reports? (e.g. PDF, excel, etc.)


 * does your test system have a CLI control tool?
 * what is it called?

Languages:
Examples: json, python, yaml, C, javascript, etc.
 * what is the base language of your test framework core?

What languages or data formats is the user required to learn? (as opposed to those used internally)

Can a user do the following with your test framework:

 * manually request that a test be executed (independent of a CI trigger)?
 * see the results of recent tests?
 * set the pass criteria for a test?
 * set the threshold value for a benchmark test?
 * set the list of testcase results to ignore?
 * provide a rating for a test? (e.g. give it 4 stars out of 5)
 * customize a test?
 * alter the command line for the test program?
 * alter the environment of the test program?
 * specify to skip a testcase?
 * set a new expected value for a test?
 * edit the test program source?
 * customize the notification criteria?
 * customize the notification mechanism (e.g. e-mail, text)?
 * generate a custom report for a set of runs?
 * save the report parameters to generate the same report in the future?

Requirements
Does your test framework:
 * require minimum software on the DUT?
 * require minimum hardware on the DUT? (e.g. memory)
 * If so, what? (e.g. POSIX shell or some other interpreter, specific libraries, command line tools, etc.)
 * require agent software on the DUT? (e.g. extra software besides production software)
 * If so, what agent?
 * is there optional agent software or libraries for the DUT?
 * require external hardware in your labs?

APIs
Does your test framework:
 * use existing APIs or data formats to interact within itself, or with 3rd-party modules?
 * have a published API for any of its sub-module interactions (any of the lines in the diagram)?
 * Please provide a link or links to the APIs.

 * What is the nature of the APIs you currently use? (Sorry - this is kind of open-ended...) Are they:
 * RPCs?
 * Unix-style? (command line invocation, while grabbing sub-tool output)
 * compiled libraries?
 * interpreter modules or libraries?
 * web-based APIs?
 * something else?

Relationship to other software:

 * what major components does your test framework use? (e.g. Jenkins, MongoDB, SQUAD, LAVA, etc.)
 * does your test framework interoperate with other test frameworks or software?
 * which ones?

Overview
Please list the major components of your test system.

Just as an example, Fuego can probably be divided into 3 main parts, with somewhat overlapping roles:
 * Jenkins - job triggers, test scheduling, visualization, notification
 * Core - test management (test build, test deploy, test execution, log retrieval)
 * Parser - log conversion to unified format, artifact storage, results analysis
There are lots of details omitted, but you get the idea.

Please list your major components here:
 * [component 1...]

Test lifecycle diagram
Here is a diagram with the QA lifecycle for a test.



Glossary
Here is a glossary of terms. Please indicate if your system uses different terms for these concepts. Also, please suggest any terms or concepts that are missing.

See below for candidate terms.


 * Bisection - automatic testing of SUT variations to find the source of a problem
 * Boot - to start the DUT from an off state (addition: point of time when a test can be started)
 * Build artifact - item created during build of the software under test
 * Build manager (build server) - a machine that performs builds of the software under test
 * Dependency - indicates a prerequisite that must be met in order for a test to run (e.g. must have root access, must have 100 MB of memory, some program must be installed, etc.)
 * Device under test (DUT) - the hardware or product being tested (consists of hardware under test and software under test) (also 'board', 'target')
 * Deploy - put the test program or SUT on the DUT
 * this one is ambiguous - some people use this to refer to SUT installation, and others to test program installation
 * Device under Test (DUT) - a product, board or device that is being tested
 * DUT controller - program and hardware for controlling a DUT (reboot, provision, etc.)
 * DUT scheduler - program for managing access to a DUT (take online/offline, make available for interactive use)
 * This is not shown in the CI Loop diagram - it could be the same as the Test Scheduler
 * Lab - a collection of resources for testing one or more DUTs (also 'board farm')
 * Log - one of the run artifacts - output from the test program or test framework
 * Log Parsing - extracting information from a log into a machine-processable format (possibly into a common format)
 * Monitor - a program or process to watch some attribute (e.g. power) while the test is running
 * This can be on or off the DUT.
 * Notification - communication based on results of test (triggered by results and including results)
 * Pass criteria - set of constraints indicating pass/fail conditions for a test
 * Provision (verb) - arrange the DUT and the lab environment (including other external hardware) for a test
 * This may include installing the SUT to the device under test and booting the DUT.
 * Report generation - collecting run data and putting it into a formatted output
 * Request (noun) - a request to execute a test
 * Result - the status indicated by a test - pass/fail (or something else) for a Run
 * Results query - Selection and filtering of data from runs, to find patterns
 * Run (noun) - an execution instance of a test (in Jenkins, a build)
 * Run artifact - item created during a run of the test program
 * Serial console - the Linux console connected over a serial connection
 * Software under test (SUT) - the software being tested
 * Test agent - software running on the DUT that assists in test operations (e.g. test deployment, execution, log gathering, debugging)
 * One example would be 'adb', for Android-based systems
 * Test definition - meta-data and software that comprise a particular test
 * Test program - a script or binary on the DUT that performs the test
 * Test scheduler - program for scheduling tests (selecting a DUT for a test, reserving it, releasing it)
 * Test software - source and/or binary that implements the test
 * Transport (noun) - the method of communicating and transferring data between the test system and the DUT
 * Trigger (noun) - an event that causes the CI loop to start
 * Variant - arguments or data that affect the execution and output of a test (e.g. test program command line; Fuego calls this a 'spec')
 * Visualization - allowing the viewing of test artifacts, in aggregated form (e.g. multiple runs plotted in a single diagram)

Candidate terms

 * Actual Value - the value that was seen for an operation performed by a test
 * Expected value - the value that was expected for an operation performed by a test
 * Feature - an attribute of a DUT or SUT or test environment that can be used to match tests. Requiring a DUT to have a particular feature could be a test dependency. (used by labgrid)
 * Device type - The name of a set of DUTs that have identical or similar features, such that any one of them can be used to run a test (used by LAVA)
 * Tim's comment: Some examples would be good. I'm not sure I like this.  Is there a related term for the set of boards that have a particular type?  (e.g. something that refers to the pool of boards, rather than the characteristics of the set?  Maybe DUT pool?)
 * PDU - Power Distribution Unit - a piece of hardware used to control power to one or more DUTs (used by LAVA)
 * Interactive DUT access - the ability to take a board out of automated testing service, for use in interactive testing or debugging sessions, or for some other reason (alternative names: "DUT-offlining"? "DUT reservation"?)
 * DUT Supervisor - provides connection to the DUT and abstraction for DUT management actions (e.g. dut_boot, dut_login, dut_exec, dut_copyfrom, dut_copyto commands - additional software on NanoPi) (used by SLAV)
 * Test Profile - same thing as Test Definition. Test Profile is used by Phoronix Test Suite.
 * Test Plan - a group of tests run together (e.g. sequentially on the same DUT), along with reporting/notification instructions (used by Fuego)

= A couple of miscellaneous notes =
 * A Linux boot test is kind of strange, in that the software under test (the Linux kernel) is also the test program (the program that performs the action).
 * Maybe in this case, the test program does not reside on the DUT.
 * Fuego tests technically are composed of a host-side script and (usually) a DUT-side test program

Rejected questions
Does your system: Can a user:
 * add a new test definition to the system? (e.g. import a new test to the framework)
 * publish a test (not run results, but the test itself)?
 * share information about testcases with other users?
 * share test meta-data with other users?
 * share test program customizations?
 * share variants with other users?
 * share pass criteria with other users?
 * share tests with other users?