ATS 2018 Minutes

These are minutes from the Automated Test Summit 2018.

The meeting was held on Thursday, October 25 in Edinburgh, Scotland.

= Sponsors =
The sponsors for the meeting were:
* the Core Embedded Linux Project (CELP) of the Linux Foundation
* Linaro
* DENSO TEN

= Attendees =
The following people attended the meeting:

 Name                 Company              Project
 -------------------  -------------------  ---------------------
 Alejandro Hernandez  TI                   Opentest
 Alice Ferrazzi       Gentoo               Gentoo kernel CI
 Anders Roxell        Linaro               kselftest
 Andrew Murray        ARM                  Witekio board farm
 Carlos Hernandez     TI                   Opentest
 Chris Fiege          Pengutronix          Labgrid, sdmux
 Cyril Hrubis         SUSE                 LTP
 Dan Rue              Linaro               Kernel Validation
 Daniel Sangorrin     Toshiba              Fuego, CIP
 Dmitry Vyukov        Google               syzbot
 Geert Uytterhoeven   Renesas              LTSI maintainer
 Guenter Roeck        Google               kerneltests.org
 Heiko Schocher       Denx                 tbot
 Hirotaka Motai       Mitsubishi Electric  Fuego RT testing
 Jan Lübbe            Pengutronix          Labgrid
 Jan-Simon Moller     Linux Foundation     AGL
 Kevin Hilman         BayLibre             KernelCI
 Khiem Nguyen         Renesas              Fuego, LTSI testing
 Li Xiaoming          Fujitsu              Fuego
 Manuel Traut         Linutronix           ci-RT, r4d
 Mark Brown           Linaro               Kernelci
 Matt Hart            Linaro               LAVA
 Michal Simek         Xilinx               Xilinx testing
 Milosz Wasilewski    Linaro               LKFT
 Nobuhiro Iwamatsu    Cybertrust Japan     Gentoo kernel CI
 Pawel Wieczorek      Samsung              SLAV
 Philip Li            Intel                0-day
 Punnaiah Kalluri     Xilinx               Xilinx testing
 Richard Purdie       Linux Foundation     Yocto Project
 Sjoerd Simons        Collabora            Kernelci
 Steve Rostedt        VmWare               ktest, RT maintainer
 Tim Bird             Sony                 Fuego
 Tim Orling           Intel                KCF, Yocto Project
 Takao Koguchi        Hitachi              CELP
 Tsugikazu Shibata    NEC                  LTSI, CELP
 Yoshitake Kobayashi  Toshiba              CIP, CELP
 Yuichi Kusakabe      DENSO TEN            Fuego, Sponsor

= Outline =
* welcome and introductions
* vision and problem definition
* glossary and diagram discussion
* test definition, build artifacts, execution API (E)
* run artifacts, results formats, parsers
* farm standards - DUT control drivers, board definitions
* wrap-up

= Minutes =
Minute conventions: a hash (#) prefixes the speaker name (when known).

 - Introductions
   - attendees introduced their name, company and project

# Tim:
 - Not enough time today
 - Goal: get a description framework/terminology
   - talk about common vocabulary
 - value of collaboration/difficulty of going alone ("ribbon")
 - we share a lot of code, but we do not share QA tools
   - 90% of the code is OSS, but only 5% of testing code is OSS
 - Jenkins, LTP are a good start, but not enough
 - Vision => promote the sharing of automated CI components as code is shared today
 - Some shift to sharing components
   - KernelCI uses LAVA
   - Fuego doesn't have a board control layer
 - Non-goals for today:
   - no standards or common APIs
   - do not implement all the nice features we hear about today in your test framework => rather, find a way to share

Tim does not want for Fuego:
 - email-based CI triggers
 - SUT deployment abstractions
 - DUT control drivers
 - centralized results repos
 - distributed results visualization
Focus on:
 - repo of test definitions
 - sharing of pass criteria, testcase docs
 - generalized output parsing system

Current problems:
 - many aspects are not shared, but nobody can do it all
 - tests are treated as "secret sauce"
   - similar to how embedded SW was viewed 20 years ago
 - we have no place to share our tests
   - LTP? kselftest?
   - there are other, standalone, open source tests:
     - cyclictest, syzkaller, iozone, lmbench
 - often tests are lab- or target-specific
   - makes HW testing difficult
   - Tim said Sony has an internal USB multiplexing test rig
 - tests depend on the test framework
   - TVs are switching to Android, and some use the Android test framework
 - file formats, APIs, architecture differ

 - paradox of generalization and specialization => tests are hard to reuse
 - need the ability to customize tests
   - (skip lists, expected values, variants) and pass criteria
   - benchmark value thresholds based on previous results

 - different frameworks factor their data and services quite differently
   - interfaces between services, central server
 - where are operations performed?
   - central server, local host, on-DUT, by the user

# ?: LAVA had complaints that tests could only be run in LAVA, so they can now output a script which can be run interactively

 - test definitions are split up into different files:
   - customizable per test
   - per board
   - per lab (doesn't exist in Fuego)

Tim's idea: fractal nature of testing
 - individual tests, test suite, test plan: actual vs. expected results
 - can do pass criteria, result analysis, reporting at each level
 - the features are expressed differently at different levels

# Richard:
 - another item missing from the problem statements is modularization
   - frameworks tend to be monolithic systems that are hard to extract useful sub-systems from
 - use of Docker is incompatible with Yocto (which wants to run on different distros)
 - Yocto doesn't have a good results parser
 - maybe we can't directly use code, but we could start with standards for an interchange format?

# Tim: query about languages used
 - many people use python, the rest is diverse (go, perl, ruby, java, groovy)
   - python = about 10 frameworks use it
   - go = 3
   - perl = 1
   - ruby = 3
   - bash = 5
   - java = ones that use jenkins (about 4?)
   - groovy = 1 used outside of jenkins

# TI:
 - they built an abstraction on top of other tools
 - aggregate tests from different levels: applications, linux, low-level
 - below that: "execution layer": fuego, lava, ...
 - abstraction is Java

# Tim:
 - interchange options: TAP, JUNIT, XUNIT
   - XUNIT (and JUNIT?) don't show all testcases, only the failures
   - XML doesn't have good human readability
# Jan?: neither does JSON, unless pretty-printed
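
For reference, TAP is the line-oriented option among the interchange formats mentioned above: every testcase produces an "ok" or "not ok" line, so passes are reported as well as failures. The following is a minimal sketch of reading such output, not a complete TAP parser; it ignores directives such as SKIP and TODO, and the names `TapResult`, `parse_tap` and `SAMPLE` are made up for this illustration.

```python
import re
from dataclasses import dataclass

# Matches TAP result lines such as "ok 1 - board boots" or "not ok 2 - network comes up".
_RESULT = re.compile(r"^(not ok|ok)\b\s*(\d+)?\s*(?:-\s*)?(.*)$")


@dataclass
class TapResult:
    """One testcase result (hypothetical record type for this sketch)."""
    number: int
    name: str
    passed: bool


def parse_tap(text):
    """Return one TapResult per TAP result line, passes and failures alike."""
    results = []
    for line in text.splitlines():
        match = _RESULT.match(line.strip())
        if not match:
            continue  # plan lines ("1..3") and diagnostics are skipped
        status, number, name = match.groups()
        results.append(
            TapResult(
                number=int(number) if number else len(results) + 1,
                name=name.strip(),
                passed=(status == "ok"),
            )
        )
    return results


# Made-up sample output for illustration only.
SAMPLE = """\
1..3
ok 1 - board boots
not ok 2 - network comes up
ok 3 - filesystem mounted
"""

if __name__ == "__main__":
    for result in parse_tap(SAMPLE):
        print(result.number, "PASS" if result.passed else "FAIL", result.name)
```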

# Richard:
 - Yocto doesn't want to run a board farm
 - want to run a test and collect results

# Michal?:
 - wants to have a central location of tests to run on the Linux kernel
 - wants to share technology / devices used to run tests

glossary review
clarification and definition of glossary terms

https://elinux.org/Test_Stack_Survey#Glossary
Tim presents his view

Boot: startup phase up until a test can be run
 - (but can still be in the bootloader or whatever is to be tested)
 - boot failure is still a relevant result
Deploy vs. provision:
 - Tim: deploy: install SW under test
   - Fuego uses "deploy" to refer to installing the test program
 - Tim: provision: install the more general test environment
   -> labgrid only does provisioning: set up HW around, and SW on, the target
      labgrid usually does not deploy a test program
DUT:
 - sometimes there is one device under test
 - sometimes there can be multiple devices under test
 - Pools - how to describe identical boards used interchangeably?

# Dmitry?:
 - we only test software, we do not have hardware
   - the software is under test, and the virtual machine, or hardware, is just another resource that is required for the test
# Kevin:
 - DUT is a term of art in embedded, and we took it for granted
 - maybe need to find a more general term
   - "board"? - but that's confusing for testers using a VM
   - "target"?
   - "system under test" - you can't test the software without it running on something
# Tim: "system under test" has the same acronym as software under test (SUT)


# Cyril: survey took a long time because of unfamiliarity with the terms

# Manuel:
test agent: everything that gathers information on the DUT (e.g. syslogd, ssh for access, adb, ...)
notification:
 - can go 2 places:
   1) lab technician, for lab failure
   2) test initiator, for problem report

logs vs. run artifacts:
 - log is usually text output from some element
   - kernel, system log, test program, maybe tracer
 - not every run artifact is a log
   - examples: audio file, video file, binary traces, binary dumps
   - also, can be a text file that is not a log, like run meta-data

trigger:
 - is the thing that starts a test
 - can be explicit, like a manual user action or git commit hook
 - can be implicit, like sending an e-mail to a kernel mailing list

test plan:
 - is a list of tests to trigger at the same time
 - most systems have this, some call it test plan
 - Google uses directory structure to trigger a set of tests
 - there was discussion about tests always being with the source
   - for product testing, this isn't feasible: which source would you associate a system integration test with?

# Tim:
 - have to skip the slide for "candidate terms"
 - skipped term details

diagram review
# Tim: test-runner (suggested new box)
 - missing test runner (or is it another bullet in test scheduler?)
# Kevin:
 - maybe put as another bullet in "Test Scheduler" box

# Michal?: (cli box)
 - cli tools are everywhere, not just at frontend

# Tim: (DUT control APIs)
 - missing APIs inside the DUT control host box
   - there were too many of them
# Kevin:
 - need to have a separate diagram just for the DUT controller box

 - is DUT controller software or hardware?
   - unclear from diagram

 - front ends:
   - there can be multiple front ends

# Jan?: (code review box)
 - this seems out of place
 - should replace code review box with trigger

11:00 Pause

= 11:12 Test Definition (TD) =

Test-definition
 - storage format(s)
 - repository API
 - elements
 - issues with this:
   - what fields do people have? Why?
   - can we interoperate?

 - fields
   - dependencies - lots of different kinds
     - maybe separate dependencies by how they are handled?:
       - exclude test
       - install something (install package)
       - change status (e.g. sudo root)
     - some things can't be changed (amount of memory?, number of CPUs, kconfig)
     - both build-time and run-time dependencies exist

# Tim: Here are Fuego test definition elements:
Fuego:
 - meta-data: maintainer, version, license
 - dependencies: what features on a DUT are required
 - instructions: shell commands
 - how to visualize
 - tests can be a single test, or a test can be a test suite
 - source, or location of source

Google filters tests by source code path (net/ip/tcp/...)

Tim: - where to get the tests? (git/tarball)

Richard: for Yocto you need:
 - install tests remotely
 - parsable output
 - simple dependencies
 -> ptest

types of dependencies: memory, packages, root, hardware, kernel config, files, features, permissions, lab-hardware
# it would be nice to have a standard place to find the kernel config (to check which features are available)
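
The idea of separating dependencies by how they are handled can be sketched in a few lines. This is only an illustration under assumed names: `Board`, `Dependencies` and `check_dependencies` are hypothetical and do not correspond to the test definition format of Fuego or any other framework discussed at the meeting. Dependencies that cannot be changed (memory, kernel config) exclude the test; the others (packages, root) could be satisfied before the run.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical description of what a DUT provides."""
    memory_mb: int
    packages: set = field(default_factory=set)
    kconfig: set = field(default_factory=set)
    has_root: bool = False


@dataclass
class Dependencies:
    """Hypothetical dependency fields of a test definition."""
    min_memory_mb: int = 0
    packages: set = field(default_factory=set)
    kconfig: set = field(default_factory=set)
    needs_root: bool = False


def check_dependencies(deps, board):
    """Return (unfixable, fixable) lists of unmet dependencies.

    Any unfixable entry means the test should be excluded on this board;
    fixable entries describe an action to take before the test runs.
    """
    unfixable, fixable = [], []
    if board.memory_mb < deps.min_memory_mb:
        unfixable.append(
            f"memory: need {deps.min_memory_mb} MB, have {board.memory_mb} MB"
        )
    for option in sorted(deps.kconfig - board.kconfig):
        unfixable.append(f"kconfig: {option} not enabled")
    for package in sorted(deps.packages - board.packages):
        fixable.append(f"install package: {package}")
    if deps.needs_root and not board.has_root:
        fixable.append("change status: run as root")
    return unfixable, fixable


if __name__ == "__main__":
    # Made-up board and test definition values.
    board = Board(memory_mb=512, packages={"iperf3"}, kconfig={"CONFIG_PREEMPT_RT"})
    deps = Dependencies(
        min_memory_mb=1024,
        packages={"iperf3", "stress-ng"},
        kconfig={"CONFIG_PREEMPT_RT"},
        needs_root=True,
    )
    unfixable, fixable = check_dependencies(deps, board)
    print("exclude test:", bool(unfixable), unfixable)
    print("before running:", fixable)
```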

Tim: Wants to define a "Test Execution API" (aka. "famous interface 'E'")

build artifacts
# Richard:
 - YP has ptest, which is a package that gets delivered to the target
 - would be nice to have a standard for "make test"
   - "make install" used to have all kinds of problems, but it's better now

# Tim:
 - Fuego just added a prototype feature for bundling the test program
 - what is needed beyond just a manifest?
   - standard location on DUT for test materials?
     - some DUTs have read-only filesystems (you can't have a single standard location)

# Tim:
 - test packages need to be relocatable
 - how to handle files outside the package's test location?
   - e.g. modification to /etc or some other system path

# Richard:
 - YP allows bundling a script with a test package to modify other areas of the filesystem

# Tim:
 - what format are people using for build artifacts (test packages)?
   - answers: tarball (fuego), cpio (0-day)

# Richard:
 - YP can package in any of its supported formats:
   - debian, ipkg, rpm


 * During Lunch Break: CELP Brainstorming session:

2:00 Back to ATS Summit

Run Artifacts
Tim O?:
 - How do we get all the data out of the DUT without interference?
 - We would like to have really everything we can get
 - measurement is also an interference workload, so they piped it out via the network (to avoid local storage)

 - dmesg
 - Power consumption

# ? (opentest):
 - Data that shows infrastructure failures
 - Include the test definition and other metadata (version of testsuite, ...) with the results

# Richard:
 - we need all the data (self documented!) to reproduce a test
 - downside to using lava features like overlay: it's harder to reproduce manually
 - Testlink has relational tables to link test cases to executions to performance metrics
 - They have upward and downward translators for each execution engine

# Jan:
 - we need a common format not only for the result but also for the test and metadata. Otherwise we cannot make sense of the data.

# Cyril:
 - they are thinking about a more formal test description

# Richard:
 - ptest is just pass/fail

# Carlos: shows TI Testlink:
https://en.wikipedia.org/wiki/TestLink
 - Would like to have something above the test execution frameworks
   - Would improve collaboration
 - They have a Django app in opentest to link requirements from jira with a set of individual testcases
   - result tables are generated from that
 - use kibana to provide an overview of test racks (28+)

Result Analysis / Pass Criteria:
# Tim O:
 - the result may be a value instead of pass/fail, may need to be board specific
# Jan:
 - they use pytest. He does not like the idea of having the pass criteria in the test case
 - you can add metadata to a result

# Richard:
 - They have a mail report which includes performance graphs. That makes it easy to find problems.
 - They compare the results to the previous runs
# Philip (0-day):
 - They determine thresholds automatically
 - They try to do automatic bisects

How do you identify false failures:
# Philip:
 - 0-day does automatic bisect. If they find the regression: fine. If they don't: they drop the false failure.

# TI Opentest:
 - They use a running average + stddev to compare the new result with history
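
The running average plus standard deviation comparison mentioned above can be sketched as follows. This is a generic illustration, not the Opentest implementation: the function name `is_regression`, the window of 10 runs and the threshold of 2 standard deviations are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_regression(history, new_value, window=10, num_stddev=2.0,
                  higher_is_better=True):
    """Compare a new benchmark value with the recent history.

    The last `window` results form the running average. The new value is
    flagged when it is worse than that average by more than `num_stddev`
    sample standard deviations.
    """
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough history to estimate the spread
    average = mean(recent)
    spread = stdev(recent)
    if higher_is_better:
        return new_value < average - num_stddev * spread
    return new_value > average + num_stddev * spread


if __name__ == "__main__":
    # Made-up throughput history (higher is better).
    throughput = [100, 102, 98, 101, 99]
    print(is_regression(throughput, 99))
    print(is_regression(throughput, 90))
```

A per-board history would give board-specific thresholds without writing them into the test case itself.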

# Kevin:
 - For aggregation on the test suite level, we need to document what the expected or allowed failures are.
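
One way to picture suite-level aggregation with documented expected failures is the sketch below. The function `suite_verdict` and the testcase names are hypothetical; the only point is that the suite result depends on unexpected failures rather than on all failures, and that an expected failure which starts passing is worth reporting too.

```python
def suite_verdict(results, expected_failures):
    """Aggregate testcase results into a suite-level verdict.

    results: dict mapping testcase name to True (pass) or False (fail).
    expected_failures: set of testcase names that are allowed to fail.
    """
    unexpected_failures = sorted(
        name for name, passed in results.items()
        if not passed and name not in expected_failures
    )
    unexpected_passes = sorted(
        name for name, passed in results.items()
        if passed and name in expected_failures
    )
    return {
        "passed": not unexpected_failures,
        "unexpected_failures": unexpected_failures,
        "unexpected_passes": unexpected_passes,
    }


if __name__ == "__main__":
    # Made-up testcase names and results.
    results = {"boot": True, "usb-hotplug": False, "suspend": False, "rtc": True}
    print(suite_verdict(results, expected_failures={"suspend", "rtc"}))
```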

= Board farm standards =

# Tim: Would be great if there were a standard for APIs for board farm hardware manufacturers

# Xilinx: Just go 1:1 for corporate setups, to avoid the cable problem

# Chris: Cables are not the problem, they can work fine. You need to have an abstraction layer

# Mark: Everyone needs to find out which HW works, which doesn't

# Tim Bird: Would be nice to have a DUT controller available on Seeed Studio

# Jan-Simon: Network-Logging-Recv needs to be supported, minimal impact on the DUT is important

 - labgrid - what does it do for plugins?
   - has a python API for modules
   - example: there's a module for pdudaemon
   - example: there's a module for web power switch (by Power Solutions, Inc.)

- observation: there are lots of different control points for the lab and DUT.

# Tim: can we start with a single API, and expand from there?
 - suggest power control, as it's a single bit
   - actually, it can have voltage, bounce, etc. - it's more complicated

 - decision(!!) to standardize on pdudaemon for power control
   - put power control drivers in that project
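
To illustrate what "start with a single API and expand from there" could look like for power control, here is a minimal sketch. It is NOT pdudaemon's driver interface: `PowerControl`, `FakePowerSwitch` and their methods are hypothetical names invented for this example. It treats power as the single on/off bit from the discussion and leaves the complications that were raised (voltage, bounce) out of scope.

```python
import time
from abc import ABC, abstractmethod


class PowerControl(ABC):
    """Hypothetical minimal power control interface for one DUT."""

    @abstractmethod
    def on(self):
        """Switch DUT power on."""

    @abstractmethod
    def off(self):
        """Switch DUT power off."""

    def cycle(self, delay_s=1.0):
        """Power-cycle the DUT: off, wait, on."""
        self.off()
        time.sleep(delay_s)
        self.on()


class FakePowerSwitch(PowerControl):
    """In-memory stand-in for a real driver; records the calls it receives."""

    def __init__(self):
        self.is_on = False
        self.log = []

    def on(self):
        self.is_on = True
        self.log.append("on")

    def off(self):
        self.is_on = False
        self.log.append("off")


if __name__ == "__main__":
    switch = FakePowerSwitch()
    switch.cycle(delay_s=0)
    print(switch.log, switch.is_on)
```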

# TI OpenTest: OpenTest has a generic interface for multimeters

# Geert:
 - iiodaemon might be good for measurement API (measurement drivers for things like power)
 - also have gpiod

# Tim Orling: We should document "Design for Test" best practices

# Kevin: kernelci will become a LF project (more compute, more storage)

# Tim Orling: has been talking with Kevin about "os-ci"

# ?: decoupling their tests from lava was a lot of work

# Tim: Hard to maintain SW/test for HW you don't have

# Mark: This is a known problem for kernel maintainers ;)

Wrap-up and overview
Everyone should learn other people's systems

# Michal: Would be great if every project had a document: "how to 'hello world' on a BeagleBone".
 - hello test, but with a description of the execution flow through the framework

# Tim: survey is a start at sharing information.

= future coordination =
 - mailing list

 - how about a dedicated list, not yocto-project based?
   - how about something on vger?
     - lists don't handle html
     - it may not be archived
     - it's not specific to the kernel
 - most attendees seemed to think it's OK to keep the current list
 - there needs to be more activity on the list
   - it looks dead
   - some of us are only doing this part time
   - everyone try to generate more traffic

# Tim: Try more pinpointed discussions on the list rather than large mails

 - discussion on whether to keep the mailing list
   - decision is to keep on YP list for now

 - documents
   - where to put documents?
   - decision to put all stuff on elinux wiki for now

# Jan: Maybe create the "Design for Test" document first

# Michal: Create a list of test cases on the elinux wiki

 - meetings
   - when to meet again?
   - what about at Plumbers?
     - don't need a sponsor
     - has a problem that it sells out too quickly
     - not as many people visit it
   - decision to do: ELCE 2019, Lyon, France
     - testing track, testing BOF
     - private meeting again?
       - not sure - would need to get sponsorship again
       - may be cheaper if only a meeting room

# Richard: Do we have a way to speak with one voice? => one document endorsed by kernelci and yocto?

 - do we need an organization?
   - we are here as individuals, not speaking for our companies?
   - can the kernelci project be the "voice"?

# Kevin:
 - Kevin can't speak for the board or for the kernelci project
   - it's not formed yet

 - first projects:
   - test definition
     - Tim will do another survey
       - not questions, but just ask for a link to an example of a test
   - common run artifacts
   - common results format
     - just use xunit?
     - who is doing this?
   - test execution API (E)
     - maybe start with a list of phases
 - stuff on wiki:
   - main Automated Testing page
   - list of links to test repositories
     - get a URL for each test system

A picture was taken of summit attendees

= Decisions from the summit =
* please put your DUT power control driver in pdudaemon
* pdudaemon will serve as our first DUT control API consolidation point
* we will continue to use the current mailing list for discussions, for now
* we will save information and documents on the elinux wiki
* our next physical meeting(s) will be at ELCE 2019

= Action Items from the summit =
* AI - refine glossary over time to remove ambiguity (??)
* AI - modify diagram with discussed changes (Kevin?)
* AI - create elinux wiki page for Automated Testing topic (Tim)
* AI - create elinux wiki page for Test Systems (Tim)
** with links to repositories for each system
* AI - collect Run Artifact fields (for possible RA standard) (??)
* AI - collect Test Definition fields (for possible TD standard) (Tim)
* AI - send a list of test phases to the list (start of API 'E' discussions) (Tim)
* AI - create Debian package for pdudaemon (Tim Orling)
* AI - create an automated test project in the Linux Foundation (Kevin)
** currently called KernelCI project
* AI - arrange for sessions and meetings at ELCE 2019 (Tim)