ATS 2018 Minutes

These are some minutes from the Automated Test Summit, 2018

The meeting was held on Thursday, October 25 in Edinburgh, Scotland.

= Sponsors = 
The sponsors for the meeting were:
 * the Core Embedded Linux Project (CELP) of the Linux Foundation
 * Linaro

= Attendees =
The following people attended the meeting:

Name                 Company              Project
-------------------- -----------------    ---------------------
Alejandro Hernandez  TI                   Opentest
Alice Ferrazzi       Gentoo               Gentoo kernel CI
Anders Roxell        Linaro               kselftest
Andrew Murray        ARM                  Witekio board farm
Carlos Hernandez     TI                   Opentest
Chris Fiege          Pengutronix          Labgrid, sdmux
Cyril Hrubis         SUSE                 LTP
Dan Rue              Linaro               Kernel Validation
Daniel Sangorrin     Toshiba              Fuego, CIP
Dmitry Vyukov        Google               syzbot
Geert Uytterhoeven   Renesas              LTSI maintainer
Guenter Roeck        Google     
Heiko Schocher       Denx                 tbot
Hirotaka Motai       Mitsubishi Electric  Fuego RT testing
Jan Lübbe            Pengutronix          Labgrid
Jan-Simon Moller     Linux Foundation     AGL
Kevin Hilman         BayLibre             KernelCI
Khiem Nguyen         Renesas              Fuego, LTSI testing
Li Xiaoming          Fujitsu              Fuego
Manuel Traut         Linutronix           ci-RT, r4d
Mark Brown           Linaro               KernelCI
Matt Hart            Linaro               LAVA
Michal Simek         Xilinx               Xilinx testing
Milosz Wasilewski    Linaro               LKFT
Nobuhiro Iwamatsu    Cybertrust Japan     Gentoo kernel CI
Pawel Wieczorek      Samsung              SLAV
Philip Li            Intel                0-day
Punnaiah Kalluri     Xilinx               Xilinx testing
Richard Purdie       Linux Foundation     Yocto Project
Sjoerd Simons        Collabora            KernelCI
Steve Rostedt        VMware               ktest, RT maintainer
Tim Bird             Sony                 Fuego
Tim Orling           Intel                KCF, Yocto Project
Takao Koguchi        Hitachi              CELP
Tsugikazu Shibata    NEC                  LTSI, CELP
Yoshitake Kobayashi  Toshiba              CIP, CELP
Yuichi Kusakabe      DENSO TEN            Fuego, Sponsor

= Outline =
 * welcome and introductions
 * vision and problem definition
 * glossary and diagram discussion
 * test definition, build artifacts, execution API (E)
 * run artifacts, results formats, parsers
 * farm standards - DUT control drivers, board definitions
 * wrap-up

= Minutes = 
Minute conventions:
hash (#) prefixes speaker name (when known).

 - Introductions - attendees introduced their name, company and project

# Tim:
 - Not enough time today
 - Goal: Get a description framework/terminology
   - talk about common vocabulary
 - value of collaboration/difficulty of going alone ("ribbon")
 - we share a lot of code, but we do not share QA-tools
   - 90% of the code is OSS, but only 5% of testing code is OSS
   - Jenkins, LTP are a good start, but not enough
 - Vision => promote the sharing of automated CI components as code is shared today
 - Some shift to sharing components
   - KernelCI uses lava
   - Fuego doesn't have a board control layer
 - Non goals for today:
   - no standards or common APIs
   - do not implement all nice features we hear about today in your test framework
      => rather find a way to share

Things Tim does not want to add to Fuego:
 - email based ci triggers
 - SUT deployment abstractions
 - DUT control drivers
 - centralized results repos
 - distributed results visualization
focus on:
 - repo of test definitions
 - sharing of pass criteria, testcase docs
 - generalized output parsing system

current problems:
 - many aspects are not shared, but nobody can do it all
   - tests are treated as "secret sauce"
   - similar to how embedded SW was viewed 20 years ago
 - We have no place to share our tests
   - LTP? kselftest?
   - there are other, standalone, open source tests:
     - cyclictest, syzkaller, iozone, lmbench
 - often tests are lab- or target-specific
   - makes HW testing difficult
   - Tim said Sony has an internal USB multiplexing test rig
 - tests depend on test framework
   - TVs are switching to android, and some use android test framework
   - file formats, APIs, architecture differ

 - paradox of generalization and specialization
   => tests are hard to reuse
   - need the ability to customize tests
     - (skip lists, expected values, variants) and pass criteria
   - benchmark value thresholds based on previous results
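As a sketch of the customization idea above, per-board skip lists and expected-value overrides could be applied to raw results before judging them. All names here are illustrative, not any framework's actual API:

```python
# Hypothetical sketch: apply a per-board skip list and expected-value
# overrides before judging raw results.

def evaluate(results, skiplist=(), expected=None):
    """results: {testcase: measured value, or 'PASS'/'FAIL'}.
    skiplist: testcases to ignore on this board.
    expected: {testcase: minimum acceptable value} overrides."""
    expected = expected or {}
    verdicts = {}
    for name, value in results.items():
        if name in skiplist:
            verdicts[name] = "SKIP"     # known-bad on this board
        elif name in expected:
            # benchmark: compare measured value to a board-specific threshold
            verdicts[name] = "PASS" if value >= expected[name] else "FAIL"
        else:
            verdicts[name] = value      # plain functional PASS/FAIL
    return verdicts
```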

 - different frameworks factor their data and services quite differently
   - interfaces between services, central server
   - where are operations performed?
     - central server, local host, on-DUT, by the user

#?: LAVA had complaints that tests could only be run in LAVA, so it
   can now output a script which can be run interactively

 - test definitions are split up into different files:
   - customizable per test
   - per board
   - per lab (doesn't exist in fuego)

Tim's idea: fractal nature of testing
 - individual tests, test suite, test plan: actual vs. expected results
 - can do pass criteria, result analysis, reporting at each level
 - the features are expressed differently at different levels

# Richard:
 - another item missing from problem statements is modularization
   - frameworks tend to be monolithic systems that are hard to
   extract useful sub-systems from
 - use of docker is incompatible with yocto (which wants to run on different distros)
 - yocto doesn't have a good results parser
 - maybe we can't directly use code, but we could start with
   standards for interchange format?

# Tim: query about languages used
   - many people use python, rest is diverse (go, perl, ruby, java, groovy)
     - python = about 10 frameworks use it
     - go = 3
     - perl = 1
     - ruby = 3
     - bash = 5
     - java = ones that use jenkins (about 4?)
     - groovy - 1 used outside of jenkins

# TI:
 - they built an abstraction on top of other tools
 - aggregate tests from different levels: applications, linux, low-level
 - below that: "execution layer": fuego, lava, ...
 - the abstraction layer is written in Java

# Tim:
 - interchange options: TAP, JUnit, xUnit
   - XML doesn't have good human readability
   - xUnit (and JUnit?) don't show all testcases, only the failures
# Jan? - neither does JSON, unless pretty-printed
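For reference, TAP output does list every testcase, passes included. A minimal emitter sketch (a hypothetical helper, in Python since most frameworks use it):

```python
# Minimal sketch: emit TAP (Test Anything Protocol) output, which lists
# every testcase (both passes and failures), unlike failure-only reports.

def to_tap(results):
    """results: ordered list of (testcase_name, passed) tuples."""
    lines = ["TAP version 13", f"1..{len(results)}"]
    for i, (name, passed) in enumerate(results, start=1):
        status = "ok" if passed else "not ok"
        lines.append(f"{status} {i} - {name}")
    return "\n".join(lines)
```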

# Richard:
 - yocto doesn't want to run a board farm
   - want to run a test and collect results

# Michal?:
 - wants to have a central location of tests to run on the linux kernel
 - wants to share technology / devices used to run tests

== glossary review ==
clarification and definition of glossary terms
    Tim presents his view

	Boot: startup phase up until a test can be run
     - (but can still be in the bootloader or whatever is to be tested)
     - boot failure is still a relevant result
    Deploy vs. provision:
       - Tim: deploy: install SW under test
         - Fuego uses "deploy" to refer to installing test program
       - Tim: provision: install the more general test environment
       -> labgrid only does provisioning: it sets up the HW around, and SW on, the target;
          labgrid does not usually deploy a test program
      - sometimes there is a device under test
      - sometimes there can be multiple devices under test
      - Pools - how to describe identical boards used interchangeably?

# Dmitry?:
      - we only test software, we do not have hardware
        - the software is under test, and the virtual machine, or
        hardware, is just another resource that is required for
        the test
# Kevin:
      - DUT is a term of art in embedded, and we took it for granted
      - maybe need to find a more general term
        - "board"? - but that's confusing for testers using a VM
        - "target"?
        - "system under test"
# Tim: "system under test" has the same acronym as software under test (SUT)
      - you can't test the software without it running on something

# Cyril: survey took a long time because of unfamiliarity with the terms

	test agent:
		everything that gathers information on the DUT (e.g. syslogd, ssh for access, adb, ...)
# Manuel:
      - can go 2 places:
         1) lab technician, for lab failure
         2) test initiator, for problem report

    logs vs. run artifacts:
      - log is usually text output from some element
        - kernel, system log, test program, maybe tracer
      - not every run artifact is a log
        - examples: audio file, video file, binary traces, binary dumps
        - also, can be text file that is not a log, like run meta-data

    trigger:
      - is the thing that starts a test
        - can be explicit, like manual user action or git commit hook
        - can be implicit, like sending an e-mail to a kernel mailing list

   test plan:
     - is a list of tests to trigger at the same time
     - most systems have this, some call it test plan
     - Google uses directory structure to trigger a set of tests
       - there was discussion about tests always being with the source
       - for product testing, this isn't feasible, which source would
          you associate a system integration test with?

# Tim:
  - have to skip the slide for "candidate terms"
  - skipped term details

== diagram review ==
# Tim: test-runner (suggested new box)
 - missing test runner (or is another bullet in test scheduler)
# Kevin:
   - maybe put as another bullet in "Test Scheduler" box

# Michal?: (cli box)
 - cli tools are everywhere, not just at frontend

# Tim: (DUT control APIs)
   - missing APIs inside the DUT control host box
# Kevin:
   - there were too many of them
   - need to have a separate diagram just for the DUT controller box

 - is DUT controller software or hardware?
   - unclear from diagram

 - front ends:
    - there can be multiple front ends

# Jan?: (code review box)
 - this seems out of place
   - should replace code review box with trigger

11:00 Pause
= 11:12 Test Definition (TD) =

== Test-definition ==
- storage format(s)
- repository API
- Elements
- Issues with this:
    - what fields do people have? Why?
    - can we interoperate?

 - fields
 - dependencies - lots of different kinds
    - maybe separate dependencies by how handled?:
       - exclude test
       - install something (install package)
       - change status (e.g. sudo to root)
    - some things can't be changed (amount of memory?, number of CPUs, kconfig)
    - both build-time and run-time dependencies exist

# Tim: Here are Fuego test definition elements:
    - meta-data: Maintainer, Version, license
    - dependencies: What features on a DUT are required
    - instructions: shell commands
    - How to visualize
    - A test can be a single test, or a test suite
    - source, or location of source

    Google filters tests by source code path (net/ip/tcp/...)

    - where to get the tests? (git/tarball)
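Putting the elements above together, a test definition might be carried as a simple structured record. The field names below are illustrative only, not Fuego's (or anyone's) actual schema:

```python
# Hypothetical sketch of a test definition carrying the elements listed
# above; field names are illustrative, not any framework's actual schema.

test_definition = {
    "name": "Benchmark.iozone",
    "maintainer": "Jane Doe <jane@example.com>",   # illustrative value
    "version": "1.0",
    "license": "GPL-2.0",
    "dependencies": {"min_memory_mb": 64, "needs_root": True},
    "source": {"type": "git", "url": "https://example.com/iozone.git"},
    "instructions": ["./configure", "make", "./run_tests.sh"],
}

REQUIRED = ("name", "maintainer", "version", "license", "source", "instructions")

def validate(td):
    """Reject definitions missing any required field."""
    missing = [f for f in REQUIRED if f not in td]
    if missing:
        raise ValueError(f"test definition missing fields: {missing}")
    return True
```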

    for yocto you need:
      - install tests remotely
      - parsable output
      - simple dependencies
    -> ptest
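ptest output uses one line per testcase, prefixed with PASS:, FAIL:, or SKIP: followed by the testcase name. A minimal parser sketch:

```python
# Sketch: parse Yocto ptest-style output, where each result line starts
# with "PASS:", "FAIL:", or "SKIP:" followed by the testcase name.

def parse_ptest(output):
    results = {}
    for line in output.splitlines():
        line = line.strip()
        for status in ("PASS", "FAIL", "SKIP"):
            prefix = status + ":"
            if line.startswith(prefix):
                results[line[len(prefix):].strip()] = status
                break
    return results
```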

type of dependencies:
    memory, packages, root, hardware, kernel config, files, features, permissions, lab-hardware
    # it would be nice to have a standard place to find the kernel config (to check which features are available)
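On kernels built with CONFIG_IKCONFIG_PROC, the config is already exposed at /proc/config.gz; a dependency check against config text might look like this sketch (function names are illustrative):

```python
# Sketch: check run-time dependencies against the kernel config text
# (e.g. read from /proc/config.gz when CONFIG_IKCONFIG_PROC is enabled).

def config_has(config_text, option):
    """True if a CONFIG_ option is =y or =m in the given config text."""
    for line in config_text.splitlines():
        if line.startswith(option + "="):
            return line.split("=", 1)[1] in ("y", "m")
    return False

def missing_options(config_text, required):
    """Return the required options that are absent or disabled."""
    return [opt for opt in required if not config_has(config_text, opt)]
```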

Tim: Wants to define a "Test Execution API" (aka. "famous interface 'E'")

== build artifacts ==

# Richard:
  - YP has ptest, which is a package that gets delivered to target
  - would be nice to have a standard for "make test"
    - "make install" used to have all kinds of problems, but it's better now

# Tim:
  - Fuego just added a prototype feature for bundling the test program
    - what is needed beyond just a manifest?
      - standard location on DUT for test materials?
# ?:
   Some DUTs have read-only filesystems
    - (you can't have a single standard location)

# Tim:
   - test packages need to be relocatable
   - how to handle files outside the package's test location?
     - e.g. modification to /etc or some other system path

# Richard:
   - YP allows to bundle a script with test package to modify other
   areas of filesystem

# Tim:
   - what format are people using for build artifacts (test packages)?
     - answers: tarball (fuego), cpio (0-day)
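One way to make a tarball package relocatable is to use only relative paths and ship a manifest alongside the test files. A sketch (the layout and manifest fields are illustrative, not any framework's format):

```python
# Sketch: bundle a test into a relocatable tarball with a manifest, so it
# can be unpacked at any location on the DUT. Layout is illustrative.
import io
import json
import tarfile

def build_test_package(name, files):
    """files: {relative_path: bytes}. Returns the gzipped tarball as bytes."""
    manifest = {"name": name, "files": sorted(files)}
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        payload = dict(files)
        payload["manifest.json"] = json.dumps(manifest).encode()
        for relpath, data in payload.items():
            info = tarfile.TarInfo(relpath)   # relative paths => relocatable
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```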

# Richard:
   - YP can package in any of its supported formats:
     - debian, ipkg, rpm

During the lunch break: CELP brainstorming session

2:00 Back to ATS Summit

== Run Artifacts ==
# Tim O?:
  - How do we get all the data out of the DUT without interference?
  - We would like to have really everything we can get
  - measurement is also an interference workload, so they piped it
      out via the network (to avoid local storage)

    - dmesg
    - Power consumption

#? (opentest):
    - Data that shows infrastructure failures
    - Include the test definition and other metadata (version of testsuite, ...) with the results

# Richard:
    - we need all the data (self documented!) to reproduce a test
# ?:
    - downside to using lava features like overlay: it's harder to
     reproduce manually
# ?:
    - Testlink has relational tables to link test cases to executions
      to performance metrics
    - They have upward and downward translators for each execution engine

# Jan:
    - we need not only a common format for the result, but also for the
     test and metadata. Otherwise we cannot make sense of the data.

# Cyril:
    - they are thinking about a more formal test description

# Richard:
    - ptest is just pass/fail

# Carlos: shows TI Testlink:
    - Would like to have something above the test execution frameworks
    - Would improve collaboration
    - They have a Django App in opentest to link requirements from jira with a set of individual testcases
      - result tables are generated from that
    - use kibana to provide an overview of test racks (28+)

Result Analysis / Pass Criteria:
  - the result may be a value instead of pass/fail, may need to be board specific
# Tim O:
    - They use pytest. He does not like the idea of having the pass criteria in the test case
# Jan:
    - you can add metadata to a result

# Richard:
    - They have a mail report which includes performance graphs. That makes it
     easy to find problems.
# Philip (0-day): 
    - They compare the results to the previous runs
    - They determine threshold automatically
    - They try to do automatic bisects

How do you identify false-failures:
# Philip:
   - 0-day does automatic bisect. If they find the regression: fine;
     If they don't: they drop the false failure.

# TI Opentest:
    - They use a running average + stddev to compare the new result with history
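The comparison described above can be sketched as a simple statistical check; this is an illustrative reconstruction, not OpenTest's actual code:

```python
# Sketch: flag a new benchmark result as a regression when it falls
# outside mean +/- k * stddev of the result history.
import statistics

def is_regression(history, new_value, k=3.0, lower_is_better=True):
    """history: previous benchmark values (needs at least two samples)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if lower_is_better:                    # e.g. latency benchmarks
        return new_value > mean + k * stdev
    return new_value < mean - k * stdev
```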

# Kevin:
    - For aggregation on the test suite level, we need to document what the
      expected or allowed failures are.

= Board farm standards =

# Tim:
   It would be great if we had a standard API for board farm
   hardware manufacturers

# Xilinx:
   Just go 1:1 for corporate setups, to avoid the cable problem

# Chris:
   Cables are not the problem, they can work fine. You need to have an
   abstraction layer

# Mark:
   Everyone needs to find out which HW works, which doesn't

# Tim Bird:
   Would be nice to have a DUT Controller available from Seeed Studio

# Jan-Simon:
   Network-Logging-Recv needs to be supported, minimal impact on the DUT is important

 - labgrid - what does it do for plugins?
   - has a python API for modules
     - example: there's a module for pdudaemon
     - example: there's a module for web power switch (by Power Solutions, Inc.)

 - observation: there are lots of different control points for the lab
  and DUT.

# Tim: can we start with a single API, and expand from there
  - suggest power control, as it's a single bit
    - actually, it can have voltage, bounce, etc - it's more complicated

  - decision(!!) to standardize on pdudaemon for power control
    - put power control drivers in that project
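To illustrate the kind of per-PDU driver that could be consolidated in pdudaemon, here is a hypothetical minimal power-driver interface. The class and method names are illustrative only, not pdudaemon's actual API:

```python
# Hypothetical sketch of a DUT power-control driver interface; names are
# illustrative, not pdudaemon's actual API.

class PowerDriver:
    """Base interface: one driver per PDU/relay type."""
    def on(self, port):
        raise NotImplementedError
    def off(self, port):
        raise NotImplementedError

class DummyPDU(PowerDriver):
    """In-memory driver for testing; a real driver would talk to hardware."""
    def __init__(self, ports=8):
        self.state = {p: False for p in range(1, ports + 1)}
    def on(self, port):
        self.state[port] = True
    def off(self, port):
        self.state[port] = False

def power_cycle(driver, port):
    # a power cycle is just off-then-on; real code would delay in between
    driver.off(port)
    driver.on(port)
```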

# TI OpenTest:
    OpenTest has a generic interface for Multimeters

# Geert:
  - iiodaemon might be good for measurement API (measurement drivers for
    things like power)
  - also have gpiod

# Tim Orling:
    We should document "Design for Test" best practices

# Kevin:
    kernelci will become a LF project (more compute, more storage)

# Tim Orling:
    has been talking with Kevin about "os-ci"

    decoupling their tests from LAVA was a lot of work

# Tim:
    Hard to maintain SW/test for HW you don't have

# Mark:
    This is a known problem for kernel maintainers ;)

== Wrap-up and overview ==
# ?:
 Everyone should learn other people's systems

# Michal:
  Would be great if every project had a document:
    "how to 'hello world' on a BeagleBone"
   - a hello test, but with a description of the execution flow through the framework

# Tim:
  survey is a start at sharing information.

= future coordination =
 - mailing list

# ?:
   - how about a dedicated list, not yocto-project based?
   - how about something on vger?
       - lists don't handle html
       - it may not be archived
       - it's not specific to the kernel
   - most attendees seemed to think it's OK to keep the current list
   - there needs to be more activity on the list - it looks dead
     - some of us only doing this part time
   - everyone try to generate more traffic

# Tim:
    Try more pin-pointed discussions on the list rather than large mails

  - discussion on whether to keep the mailing list
    - decision is to keep on YP list for now

 - documents
   - where to put documents?
   - decision to put all stuff on elinux wiki for now

# Jan:
    Maybe create the "Design for Test" document first

# Michal:
    Create a list of test cases on the elinux wiki

 - meetings
   - when to meet again?
     - what about at plumbers?
       - don't need sponsor
       - has a problem that it sells out too quickly
         - not as many people attend it
     - decision to do: ELCE 2019, Lyon France
       - testing track, testing BOF
         - private meeting again?
           - not sure
           - would need to get sponsorship again - may be
             cheaper if only a meeting room

# Richard:
    Do we have a way to speak with one voice?
       => one document endorsed by kernelci and yocto?

   - do we need an organization?
     - we are here as individuals, not speaking for our companies?
     - can kernelci project be the "voice"

# Kevin:
       - Kevin can't speak for board or for kernelci project
         - it's not formed yet

 - first projects:
   - test definition
     - Tim will do another survey 
       - not questions, but just ask for a link to an example of a test
   - common run artifacts
     - common results format
       - just use xunit?
     - who is doing this?
   - test execution API (E)
     - maybe start with list of phases
   - stuff on wiki:
     - main Automated Testing page
     - list of links to test repositories
       - get a URL for each test system

A picture was taken of summit attendees

= Decisions from the summit =
* pdudaemon will serve as our first DUT control API consolidation point
  - please put your DUT power control driver in pdudaemon
* we will continue to use the current mailing list for discussions, for now
* we will save information and documents on the elinux wiki
* our next physical meeting(s) will be at ELCE 2019

= Action Items from the summit =
AI - refine glossary over time to remove ambiguity (??)
AI - modify diagram with discussed changes (Kevin?)
AI - create elinux wiki page for Automated Testing topic (Tim)
AI - create elinux wiki page for Test Systems (Tim)
  - with links to repositories for each system
AI - collect Run Artifact fields (for possible RA standard) (??)
AI - collect Test Definition fields (for possible TD standard) (Tim)
AI - send a list of test phases to the list (start of API 'E' discussions) (Tim)
AI - create Debian package for pdudaemon (Tim Orling)
AI - create an automated test project in the Linux Foundation (Kevin)
 - currently called KernelCI project
AI - arrange for sessions and meetings at ELCE 2019 (Tim)
AI - create "Design for Testing" document aimed at board hardware designers (??)