RTWG-discussion-points

lpptest
problems:
 * requires 2 machines
 * requires parallel port (not many embedded boards have this)
 * uses TSC, which is i386-specific

benefits:
 * built into the rt-preempt patch
 * 2-machine system gives unbiased timing results
 * instrumentation does not interfere with the timings

realfeel
benefits:
 * is one-machine
 * uses /dev/rtc, does not use timer interrupt
 * this isolates the ISR handling from timer operations
 * is a very simple test

problems:
 * uses TSC, which is i386-specific
 * doesn't have an instrumentation point in the kernel
 * no measurement of int latency, only process start latency
 * process start latency is "guessed" from expected interrupt time
 * that is, it is not actually measured
 * not all boards have a working /dev/rtc driver
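
The "guessed" latency can be illustrated with a small sketch. This is not realfeel's actual code. It assumes the periodic interrupt rate (rtc_hz) is known, takes the first wakeup as the reference point, and treats every later interrupt as if it fired exactly one period after the previous one. The function name and the sample numbers are hypothetical.

```python
def estimate_wakeup_latencies(wake_ns, rtc_hz):
    """Estimate process start latency from wakeup timestamps alone.

    wake_ns: timestamps (in nanoseconds) taken in user space right after
             each blocking read on the periodic interrupt source returns.
    rtc_hz:  programmed interrupt frequency (assumed to be exact).
    """
    period_ns = 1_000_000_000 / rtc_hz
    first = wake_ns[0]
    # The expected interrupt time is inferred, not measured: any latency in
    # the first sample, or jitter in the interrupt itself, is invisible here.
    return [t - (first + i * period_ns) for i, t in enumerate(wake_ns)]


if __name__ == "__main__":
    samples = [0, 1_000_050, 2_000_020, 3_000_400]  # made-up timestamps
    print(estimate_wakeup_latencies(samples, rtc_hz=1000))
```

Because the reference point is itself a wakeup time, the result mixes interrupt latency and scheduling latency and cannot separate them.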

igel test
This tests scheduling latency on receipt of a character on an SM501 UART. The UART driver records a timestamp at ISR start, and user space can query these timestamps after a run. The user-space driver measures the time from ISR start to task start. The test is specific to the SH architecture and uses the SH hardware timer.
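
As a rough sketch of the post-processing step (not the actual driver code), the ISR-start timestamps read back from the driver can be paired with the task-start timestamps and converted to microseconds. The function name, the timer frequency, and the sample values are assumptions for illustration, and counter wraparound is ignored.

```python
def isr_to_task_latency_us(isr_ticks, task_ticks, timer_hz):
    """Convert paired ISR-start and task-start timer readings to latencies.

    isr_ticks:  timer counts recorded by the driver at ISR start.
    task_ticks: timer counts recorded when the user-space task starts.
    timer_hz:   frequency of the hardware timer (hypothetical value below).
    """
    if len(isr_ticks) != len(task_ticks):
        raise ValueError("each ISR timestamp needs a matching task timestamp")
    # Both readings come from the same timer, so the difference is measured
    # directly and does not depend on an expected interrupt time.
    return [(task - isr) * 1_000_000 / timer_hz
            for isr, task in zip(isr_ticks, task_ticks)]


if __name__ == "__main__":
    print(isr_to_task_latency_us([100, 2000], [130, 2075], timer_hz=1_000_000))
```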

Test hardware: SH4 running at 240 MHz, with 64 MB of memory.

See http://tree.celinuxforum.org/CelfPubWiki/UserLevelDeviceDriver (page is in Japanese)

See also: CELF Jamboree 14 presentation


 * Yung Joon will ask Takao Ikoma to translate this page.
 * Matsubara-san will update the driver and some of the text on the page.
 * will attach latest driver source
 * Tim will provide Ikoma-san e-mail

Test shows some important results:
 * 2.6.16.4 has much better scheduling latency under load than 2.4.20
 * test was run with kernel preemption enabled, but WITHOUT the RT-preempt patches applied.
 * signal delivery method of waking up tasks took a lot longer than unblocking an I/O operation
 * signal delivery is a bad method to use for measuring RT performance

Interbench
Need to investigate, but looks like a test of scheduling latency.

See http://members.optusnet.com.au/ckolivas/interbench/

celleb test
Runs the test with Linux on a hypervisor. The hypervisor generates an interrupt to Linux at periodic intervals. The kernel measures the ISR start time, and can query the hypervisor for the intended interrupt initiation time.

Results on page 7 are results from hdparm, normalized against a baseline configuration. Note that page 7 shows results from a kernel with a bug (in the non-RT case).

PPC64_TLB_BATCH_NR was set to 1. Why?

In the English translation, lpar = "logical partition". This refers to a single environment under the hypervisor. The lpar interrupt was used because it was periodic.

hdparm was used for load, not for measurement. The test provided 3 different measurements of performance.

Each run is shown as a separate path on the graph. The paths are offset from one another (they have different zero baselines on the graph) so that the lines can be distinguished. Each point on the graph shows the latency for one interrupt. The interrupts are on the horizontal axis.

page 14 shows interrupt latency, under load.

All results are shown normalized, to avoid disclosing actual numbers for the CELL processor and hypervisor.
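
A minimal sketch of this kind of normalization, assuming each configuration yields one raw figure and one configuration is picked as the baseline. The configuration names and numbers are made up, and the actual presentation may have normalized differently.

```python
def normalize(results, baseline):
    """Express every result as a ratio to the baseline configuration.

    results:  mapping of configuration name to raw measurement.
    baseline: name of the configuration that becomes 1.0.
    """
    base = results[baseline]
    if base == 0:
        raise ValueError("baseline measurement must be non-zero")
    # Only the ratios are kept, so the absolute numbers are not disclosed.
    return {name: value / base for name, value in results.items()}


if __name__ == "__main__":
    raw = {"non-rt": 200.0, "rt-preempt": 150.0}  # hypothetical values
    print(normalize(raw, baseline="non-rt"))
```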

Stair-step results in the graph are from hdparm being run in a loop. Sometimes hdparm was putting load on the system, and sometimes it was not (at regular intervals).

page 15 has the previous results, from the non-RT kernel with the bug.

page 16 shows corrected results

page 18 - load is netperf

page 26 - send performance of the non-RT kernel is used as the baseline for normalization. The measurement is of network performance (with several different configurations). The good performance of the non-RT kernel with interrupt threading was not expected.

Page 27 - shows throughput performance for hdparm for various configurations. Different udma modes are configured via hdparm. (try man hdparm)

Page 28 is Owa-san's hypothesis for why the network throughput is better in the non-rt, with irq threading, case. There are fewer hardware interrupts. It may be because the interrupts are masked longer, so context switches are avoided. Also, because of longer duration interrupt-off times, the network card may buffer packets, and the network stack may be able to process more data at a time.

Tried to test:
 * did RT-preempt patch for PPC64 work?
 * result: yes, to some extent
 * wanted to see if RT-preempt affected throughput performance
 * result: it does affect performance some, but it is not a big effect.

etri testing

 * wanted to compare RT-preempt with regular kernel.
 * worst case latency was good with RT-preempt patch
 * Wanted to use modified realfeel code (realfeel-etri)
 * it worked.
 * wanted to know relationship between scheduling latency and throughput performance
 * result: there were possible throughput problems with the RT-preempt kernel


 * haven't used realfeel-etri on ARM, yet.

realfeel-etri
problems:
 * uses TSC (but abstracts it behind the get_ticks interface)
 * have to use something else on ARM
 * needs /dev/rtc (may not be available on other embedded platforms)

benefits:
 * is better than realfeel, because it measures time from ISR start
 * uses read-unblock, which has good code path through kernel.

cyclictest
Uses timestamps from the timer system to measure the performance of the timer system.
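
The idea can be sketched as follows. This is only a Python illustration, not cyclictest itself (which is a C program): a loop sleeps until successive absolute deadlines and records how late each wakeup is. The function names, the interval, and the loop count are assumptions, and the Python interpreter adds overhead that a real test would not have.

```python
import time


def measure_wakeup_latencies(interval_us=1000, loops=100):
    """Sleep until successive absolute deadlines; return lateness in microseconds."""
    interval_ns = interval_us * 1000
    next_deadline = time.monotonic_ns() + interval_ns
    latencies_us = []
    for _ in range(loops):
        remaining_ns = next_deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        # Lateness = actual wakeup time minus the time the timer should have expired.
        latencies_us.append((time.monotonic_ns() - next_deadline) / 1000)
        next_deadline += interval_ns
    return latencies_us


def summarize(latencies_us):
    """Reduce a run to minimum, average, and maximum latency."""
    return {
        "min": min(latencies_us),
        "avg": sum(latencies_us) / len(latencies_us),
        "max": max(latencies_us),
    }


if __name__ == "__main__":
    print(summarize(measure_wakeup_latencies()))
```

Both the deadline and the wakeup time come from the same clock, so the result depends on the resolution of the timer system being tested.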

benefits:
 * seems to be what other people are using.
 * it should be architecture-independent
 * (after HRT arch support is provided in kernel)

problems:
 * requires HRT support
 * requires newer version of glibc and compiler
 * requires newer structure or API for timer info (not on MIPS)
 * it doesn't have kernel instrumentation point

Notes:
 * it works on SH, on 2.6.22 or later.
 * it works on ppc64
 * it didn't work on MIPS for Tim

don't know how to interpret results (don't know what constitutes good values)

shared Test Bed ideas
See Darren Hart's tests at: http://www.dvhart.com/~dvhart/ols2007