RTWG-discussion-points

From eLinux.org

Revision as of 23:59, 3 September 2007

Review of test programs

lpptest

problems:

  • requires 2 machines
  • requires a parallel port (few embedded boards have one)
  • uses the TSC, which is x86-specific

benefits:

  • built into the rt-preempt patch
  • 2-machine system gives unbiased timing results
    • instrumentation does not interfere with the timings

realfeel

benefits:

  • runs on a single machine
  • uses the /dev/rtc periodic interrupt, not the timer interrupt
    • this isolates the ISR handling from timer operations
  • is a very simple test

problems:

  • uses the TSC, which is x86-specific
  • doesn't have an instrumentation point in the kernel
    • so it cannot measure interrupt latency, only process start latency
  • process start latency is "guessed" from expected interrupt time
    • that is, it is not actually measured
  • not all boards have a working /dev/rtc driver
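The following is a minimal sketch of a realfeel-style measurement. It was written for illustration and is not taken from the realfeel source. It assumes a board with a working /dev/rtc that supports periodic interrupts (the RTC_IRQP_SET and RTC_PIE_ON ioctls from <linux/rtc.h>), and it uses clock_gettime(CLOCK_MONOTONIC) in place of the TSC. As noted above, the lateness is inferred from the expected period rather than measured from the interrupt itself. All function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <linux/rtc.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in ns (stands in for reading the TSC). */
int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Lateness of a wakeup inferred from the nominal period:
 * observed interval minus expected interval. Not a true IRQ latency. */
int64_t wakeup_lateness_ns(int64_t prev_ns, int64_t now_ns, int64_t period_ns)
{
    return (now_ns - prev_ns) - period_ns;
}

/* Open /dev/rtc and enable periodic interrupts at `hz` (a power of 2).
 * Returns the fd, or -1 on error. */
int rtc_open_periodic(unsigned long hz)
{
    int fd = open("/dev/rtc", O_RDONLY);
    if (fd < 0)
        return -1;
    if (ioctl(fd, RTC_IRQP_SET, hz) < 0 || ioctl(fd, RTC_PIE_ON, 0) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Disable periodic interrupts and close the device. */
void rtc_close(int fd)
{
    ioctl(fd, RTC_PIE_OFF, 0);
    close(fd);
}

/* Block on `n` RTC interrupts and store the worst inferred lateness.
 * Returns 0 on success, -1 if a read fails. */
int rtc_worst_lateness(int fd, unsigned long hz, int n, int64_t *worst)
{
    const int64_t period_ns = 1000000000LL / (int64_t)hz;
    unsigned long data; /* interrupt count/status word from the driver */
    int64_t prev, t, late;

    /* The first wakeup only sets the reference time. */
    if (read(fd, &data, sizeof data) != (ssize_t)sizeof data)
        return -1;
    prev = mono_ns();
    *worst = INT64_MIN;
    for (int i = 0; i < n; i++) {
        if (read(fd, &data, sizeof data) != (ssize_t)sizeof data)
            return -1;
        t = mono_ns();
        late = wakeup_lateness_ns(prev, t, period_ns);
        if (late > *worst)
            *worst = late;
        prev = t;
    }
    return 0;
}
```

A hypothetical run calls rtc_open_periodic(1024), then rtc_worst_lateness(fd, 1024, 10000, &worst), then rtc_close(fd). Rates above 64 Hz normally require root, because of the RTC driver's max_user_freq limit.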


igel test

This test measures scheduling latency on receipt of a character on an SM501 UART. The UART driver records a timestamp at ISR start, and user space can retrieve these timestamps after a run. The user-space driver measures the time from ISR start to task start. The test is specific to the SH architecture and uses the SH hardware timer.

SH4 running at 240 MHz, 64 MB of memory.

See http://tree.celinuxforum.org/CelfPubWiki/UserLevelDeviceDriver (page is in Japanese)

See also: CELF Jamboree 14 presentation

  • Yung Joon will ask Takao Ikoma to translate this page.
  • Matsubara-san will update the driver and some of the text on the page.
    • will attach the latest driver source
  • Tim will provide Ikoma-san's e-mail address.

Test shows some important results:

  • 2.6.16.4 has much better scheduling latency under load than 2.4.20
    • the test used kernel preemption, without the RT-preempt patches applied
  • waking tasks by signal delivery took much longer than unblocking an I/O operation
    • signal delivery is a poor method for measuring RT performance
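The result above favours waking the task by unblocking a read() over delivering a signal. The following is a minimal sketch of that user-space side. It assumes, as the UIO requirement in the table suggests, that the interrupt is exposed through a UIO device. A read() on a UIO device blocks until the next interrupt and returns a 32-bit running interrupt count, so a jump of more than one means interrupts were missed. Fetching the driver's ISR-entry timestamps is igel-specific and is not shown. The function names and the /dev/uio0 path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Interrupts missed between two reads. UIO returns a 32-bit running
 * interrupt count, so unsigned arithmetic also covers wraparound. */
uint32_t uio_missed(uint32_t prev_count, uint32_t count)
{
    return count - prev_count - 1u;
}

/* Block in read() until the next interrupt, then timestamp the wakeup.
 * `fd` is an open UIO device (e.g. a hypothetical /dev/uio0).
 * Returns 0 on success, -1 on error. */
int uio_wait(int fd, uint32_t *count, struct timespec *woke)
{
    if (read(fd, count, sizeof *count) != (ssize_t)sizeof *count)
        return -1;
    /* Take the task-side timestamp as early as possible after waking.
     * ISR-to-task latency = this time minus the ISR-entry time the
     * driver recorded; both must come from the same clock (igel used
     * the SH hardware timer), which this sketch does not cover. */
    return clock_gettime(CLOCK_MONOTONIC, woke);
}
```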

Interbench

Need to investigate, but looks like a test of scheduling latency.

See http://members.optusnet.com.au/ckolivas/interbench/


celleb test

Runs the test with Linux on a hypervisor. The hypervisor generates an interrupt to Linux at periodic intervals. The kernel records the ISR start time and can query the hypervisor for the intended interrupt initiation time.

Results on page 7 are results from hdparm, normalized against a baseline configuration. Note that page 7 shows results from a kernel with a bug (in the non-RT case).

PPC64_TLB_BATCH_NR was set to 1. Why?

In the English translation, lpar = "logical partition". This refers to a single environment under the hypervisor. The lpar interrupt was used because it was periodic.

hdparm was used for load, not for measurement. The test provided 3 different measurements of performance.

Each run is shown as a separate path on the graph. The paths are offset (each has a different 0-base) so the lines can be told apart. Each point on the graph shows the latency of one interrupt; interrupts are plotted along the horizontal axis.

Page 14 shows interrupt latency under load.

All results are shown normalized, to avoid disclosing actual numbers for the CELL processor and hypervisor.

Stair-step results in the graph come from hdparm being run in a loop: at regular intervals, hdparm alternated between loading the system and leaving it idle.

Page 15 has the previous results, from the non-RT kernel with the bug.

Page 16 shows the corrected results.

Page 18: the load is netperf.

Page 26: the send performance of the non-RT kernel is used as the baseline for normalization. The measurement is network performance, in several different configurations. The good performance of the non-RT kernel with interrupt threading was not expected.
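The normalization amounts to dividing each result by the matching baseline result. The baseline itself then reads 1.0, and absolute numbers never appear. The following is a minimal sketch, assuming higher-is-better throughput figures. The function name and the sample values below are made up for illustration and are not Celleb data.

```c
#include <stddef.h>

/* Express results relative to a baseline (e.g. non-RT send throughput):
 * out[i] = in[i] / baseline, so the baseline itself maps to 1.0.
 * Returns -1 (and writes nothing) if the baseline is zero. */
int normalize(const double *in, double *out, size_t n, double baseline)
{
    if (baseline == 0.0)
        return -1;
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] / baseline;
    return 0;
}
```

For example, with a baseline of 940, the results {940, 470, 1034} become {1.0, 0.5, about 1.1}.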

Page 27 shows hdparm throughput performance for various configurations. Different UDMA modes are configured via hdparm (see man hdparm).

Page 28 gives Owa-san's hypothesis for why network throughput is better for the non-RT kernel with IRQ threading: there are fewer hardware interrupts. This may be because interrupts are masked longer, so context switches are avoided. Also, because interrupts stay disabled for longer, the network card may buffer packets, and the network stack may be able to process more data at a time.

Tried to test:

  • did RT-preempt patch for PPC64 work?
    • result: yes, to some extent
  • wanted to see if RT-preempt affected throughput performance
    • result: it does affect performance some, but it is not a big effect.

etri testing

  • wanted to compare RT-preempt with regular kernel.
    • worst case latency was good with RT-preempt patch
  • Wanted to use modified realfeel code (realfeel-etri)
    • it worked.
  • wanted to know relationship between scheduling latency and throughput performance
    • result: there were possible throughput problems with the RT-preempt kernel
  • haven't used realfeel-etri on ARM, yet.

realfeel-etri

problems:

  • uses the TSC (but abstracts it behind the get_ticks() interface)
    • have to use something else on ARM
  • needs /dev/rtc (may not be available on other embedded platforms)

benefits:

  • is better than realfeel, because it measures time from ISR start
  • uses read-unblock, which has a good code path through the kernel
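Because realfeel-etri already hides the TSC behind get_ticks(), porting it to ARM is mostly a matter of supplying another tick source. The following is a minimal sketch of one way to do that, using a compile-time switch. On x86 it keeps the TSC through the GCC/Clang __rdtsc() intrinsic, calibrated once against CLOCK_MONOTONIC and assuming a constant-rate TSC. Every other architecture falls back to clock_gettime(CLOCK_MONOTONIC). These signatures are illustrative and are not the ones in realfeel-etri.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Nanoseconds from CLOCK_MONOTONIC. */
static uint64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

#if defined(__x86_64__) || defined(__i386__)
/* x86: raw TSC count (assumes a constant-rate TSC). */
uint64_t get_ticks(void)
{
    return __rdtsc();
}

/* Calibrate the TSC rate once against CLOCK_MONOTONIC over ~50 ms. */
uint64_t ticks_per_sec(void)
{
    static uint64_t hz;
    if (hz == 0) {
        struct timespec pause = {0, 50000000L};
        uint64_t ns0 = mono_ns(), t0 = get_ticks();
        nanosleep(&pause, NULL);
        uint64_t ns1 = mono_ns(), t1 = get_ticks();
        hz = (t1 - t0) * 1000000000ULL / (ns1 - ns0);
    }
    return hz;
}
#else
/* Other architectures (e.g. ARM): monotonic clock, 1 tick = 1 ns. */
uint64_t get_ticks(void)
{
    return mono_ns();
}

uint64_t ticks_per_sec(void)
{
    return 1000000000ULL;
}
#endif

/* Convert a tick delta to ns, splitting the multiply to avoid overflow. */
uint64_t ticks_to_ns(uint64_t ticks)
{
    uint64_t hz = ticks_per_sec();
    return ticks / hz * 1000000000ULL + ticks % hz * 1000000000ULL / hz;
}
```

A latency in ns is then ticks_to_ns(end - start). Splitting the multiply keeps ticks * 10^9 from overflowing 64 bits on long intervals.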

cyclictest

Uses timestamps from the timer system to measure the performance of the timer system.
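The core of the cyclictest approach is an absolute-time periodic sleep. The task computes the next expiry, sleeps until it with clock_nanosleep(..., TIMER_ABSTIME, ...), and records how far past that expiry it actually started running. The following is a minimal single-threaded sketch of that loop, not cyclictest's source. It uses only POSIX calls, and the function names are hypothetical. Without HRT support, wakeups are limited to timer-tick granularity.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Advance an absolute timespec by interval_ns, keeping tv_nsec normalized. */
static void ts_add_ns(struct timespec *ts, long interval_ns)
{
    ts->tv_nsec += interval_ns;
    while (ts->tv_nsec >= 1000000000L) {
        ts->tv_nsec -= 1000000000L;
        ts->tv_sec++;
    }
}

/* Run `loops` periodic wakeups `interval_ns` apart and return the worst
 * wakeup latency (actual start minus programmed expiry) in ns, or -1
 * if clock_nanosleep fails. */
int64_t periodic_max_latency_ns(long interval_ns, int loops)
{
    struct timespec next, now;
    int64_t max = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < loops; i++) {
        ts_add_ns(&next, interval_ns);
        /* Sleep to an absolute time so per-loop errors don't accumulate. */
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        clock_gettime(CLOCK_MONOTONIC, &now);
        int64_t lat = ts_to_ns(&now) - ts_to_ns(&next);
        if (lat > max)
            max = lat;
    }
    return max;
}
```

cyclictest itself can also run its threads at a real-time priority and lock memory. This sketch omits both, and both matter for worst-case numbers on an RT-preempt kernel.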

benefits:

  • seems to be what other people are using.
  • it should be architecture-independent
    • (after HRT arch support is provided in kernel)

problems:

  • requires HRT support
  • requires newer version of glibc and compiler
  • requires a newer structure or API for timer info (not available on MIPS)
  • it doesn't have a kernel instrumentation point


Notes:

  • it works on SH, on 2.6.22 or later.
  • it works on ppc64
  • it didn't work on MIPS for Tim

Don't know how to interpret the results (i.e., what constitutes good values).
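One way to make runs comparable is to reduce each one to the same summary cyclictest prints per thread (Min, Avg, Max), plus a tail percentile. For real-time work, the maximum and the tail usually matter more than the average. The following is a minimal sketch, assuming the latencies are already collected (e.g., in microseconds). It cannot say which values are good; that depends on the platform and the application's deadline. The struct and function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct lat_summary {
    long min, max;
    double avg;
    long p99; /* 99th percentile, nearest-rank method */
};

static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Summarize n latency samples. Returns 0 on success, -1 if n == 0 or
 * allocation fails. The input array is left unmodified. */
int summarize(const long *samples, size_t n, struct lat_summary *out)
{
    if (n == 0)
        return -1;
    long *s = malloc(n * sizeof *s);
    if (!s)
        return -1;
    memcpy(s, samples, n * sizeof *s);
    qsort(s, n, sizeof *s, cmp_long);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)s[i];

    out->min = s[0];
    out->max = s[n - 1];
    out->avg = sum / (double)n;
    /* Nearest rank: ceil(0.99 * n), 1-based. */
    size_t rank = (99 * n + 99) / 100;
    out->p99 = s[rank - 1];
    free(s);
    return 0;
}
```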

table of test attributes

{| border="1" cellspacing="0" cellpadding="5"
|- style="background:#CCCCCC"
|attribute / test
|rf-etri
|igel
|celleb
|cyclictest
|-
|platform specific?
|yes - uses TSC, requires /dev/rtc
|yes - uses SH timestamp
|yes - uses lpar interrupt, uses Cell timebase registers
|no, but requires HRT support; uses the Linux time system for timestamps
|-
|data format/handling
|text->csv->excel
|csv text
|text->csv->excel
|none
|-
|presentation
|excel charts
|gnuplot
|excel charts
|none
|-
|interrupt source
|/dev/rtc - periodic interrupt
|serial port
|lpar interrupt (hypervisor)
|timer system (reprogrammed timer interrupt)
|-
|# machines
|1
|1
|2
|1
|-
|What is measured?
|B->C
|B->C
|0->B
|C->C2
|-
|Load source
|find, ping, hackbench
|ping, lmbench
|hdparm, netperf
|none
|-
|performance measurement
|hackbench
|none, but host could measure serial throughput
|hdparm, netperf, ping
|none
|-
|has user space element?
|yes
|yes
|no
|yes
|-
|kernel patches needed
|yes - patches to /dev/rtc
|yes - needs UIO support (mainlined in 2.6.23); needs kernel module
|yes - needs kernel module; instrumentation of lpar handling code
|no, but needs HRT support for arch
|}