Difference between revisions of "Tests:R-CAR-RAVB-RX-Checksum-Offload"

From eLinux.org

Latest revision as of 04:32, 14 September 2017

Kernel Version Configuration

RX Checksum Offload support for RAVB is currently available in a topic branch:

https://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas.git topic/ravb-rx-checksum-offload

The ARM64 defconfig was used. The following option was disabled to produce a kernel image small enough to boot in the environment used for testing.

  • CONFIG_SOUND

The following option was also disabled, as the subsystem in question appears to fail to build in the net-next revision that the topic/ravb-rx-checksum-offload branch is based on (the latest net-next revision at the time).

  • CONFIG_DRM

User Space Configuration

The tests described below require netperf to be installed both on the board being tested and on the host specified by the -H option when netperf is invoked on the board. netserver, which is part of the netperf package, should be running on the host.

Perf is used to record CPU usage during the test. For this reason perf needs to be installed on the board being tested.
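The requirements above can be checked before starting a run. The following is a minimal sketch, not part of the original test procedure: missing_tools is a hypothetical helper name, and the tool names passed to it are assumptions that depend on how the packages are installed (in the transcripts below the perf binary is called perf_3.16).

```shell
# Hypothetical helper: print the name of each listed tool that is not found in PATH.
# Prints nothing when every tool is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example use on the board (tool names are assumptions):
#   missing_tools netperf perf_3.16 ethtool
# Example use on the host:
#   missing_tools netserver
```

An empty result means all of the named tools were found. It does not confirm that netserver is actually running on the host.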

Hardware Environment

  • Salvator-X/r8a7795 (Gen 3 R-Car H3 SoC) ES1.0
  • Salvator-X/r8a7796 (Gen 3 R-Car M3-W SoC) ES1.0

The results shown below are from tests performed on the Salvator-X/r8a7796.
The Salvator-XS/r8a7795 gives the same results.

Verify RAVB RX Checksum Offload Support

Verify Driver Initialisation

Initialisation of RAVB can be checked by inspection of the output of dmesg.

# dmesg | grep ravb
[    1.291370] libphy: ravb_mii: probed
[    1.295837] ravb e6800000.ethernet eth0: Base address at 0xe6800000, 2e:09:0a:00:be:d8, IRQ 45.
[    5.025952] ravb e6800000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
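If the check is to be scripted, the negotiated link speed can be pulled out of the "Link is Up" message. This is a sketch that assumes the message format shown above; link_speed is a hypothetical helper name.

```shell
# Hypothetical helper: read kernel log text on stdin and print the speed from
# the most recent "Link is Up - <speed>/<duplex>" message, if there is one.
link_speed() {
    sed -n 's/.*Link is Up - \([^/]*\)\/.*/\1/p' | tail -n 1
}

# Example use on the board:
#   dmesg | grep ravb | link_speed
```

For the log shown above this prints 1Gbps. No output means that no link-up message was found.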

Verify Configurability of RX Checksum Offload

# ethtool -k eth0 | grep rx-checksum
rx-checksumming: on
# ethtool -K eth0 rx off
# ethtool -k eth0 | grep rx-checksum
rx-checksumming: off
# ethtool -K eth0 rx on
# ethtool -k eth0 | grep rx-checksum
rx-checksumming: on
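The same check can be automated by parsing the ethtool -k output rather than reading it by eye. This is a minimal sketch; rx_csum_state is a hypothetical helper name, and it assumes the "rx-checksumming: <state>" line format shown above.

```shell
# Hypothetical helper: read `ethtool -k` output on stdin and print the state
# of rx-checksumming ("on" or "off"). Any trailing "[fixed]" marker is ignored.
rx_csum_state() {
    awk '$1 == "rx-checksumming:" { print $2; exit }'
}

# Example use on the board:
#   ethtool -K eth0 rx off
#   [ "$(ethtool -k eth0 | rx_csum_state)" = off ] || echo "rx offload still enabled"
```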

Run netperf TCP_MAERTS tests

When run on the board, this exercises RX using the RAVB by receiving a stream of TCP packets from the host.

Note that perf record writes to a file in /run. This directory was chosen because it is mounted as a tmpfs filesystem backed by memory. Writing to a file on an NFS mount instead would significantly reduce the meaningfulness of the results collected, as the writes themselves generate network traffic during the test.
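Whether /run really is a tmpfs on a given system can be confirmed from /proc/mounts. The sketch below is an illustration only; fstype_of is a hypothetical helper name, and it assumes the standard /proc/mounts field order (device, mount point, filesystem type).

```shell
# Hypothetical helper: read /proc/mounts-style text on stdin and print the
# filesystem type of the mount point given as $1. The last matching entry
# wins, since a later mount shadows an earlier one on the same path.
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }'
}

# Example use on the board:
#   fstype_of /run < /proc/mounts
```

A result of tmpfs matches the setup described above. A result of nfs or nfs4 would mean that the perf data is being written over the network.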


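In terms of performance, throughput is close to gigabit line rate both with and without RX checksum offload enabled (938.78 and 941.09 10^6bits/sec respectively). Perf output, however, indicates that significantly less time is spent in do_csum() when RX checksum offload is enabled: do_csum() accounts for 6.95% of samples with offload disabled and does not appear among the entries listed with offload enabled. This is the expected result.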
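To put "close to gigabit line-rate" in numbers, the TCP payload rate that a gigabit link can carry can be estimated from per-frame overhead. The sketch below is a back-of-the-envelope calculation, not a measurement. The framing parameters are assumptions that the test report does not state: an MTU of 1500, IPv4, the TCP timestamp option in use, and no VLAN tag. tcp_goodput_mbps is a hypothetical helper name.

```shell
# Hypothetical helper: estimate the maximum TCP payload rate, in 10^6 bits/sec,
# on an Ethernet link of $1 Mbit/s.
# Assumptions (not stated in the test report): MTU 1500, IPv4,
# TCP timestamp option enabled, no VLAN tag.
tcp_goodput_mbps() {
    awk -v link="$1" 'BEGIN {
        mtu = 1500
        payload = mtu - 20 - 20 - 12   # IPv4 header, TCP header, timestamp option
        wire = mtu + 14 + 4 + 8 + 12   # Ethernet header, FCS, preamble and SFD, inter-frame gap
        printf "%.2f\n", link * payload / wire
    }'
}

tcp_goodput_mbps 1000   # prints 941.48
```

Under these assumptions each 1538-byte slot on the wire carries 1448 bytes of payload, so both measured results are within about 0.3% of the estimated ceiling.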

With RX Checksum Offload

# ethtool -K eth0 rx on
# ethtool -k eth0 | grep rx-checksum
rx-checksumming: on
# /usr/bin/perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
enable_enobufs failed: getprotobyname
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00     938.78   
[ perf record: Woken up 14 times to write data ]
[ perf record: Captured and wrote 3.524 MB /run/perf.data (~153957 samples) ]

# perf_3.16 report -i /run/perf.data | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 75K of event 'cycles'
# Event count (approx.): 19704920110
#
# Overhead          Command      Shared Object                                Symbol
# ........  ...............  .................  ....................................
#
    19.49%      ksoftirqd/0  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore     
     9.88%      ksoftirqd/0  [kernel.kallsyms]  [k] __pi_memcpy                     
     7.33%      ksoftirqd/0  [kernel.kallsyms]  [k] skb_put                         
     7.00%      ksoftirqd/0  [kernel.kallsyms]  [k] ravb_poll                       
     3.89%      ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive                 
     3.65%          netperf  [kernel.kallsyms]  [k] __arch_copy_to_user             
     3.43%          swapper  [kernel.kallsyms]  [k] arch_cpu_idle                   
     2.77%          swapper  [kernel.kallsyms]  [k] tick_nohz_idle_enter            
     1.85%      ksoftirqd/0  [kernel.kallsyms]  [k] __netdev_alloc_skb              
     1.80%          swapper  [kernel.kallsyms]  [k] _raw_spin_unlock_irq            
     1.64%      ksoftirqd/0  [kernel.kallsyms]  [k] __slab_alloc.isra.79            
     1.62%      ksoftirqd/0  [kernel.kallsyms]  [k] __pi___inval_cache_range        

Without RX Checksum Offload

# ethtool -K eth0 rx off
# ethtool -k eth0 | grep rx-checksum
rx-checksumming: off
# perf_3.16 record -o /run/perf.data -a netperf -t TCP_MAERTS -H 10.4.3.162
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.3.162 () port 0 AF_INET : demo
enable_enobufs failed: getprotobyname
Recv   Send    Send                          
Socket Socket  Message  Elapsed              
Size   Size    Size     Time     Throughput  
bytes  bytes   bytes    secs.    10^6bits/sec  

 87380  16384  16384    10.00     941.09   
[ perf record: Woken up 14 times to write data ]
[ perf record: Captured and wrote 3.411 MB /run/perf.data (~149040 samples) ]

# perf_3.16 report -i /run/perf.data | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
# Samples: 73K of event 'cycles'
# Event count (approx.): 18682878466
#
# Overhead        Command      Shared Object                                Symbol
# ........  .............  .................  ....................................
#
    17.50%    ksoftirqd/0  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore     
    10.60%    ksoftirqd/0  [kernel.kallsyms]  [k] __pi_memcpy                     
     7.91%    ksoftirqd/0  [kernel.kallsyms]  [k] skb_put                         
     6.95%    ksoftirqd/0  [kernel.kallsyms]  [k] do_csum                         
     6.22%    ksoftirqd/0  [kernel.kallsyms]  [k] ravb_poll                       
     3.84%    ksoftirqd/0  [kernel.kallsyms]  [k] dev_gro_receive                 
     2.53%        netperf  [kernel.kallsyms]  [k] __arch_copy_to_user             
     2.53%        swapper  [kernel.kallsyms]  [k] arch_cpu_idle                   
     2.27%        swapper  [kernel.kallsyms]  [k] tick_nohz_idle_enter            
     1.90%    ksoftirqd/0  [kernel.kallsyms]  [k] __pi___inval_cache_range        
     1.90%    ksoftirqd/0  [kernel.kallsyms]  [k] __netdev_alloc_skb              
     1.52%    ksoftirqd/0  [kernel.kallsyms]  [k] __slab_alloc.isra.79
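
Comparing the two reports by eye works for a single run. For repeated runs, the overhead of one symbol can be extracted from the report text instead. This is a minimal sketch, assuming the report layout shown above with the symbol name in the last column; symbol_overhead is a hypothetical helper name.

```shell
# Hypothetical helper: read `perf report` text on stdin and print the overhead
# percentage (without the % sign) of the first entry whose symbol is $1.
# Prints nothing and returns non-zero if the symbol is not listed.
symbol_overhead() {
    awk -v sym="$1" '
        /^#/ { next }                      # skip header and comment lines
        $NF == sym { sub(/%$/, "", $1); print $1; found = 1; exit }
        END { exit !found }
    '
}

# Example use on the board:
#   perf_3.16 report -i /run/perf.data | symbol_overhead do_csum
```

Applied to the report without RX checksum offload this prints 6.95. Note that piping through head -20 first, as in the transcripts above, hides any symbol below the cut-off, so a missing result there means "not in the top entries" rather than zero.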