R-Car/Tests:rcar gen3 thermal

This document describes how to test the Thermal (rcar_gen3_thermal) functionality on Renesas R-Car Gen3.

Kernel Version and Configuration
Thermal support for R-Car Gen3 H3 and M3-W has been available since v4.10. The latest additions for using the Thermal driver and CPUFreq to cool the CPU are available in a topic branch:

https://git.ragnatech.se/linux thermal/cooling

The ARM64 defconfig was used with the following extra option enabled:

CONFIG_RCAR_GEN3_THERMAL=y

Hardware Environment

 * Salvator-X/r8a7795 (Gen 3 R-Car H3 SoC) ES1.0
 * Salvator-XS/r8a7795 (Gen 3 R-Car H3 SoC) ES2.0
 * Salvator-X/r8a7796 (Gen 3 R-Car M3-W SoC) ES1.0

The results shown below are from tests performed on the Salvator-XS/r8a7795.

Verify Driver Initialisation
Initialisation of Thermal support can be verified by checking for the presence of thermal_zone directories in sysfs:

 # find /sys/class/thermal/ -name 'thermal_zone*'
 /sys/class/thermal/thermal_zone1
 /sys/class/thermal/thermal_zone2
 /sys/class/thermal/thermal_zone0

Inspect Temperatures
 # grep . /sys/class/thermal/thermal_zone*/temp
 /sys/class/thermal/thermal_zone0/temp:34000
 /sys/class/thermal/thermal_zone1/temp:40500
 /sys/class/thermal/thermal_zone2/temp:36500
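
The values in the temp files are in millidegrees Celsius, so 34000 means 34 °C. A minimal sketch for printing them in degrees is given here; the helper name show_temps and its optional base-directory argument are hypothetical, added only for illustration, and the helper assumes non-negative readings.

```shell
# show_temps [BASE]: print each thermal zone's temperature in degrees Celsius.
# BASE defaults to /sys/class/thermal; the argument is an assumption made so
# the helper can also be pointed at a copy of the sysfs tree.
show_temps() {
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        milli=$(cat "$zone/temp")
        # sysfs reports millidegrees; split into whole and fractional parts
        # (assumes a non-negative reading)
        printf '%s: %d.%03d C\n' "${zone##*/}" $((milli / 1000)) $((milli % 1000))
    done
}
```

Calling show_temps without arguments reads the live sysfs tree; for a reading of 40500 it prints 40.500 C for that zone.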

Exercise Thermal Support
On an idle system:
 * 1) Check the temperatures; they should be low
 * 2) Apply some load to the system
 * 3) Check the temperatures again; they should be slightly higher
 * 4) Wait until the system is idle again
 * 5) Check the temperatures one last time; they should have dropped again

 # grep . /sys/class/thermal/thermal_zone*/temp
 /sys/class/thermal/thermal_zone0/temp:35000
 /sys/class/thermal/thermal_zone1/temp:41500
 /sys/class/thermal/thermal_zone2/temp:37500

 # for i in $(seq 1000000); do :; done
 # grep . /sys/class/thermal/thermal_zone*/temp
 /sys/class/thermal/thermal_zone0/temp:35500
 /sys/class/thermal/thermal_zone1/temp:43500
 /sys/class/thermal/thermal_zone2/temp:39500

 # sleep 10
 # grep . /sys/class/thermal/thermal_zone*/temp
 /sys/class/thermal/thermal_zone0/temp:35000
 /sys/class/thermal/thermal_zone1/temp:42000
 /sys/class/thermal/thermal_zone2/temp:38000
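
The idle/load/idle comparison can also be scripted instead of read by eye. The following is only a sketch: max_temp and temp_rose are hypothetical helper names, and comparing the hottest zone is one possible choice among several.

```shell
# max_temp [BASE]: print the highest temperature (millidegrees Celsius)
# reported by any thermal zone under BASE (default /sys/class/thermal).
max_temp() {
    base=${1:-/sys/class/thermal}
    max=
    for f in "$base"/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f")
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done
    [ -n "$max" ] && echo "$max"
}

# temp_rose BEFORE AFTER: succeed only if AFTER is strictly higher than BEFORE.
temp_rose() {
    [ "$2" -gt "$1" ]
}

# Possible use on the target (not run here):
#   idle=$(max_temp)
#   for i in $(seq 1000000); do :; done
#   loaded=$(max_temp)
#   temp_rose "$idle" "$loaded" && echo "temperature rose under load"
```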

Verify Cooling Device Initialisation
Initialisation of Thermal and CPUFreq support can be verified by checking for the presence of a cooling_device directory in sysfs:

 # find /sys/class/thermal/ -name 'cooling_device*'
 /sys/class/thermal/cooling_device0

Inspect Trip Points
Two kinds of trip points are used on this system: passive and critical. If the critical trip point is reached, the system is shut down. If the passive trip point is reached, the system is in need of passive cooling. To verify the integration of Thermal and CPUFreq, only the passive trip points are of interest.

 # grep . /sys/class/thermal/thermal_zone*/trip_point_*_{temp,type}
 /sys/class/thermal/thermal_zone0/trip_point_0_temp:95000
 /sys/class/thermal/thermal_zone0/trip_point_1_temp:120000
 /sys/class/thermal/thermal_zone1/trip_point_0_temp:95000
 /sys/class/thermal/thermal_zone1/trip_point_1_temp:120000
 /sys/class/thermal/thermal_zone2/trip_point_0_temp:95000
 /sys/class/thermal/thermal_zone2/trip_point_1_temp:120000
 /sys/class/thermal/thermal_zone0/trip_point_0_type:passive
 /sys/class/thermal/thermal_zone0/trip_point_1_type:critical
 /sys/class/thermal/thermal_zone1/trip_point_0_type:passive
 /sys/class/thermal/thermal_zone1/trip_point_1_type:critical
 /sys/class/thermal/thermal_zone2/trip_point_0_type:passive
 /sys/class/thermal/thermal_zone2/trip_point_1_type:critical
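
Since only the passive trip points matter for the CPUFreq test, it can help to filter them out of the listing. A minimal sketch follows, assuming the trip_point_N_type and trip_point_N_temp file layout shown above; the helper name passive_trips is hypothetical.

```shell
# passive_trips [BASE]: print "zone temperature" for every trip point whose
# type is "passive". BASE defaults to /sys/class/thermal.
passive_trips() {
    base=${1:-/sys/class/thermal}
    for type_file in "$base"/thermal_zone*/trip_point_*_type; do
        [ -r "$type_file" ] || continue
        [ "$(cat "$type_file")" = "passive" ] || continue
        # trip_point_N_type and trip_point_N_temp share the same prefix
        temp_file=${type_file%_type}_temp
        zone=${type_file%/*}
        echo "${zone##*/} $(cat "$temp_file")"
    done
}
```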

Exercise Cooling Support
In this example the passive trip point for all thermal zones is set to 95000 (95 °C). It might be hard to get a system up to that temperature, and there are two different ways to work around that:

 * 1) Change and recompile the device-tree description to trigger at a lower temperature. This is a useful test as it demonstrates that the cooling device is functioning and that CPUFreq indeed lowers the temperature.
 * 2) Emulate the temperature reported by the thermal driver and observe in sysfs that the cooling_device changes its state.

Exercise Cooling Support by DT changes
The description of the passive trip point for each thermal zone in the thermal-zones DT node needs to be changed from 95000 to a lower value (45000 is used in this example). The complete process of changing and recompiling the device tree is left as an exercise to the reader, but the adventurous reader might try the following:

 sed -i "s/temperature = <95000>;/temperature = <45000>;/" arch/arm64/boot/dts/renesas/r8a7795.dtsi
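
A substitution like this fails silently if the pattern does not match, so it may be worth checking the file before rewriting it. The sketch below assumes the trip temperatures are written as "temperature = <95000>;" in the .dtsi file and that GNU sed is available; the helper name lower_passive_trip is hypothetical.

```shell
# lower_passive_trip FILE OLD NEW: rewrite "temperature = <OLD>;" to
# "temperature = <NEW>;" in FILE and report how many lines were rewritten.
# Refuses to touch the file when no line matches OLD.
lower_passive_trip() {
    file=$1 old=$2 new=$3
    # grep -c exits non-zero on zero matches; "|| true" keeps that harmless
    before=$(grep -c "temperature = <$old>;" "$file" || true)
    if [ "$before" -eq 0 ]; then
        echo "no trip point at $old found in $file" >&2
        return 1
    fi
    sed -i "s/temperature = <$old>;/temperature = <$new>;/" "$file"
    echo "changed $before line(s) to $new"
}
```

For the example above this would be called as lower_passive_trip arch/arm64/boot/dts/renesas/r8a7795.dtsi 95000 45000.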

Once the new DT is in place, inspect the new trip points to make sure the change took effect:

 # grep . /sys/class/thermal/thermal_zone*/trip_point_*_{temp,type}
 /sys/class/thermal/thermal_zone0/trip_point_0_temp:45000
 /sys/class/thermal/thermal_zone0/trip_point_1_temp:120000
 /sys/class/thermal/thermal_zone1/trip_point_0_temp:45000
 /sys/class/thermal/thermal_zone1/trip_point_1_temp:120000
 /sys/class/thermal/thermal_zone2/trip_point_0_temp:45000
 /sys/class/thermal/thermal_zone2/trip_point_1_temp:120000
 /sys/class/thermal/thermal_zone0/trip_point_0_type:passive
 /sys/class/thermal/thermal_zone0/trip_point_1_type:critical
 /sys/class/thermal/thermal_zone1/trip_point_0_type:passive
 /sys/class/thermal/thermal_zone1/trip_point_1_type:critical
 /sys/class/thermal/thermal_zone2/trip_point_0_type:passive
 /sys/class/thermal/thermal_zone2/trip_point_1_type:critical

Terminal 1 -- The Observer

In one terminal, use the watch command to observe CPU frequency changes. They can be hard to spot after load is applied: because a low temperature trip point is used and the cooling is effective, cooling is only active in short intervals. Running watch at a 0.2 second update rate proved the easiest way for me to spot the change; your mileage might vary.

 # watch -n 0.2 grep . /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq

 Every 0.2s: grep . /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq

 /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1500000
 /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:1500000
 /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:1500000
 /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:1500000
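
Because the frequency drops can be brief, an alternative to watching the screen is to log only the moments when the frequencies change. This is a sketch rather than a replacement for watch: print_freqs and log_freq_changes are hypothetical helper names, and the fractional sleep interval assumes a sleep implementation that accepts it (GNU coreutils does).

```shell
# print_freqs [BASE]: print "cpuN frequency" (kHz) for each CPU with cpufreq.
print_freqs() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_cur_freq; do
        [ -r "$f" ] || continue
        cpu=${f%/cpufreq/scaling_cur_freq}
        echo "${cpu##*/} $(cat "$f")"
    done
}

# log_freq_changes [BASE] [SAMPLES] [INTERVAL]: sample the frequencies
# SAMPLES times, INTERVAL seconds apart, and print a timestamped line only
# when the sample differs from the previous one.
log_freq_changes() {
    base=${1:-/sys/devices/system/cpu}
    samples=${2:-50}
    interval=${3:-0.2}
    last=
    i=0
    while [ "$i" -lt "$samples" ]; do
        now=$(print_freqs "$base" | tr '\n' ' ')
        if [ "$now" != "$last" ]; then
            echo "$(date +%T) $now"
            last=$now
        fi
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Running log_freq_changes in the observer terminal while load is applied leaves a record of each switch, which is easier to review afterwards than a live display.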

Terminal 2 -- Load Generation

In a different terminal, apply some load to the system to (hopefully) push the temperature above the trip point, then observe the cooling kick in and bring the temperature down. It might be necessary to spawn more than one of the bash loops to provide sufficient load to increase the temperature (as in the example below). Another option is to use a tool like stress, which is designed to add load to a system.

 # for i in $(seq 1000000); do :; done &
 [1] 9771
 # for i in $(seq 1000000); do :; done &
 [2] 9794
 # for i in $(seq 1000000); do :; done &
 [3] 9817
 # for i in $(seq 1000000); do :; done &
 [4] 9827

While the load is running, the observer terminal should show the CPUFreq switching between 500000 (cooling is active) and 1500000 (no cooling).
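
If the fixed-length loops finish before the temperature reaches the trip point, a time-limited load generator may be more convenient. The following is a minimal sketch; spawn_load is a hypothetical helper, and one worker per CPU is only a starting point.

```shell
# spawn_load N SECONDS: run N busy-loop workers in the background for
# SECONDS seconds, then stop them all and report.
spawn_load() {
    n=$1 secs=$2
    pids=
    i=0
    while [ "$i" -lt "$n" ]; do
        # each worker spins forever until it is killed below
        ( while :; do :; done ) &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$secs"
    # $pids is intentionally unquoted so it splits into one PID per word
    kill $pids 2>/dev/null
    wait $pids 2>/dev/null || true
    echo "stopped $n workers"
}
```

For example, spawn_load 4 60 keeps four workers busy for a minute, which matches the four loops started by hand above.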

Exercise Cooling Support by temperature emulation
This is an easier test than the DT changes approach, but it does not clearly show that changing the CPUFreq lowers the temperature, since true temperature reporting is disabled in order to emulate the passive trip point being reached.

On an idle system:
 * 1) Check the CPUFreq of all CPUs; they should all be running at a high frequency
 * 2) Emulate the temperature reported by the Thermal driver; set it just above the passive trip point value
 * 3) Check the CPUFreq of all CPUs; they should all be running at a lower frequency
 * 4) Disable emulation of the temperature reported by the Thermal driver

 # grep . /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
 /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:1500000
 /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:1500000
 /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:1500000
 /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:1500000

 # echo 96000 > /sys/class/thermal/thermal_zone0/emul_temp

 # grep . /sys/devices/system/cpu/*/cpufreq/scaling_cur_freq
 /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq:500000
 /sys/devices/system/cpu/cpu1/cpufreq/scaling_cur_freq:500000
 /sys/devices/system/cpu/cpu2/cpufreq/scaling_cur_freq:500000
 /sys/devices/system/cpu/cpu3/cpufreq/scaling_cur_freq:500000

 # echo 0 > /sys/class/thermal/thermal_zone0/emul_temp
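
The emul_temp file only exists when the kernel is built with CONFIG_THERMAL_EMULATION, so a script should check for it before writing. The sketch below derives the emulated value from the trip point instead of hard-coding 96000. It assumes trip_point_0 is the passive trip point, as in the listings above; set_emul_temp and emul_above_passive are hypothetical helper names.

```shell
# set_emul_temp ZONE_DIR MILLIDEG: write an emulated temperature to a zone.
# Writing 0 switches emulation off again.
set_emul_temp() {
    zone=$1 value=$2
    if [ ! -w "$zone/emul_temp" ]; then
        echo "$zone/emul_temp not writable (is CONFIG_THERMAL_EMULATION enabled?)" >&2
        return 1
    fi
    echo "$value" > "$zone/emul_temp"
}

# emul_above_passive ZONE_DIR [MARGIN]: emulate a temperature MARGIN
# millidegrees (default 1000) above trip_point_0, assumed to be passive.
emul_above_passive() {
    zone=$1 margin=${2:-1000}
    trip=$(cat "$zone/trip_point_0_temp")
    set_emul_temp "$zone" $((trip + margin))
}
```

On the target this would be used as emul_above_passive /sys/class/thermal/thermal_zone0, followed by the CPUFreq check and then set_emul_temp /sys/class/thermal/thermal_zone0 0 to restore real readings.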