Flash Filesystem Benchmarks Protocol

From eLinux.org

Latest revision as of 08:34, 28 October 2011

This page refers to the benchmarks presented at Flash Filesystem Benchmarks

Tested filesystems

The test bench covers the following filesystems:

  • JFFS2 filesystem. All tests are performed with this filesystem.
  • UBIFS filesystem. All tests are performed with this filesystem.
  • SquashFS over ubiblk. ubiblk is a UBI-to-block translation module developed by Free Electrons; it makes it possible to mount a SquashFS filesystem inside a UBI volume. In this case, none of the tests involving writes or removal is performed, since SquashFS is a read-only filesystem. These results are labeled squashfs-ubiblk in our graphs.
  • SquashFS over mtdblock_ro over gluebi, which also makes it possible to mount a SquashFS filesystem inside a UBI volume. Unlike ubiblk, this solution is entirely in the mainline Linux kernel. As above, none of the tests involving writes or removal is performed. These results are labeled squashfs-gluebi in our graphs.
  • YAFFS2 filesystem. All tests are performed with this filesystem. Unfortunately, YAFFS2 doesn't build for recent kernel versions (2.6.39, 3.0), so we were unable to update our YAFFS2 results.
  • Raw. This is not a filesystem, but the raw read performance of the flash, measured by nanddump-ing it.

All filesystem drivers are built as modules, including UBI support. The logfs filesystem is not tested, as it is far too unstable.

Tests performed

Multiple tests are performed on each filesystem/size couple, and those tests produce the following values:

  • init_time and init_cpu_time: module initialisation time
  • init_mem: module initialisation memory usage (x)
  • mount_time and mount_cpu_time: mount time
  • mount_mem: memory usage for mounting the filesystem (x)
  • remount_time: hot remount time
  • used_space: space used by the filesystem after mounting
  • read_time and read_cpu_time: read timing test
  • remove_time and remove_cpu_time: erase test
  • write_time and write_cpu_time: write test
  • video_write_time and video_write_cpu_time: big (uncompressible) file write test

Items marked with an (x) don't seem to be reliable yet.

The tests are run in the order shown above. One filesystem/size couple (e.g. jffs2/128MB) is tested, then the board is reset and the next filesystem/size couple is tested.

Notes:

  • The time measurements are done with the time util (not the shell builtin), which is used to launch the command to be timed. When a cache could speed up the measured criterion (e.g. read or remove), a sync is included in the timed command. Both CPU time and wall-clock time are measured: for example, mount_time in our graphs is the wall-clock time needed to perform the mount test, while mount_cpu_time is the CPU time needed to perform it.
  • The memory measurements are done using the content of /proc/meminfo right before and after the measured command. The sum of "MemFree", "Buffers" and "Cached" is considered to be free memory; the memory usage is the difference between the two readings. The results tend to show that this approach isn't reliable for absolute memory consumption: results for one filesystem at different sizes do not always follow a consistent law, and are sometimes negative (memory usage < 0).

However, when comparing the filesystems against each other (as opposed to looking at absolute results), some consistently have a low footprint whereas others have a large one, and it is also obvious when memory usage doesn't scale.
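As a rough sketch of these two measurement techniques combined (the helper names and the /usr/bin/time path are our own, not taken from the bench script):

```shell
#!/bin/sh
# Free memory = MemFree + Buffers + Cached, in kB, from /proc/meminfo.
free_kb() {
    awk '/^(MemFree|Buffers|Cached):/ { sum += $2 } END { print sum }' /proc/meminfo
}

# Time a command (wall-clock and CPU) with the standalone time(1) utility,
# not the shell builtin, and report the memory delta around it.
measure() {
    before=$(free_kb)
    /usr/bin/time -p sh -c "$* && sync" 2> /tmp/time.out
    after=$(free_kb)
    echo "mem_used_kb=$((before - after))"
    cat /tmp/time.out   # "real", "user" and "sys" lines from time -p
}
```

A negative mem_used_kb is exactly the anomaly discussed above: caches may be dropped during the measured command.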

Init test

The init test consists of modprobing the filesystem driver.

In the case of a filesystem on top of UBI, it also consists of modprobing ubi and attaching a device.

Both time and memory consumption (with the inconsistencies explained above) are measured.
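A minimal sketch of what the init test might run for UBIFS (the module names are real, but the MTD partition number is an assumption, and the exact bench invocation may differ):

```shell
#!/bin/sh
# Init test sketch for a filesystem on top of UBI.
MTD_NUM=4   # hypothetical MTD partition holding the UBI image

init_test() {
    modprobe ubi                            # the UBI layer first
    ubiattach /dev/ubi_ctrl -m "$MTD_NUM"   # attach the MTD device to UBI
    modprobe ubifs                          # then the filesystem driver itself
}
```

For jffs2 or yaffs2, only the single modprobe of the filesystem module is needed.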

Mount test

This test simply consists of mounting a partition (in the case of ubifs, attaching occurred in the previous test).

Both time and memory are measured.

After the mount, a remount (mount -o remount) timing test is also performed when applicable (i.e. not for read-only filesystems).

The used space on flash is also measured, using df.
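The mount-test steps could look like this sketch (the device node, filesystem type, mount point and /usr/bin/time path are assumptions):

```shell
#!/bin/sh
# Mount test sketch: timed mount, timed hot remount, then used space.
DEV=/dev/ubi0_0   # hypothetical device node
MNT=/mnt/test     # hypothetical mount point

mount_test() {
    mkdir -p "$MNT"
    /usr/bin/time -p mount -t ubifs "$DEV" "$MNT"   # timed mount
    /usr/bin/time -p mount -o remount "$MNT"        # timed hot remount
    df "$MNT"                                       # used space on flash
}
```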

Read timing test

A tar archive of the test filesystem's content is created and written to /dev/zero (Note: tar detects writes to /dev/null and discards them). A sync is performed afterward.

This test filesystem contains what a root filesystem would contain.

As a reference time, a "raw" read operation is also performed: it consists of a nanddump of the same amount of flash as the uncompressed size of the filesystem. Compressed filesystems may get a better result than this reference time.
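A sketch of the read test and its raw reference (the mount point, MTD device and length are assumptions):

```shell
#!/bin/sh
# Read test sketch: tar the mounted tree to /dev/zero, then sync.
# /dev/zero is used because tar detects /dev/null and discards the writes.
MNT=/mnt/test   # hypothetical mount point

read_test() {
    /usr/bin/time -p sh -c "tar cf /dev/zero -C '$MNT' . && sync"
}

# Raw reference: dump the same amount of flash with nanddump.
raw_read() {
    # $1 = number of bytes to read, e.g. the uncompressed filesystem size
    /usr/bin/time -p nanddump -l "$1" /dev/mtd4 > /dev/null
}
```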

Erase test

The whole content of the filesystem is rm -rf-ed, then a sync is performed; both are timed together.
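The erase test could be sketched as follows (mount point assumed):

```shell
#!/bin/sh
# Erase test sketch: remove everything, then sync, all inside one timed command.
MNT=/mnt/test   # hypothetical mount point

erase_test() {
    /usr/bin/time -p sh -c "rm -rf '$MNT'/* && sync"
}
```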

Write test

The content of a folder, previously mounted in a tmpfs, is copied into the flash partition. When the partition is large, the copy is repeated until the partition is almost full: (size / 8) times.
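The write-test loop under the (size / 8) rule could be sketched as follows (the paths and partition size are assumptions; the bench wraps the whole loop with time(1) as described in the notes above):

```shell
#!/bin/sh
# Write test sketch: copy a reference tree into the flash partition
# (SIZE_MB / 8) times, then sync.
SRC=/tmp/testdata    # hypothetical reference tree, staged in a tmpfs
MNT=/mnt/test        # hypothetical flash mount point
SIZE_MB=128          # partition size in MB

write_test() {
    n=$((SIZE_MB / 8))
    i=0
    while [ "$i" -lt "$n" ]; do
        cp -r "$SRC" "$MNT/copy$i"   # one full copy of the reference tree
        i=$((i + 1))
    done
    sync
}
```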

Big file write test

This test is almost the same as the previous one but only one video is written (possibly several times, as before).

Communication with the device

A serial connection with the device is assumed. The bench script waits until a prompt (U-Boot or shell) is displayed and sends a batch of commands to be executed. It waits for the next prompt, parses the result and stores it in a SQL database when needed, and sends the next commands.

Multiline shell commands aren't passed through the serial line but are stored in shell files and these files are executed. However, we do so for all filesystems and all sizes homogeneously, so the overhead is consistent.

Kernel and RootFS

The kernel is loaded in RAM from external storage (MMC, TFTP, etc.) to leave as much free space as possible on the flash. On most boards, it may be possible to use the whole flash by putting the first-stage bootloader and U-Boot on an MMC card. However, we didn't need to for the tests on IGEPv2.

The RootFS is also put on external storage (here, NFS). However, for read/write operations, the output is written to /dev/zero and the input is loaded from a ramfs, to avoid highly random latency.

Issues

Memory usage measurement

The measured init memory footprint is sometimes negative, and it does not always follow a consistent law with regard to scaling.

The results are however mostly consistent and make it possible to compare the considered filesystems.

Block filesystems and bad blocks

When a block filesystem encounters a bad block, it has no way of dealing with it. So, when the filesystem becomes large, flashing it inevitably fails.

The results for these tests are then unreliable and must be manually deleted.

Free space shortage

When the filesystem size is close to the flash size, some filesystems fail to write any more data. This kind of problem occurred when writing video, since that kind of data can't be compressed.

These writes also took a long time to fail and sometimes left the system short of free memory. It hasn't been verified yet, but one hypothesis is that the wear-leveling and garbage-collecting processes fail to operate efficiently under such load and need too much memory.

Non-significant time measurements

For unforeseeable reasons, the time taken by an operation can be much larger than expected and/or than in another batch of tests.

Running the tests several times, removing the highest and lowest values, and computing the average could be a solution.

That solution has been implemented, but the implementation isn't perfect: it should remove the best and worst results first, and doing that properly would require many runs.
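The trimmed-average idea can be sketched with sort and awk: drop the single lowest and highest measurements and average the rest (with more runs, more outliers could be trimmed on each side).

```shell
#!/bin/sh
# Trimmed mean: read one measurement per line on stdin, drop the min and
# the max, then average what remains.
trimmed_mean() {
    sort -n | awk '
        { v[NR] = $1 }
        END {
            sum = 0
            for (i = 2; i < NR; i++) sum += v[i]   # skip v[1] (min) and v[NR] (max)
            printf "%.2f\n", sum / (NR - 2)
        }'
}
```

For example, feeding the measurements 1, 2, 3, 4 and 100 through trimmed_mean drops the 1 and the 100 and averages the rest.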

UBI not scaling as expected

UBI init time is supposed to scale linearly with the size of the device. But since the tests always use the same MTD partition, and thus the same UBI device size (whatever the volume size may be), the init time won't reflect the size of the filesystem. A fix could be to define a different partition at each boot: the size of the test partition can be specified at each reboot, but if that size is set to exactly the size of the filesystem, ubifs will run out of space because of the UBI overhead.