File Systems

This page has information about file systems which are of interest for embedded projects.

= Introduction =
Most embedded devices use flash memory as storage media. Size and boot time are also very important in many consumer electronics products. Therefore, special file systems with different features are often used, such as enhanced compression or the ability to execute files directly from flash.

MTD
Note that flash memory may be managed by the Memory Technology Devices (MTD) system of Linux. See the MTD/Flash FAQ for more information. Most of the filesystems mentioned here are built on top of the MTD system.

UBI
The Unsorted Block Images (UBI) system in the Linux kernel manages multiple logical volumes on a single flash device. It provides a mapping from logical blocks to physical erase blocks, via the MTD layer. UBI provides a flexible partitioning concept which allows for wear-leveling across the whole flash device.

See the UBI page or UBI FAQ and Howto for more information.

Partitioning
The kernel requires at least one "root" file system, onto which other file systems can be mounted. In non-embedded systems, often only a single file system is used. However, in order to optimize limited resources (flash, RAM, processor speed, boot-up time), many embedded systems break the file system into separate parts, and put each part on its own partition (often in different kinds of storage).

For example, a developer may wish to take all the read-only files of the system, and put them into a compressed, read-only file system in flash. This will consume the least amount of space on flash, at the cost of some read-time performance (for decompression).

Another configuration might have executable files stored uncompressed on flash, so that they can be executed-in-place, which saves RAM and boot-up time (with a potential small loss of performance).

For writable data, if the data does not need to be persistent, sometimes a ramdisk is used. Depending on the performance needs and the RAM limits, the file data may be compressed or not.

There is no single standard for interleaving the read-only and read-write portions of the file system. This depends heavily on the set of embedded applications used for the project.

eMMC and UFS
As flash memories have gotten larger, a variety of factors have caused a shift from raw NAND to packaged, block-addressable NAND flash memory in embedded devices. These chips contain on-board firmware that accepts block I/O requests, similar to rotating storage media (old hard disk drives), and fulfills them. This involves mapping the read and write requests to areas of the NAND flash in the chip, and managing the NAND flash to optimize for correctness and longevity. NAND flash must be rewritten in large blocks (erase blocks) that are many times the size of individual file system blocks. Therefore, the method of mapping, rearranging and garbage collecting the allocation of blocks in the system is quite important.

These chips are used with a block-based filesystem (e.g. ext4) rather than a flash-based one. As of 2012, optimizing the ext4 file system for these devices is a hot topic in file system research. See http://lwn.net/Articles/502472

= Embedded Filesystems =
Here are some filesystems designed for and/or commonly used in embedded devices, sorted in alphabetical order:

AXFS

 * AXFS - Advanced XIP File System
 * Website: http://axfs.sourceforge.net/
 * This file system is designed specifically to support execute-in-place (XIP) operation. It uses a two-phase approach. In the first phase, the filesystem is placed in flash and run to collect profile data recording which pages are used. In the second phase, a filesystem is built using this profile data. That filesystem marks all pages mentioned in the profile as XIP data, which are then loaded into RAM upon mounting (and executed as XIP). It is also possible to put the XIP pages in NOR flash and run them from there.

Btrfs

 * Btrfs is a copy-on-write filesystem that first appeared in the kernel in 2.6.29-rc1 and was released with kernel 2.6.29.
 * Btrfs is not yet supported by many popular Linux filesystem tools such as gparted as of April 2011.
 * Btrfs has been adopted as the MeeGo platform's filesystem.
 * Nice Introduction Video on btrfs by Chris Mason

CramFS

 * CRAMFS - A compressed read-only file system for Linux. The maximum size of CRAMFS is 256MB.
 * "Linear Cramfs" is the name of a special feature that stores uncompressed files in a linear block layout within the Cramfs file system. This is useful for storing files which can be executed in place. For more information on Linear Cramfs, see Application XIP.

InitRAMFS
From March 2006 Linux Devices:

INTRODUCING INITRAMFS, A NEW MODEL FOR INITIAL RAM DISKS

This clear, technical article introduces initramfs, a Linux 2.6 feature that enables an initial root filesystem and init program to reside in the kernel's memory cache, rather than on a ramdisk (as with initrd filesystems). Compared to initrd, initramfs can increase boot-time flexibility, memory efficiency, and simplicity, the author says. One especially interesting feature for embedded Linux developers is that relatively simple, deeply embedded systems can use initramfs as their sole filesystem.

http://www.linuxfordevices.com/c/a/Linux-For-Devices-Articles/Introducing-initramfs-a-new-model-for-initial-RAM-disks/

All of the papers by Nicholas McGuire at: http://linuxdevices.com/news/NS6374541663.html

JFFS2

 * JFFS2 - The Journalling Flash File System, version 2. This is the most commonly used flash filesystem.
 * The maximum size of JFFS2 is 128MB.
 * http://sourceforge.net/projects/mtd-mods has some patches by Alexey Korolev for improvements to JFFS2
 * See the presentation on Alexey's patches at:
 * To improve mount time substantially verify that the erase block summary patch is in your image. This patch is part of the jffs2 driver since 2005-09-07. A patch for an earlier version can be found at: http://www.inf.u-szeged.hu/jffs2/jffs2-summary-20050211.patch (or try your luck at http://web.archive.org/web/*/http://www.inf.u-szeged.hu/jffs2/mount.php).
 * JFFS2 has improved since early versions (~2.4.30). Modern versions of the driver in newer kernels have the show-stopping bugs fixed.

LogFS
 * logfs - LogFS is a scalable flash filesystem. It aims to replace JFFS2 for most uses, but focuses more on large devices.

Matt Mackall writes (in July of 2007):

LogFS is a filesystem designed to support large volumes on FLASH. It uses a simple copy-on-write update process to ensure consistency (the "log" in the name is a historical artifact). It's easily the most modern and scalable open-source FLASH filesystem available for Linux and it's well on its way to being accepted in the mainline tree.

Scott Preece writes:

The big win for LogFS (in my limited knowledge of it) is that it stores its tree structure in the media, rather than building it in memory at mount time. This significantly reduces both startup time and memory consumption. This becomes more important as the size of the flash device increases. Read more in LWN (http://lwn.net/Articles/234441) and linux.com (http://www.linux.com/articles/114295).

Some newer flash memory types, such as MLC (multi-level cell), are not well supported.

LogFS now has its own mailing list: see http://logfs.org/cgi-bin/mailman/listinfo/logfs

NFS
Due to space constraints on embedded devices, it is common during development to use a network file system for the root filesystem for the target. This allows the target to have a very large area where full-size binaries and lots of development tools can be placed during development. One drawback to this approach is that the system will need to be re-configured with local file systems (and most likely re-tested) for final product shipment, at some time during the development cycle.

An NFS client can be built into the Linux kernel, and the kernel can be configured to use NFS as the root filesystem. This requires support for networking, and mechanisms for specifying the IP address for the target, and the path to the filesystem on the NFS host. Also, the host must be configured to run an NFS server. Often, the host also provides the required address and path information to the target board by running a DHCP server.

See the file Documentation/nfsroot.txt in the Linux kernel source for more information about mounting an NFS root filesystem with the kernel.
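As a rough illustration of the kernel options described above, the sketch below assembles a command line for an NFS root. The server address and export path are hypothetical examples, and it assumes the kernel was built with NFS root and DHCP autoconfiguration support; the `nfsroot=` and `ip=` options are the ones described in the kernel's nfsroot documentation.

```shell
#!/bin/sh
# Build a kernel command line for an NFS root filesystem.
# The server address and export path used below are hypothetical.
nfs_bootargs() {
    server_ip=$1      # address of the NFS host
    export_path=$2    # directory exported by the NFS host
    # ip=dhcp asks the kernel to configure its own address via DHCP at boot.
    printf 'root=/dev/nfs nfsroot=%s:%s ip=dhcp rw\n' "$server_ip" "$export_path"
}

nfs_bootargs 192.168.1.10 /srv/nfs/target-rootfs
```

The resulting string would be passed to the kernel by the bootloader. The host must also export the same directory from its NFS server.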

PRAMFS
 * PRAMFS - Persistent and protected RAM File System

The Persistent/Protected RAM Special Filesystem (PRAMFS) is a full-featured read/write filesystem that has been designed to work with fast I/O memory, and if the memory is non-volatile, the filesystem will be persistent. In addition, it has execute-in-place support.

Info on the PRAMFS specification can be found at Pram Fs Specification

Romfs

 * RomFs - A small space-efficient read-only filesystem. A description can be found in Documentation/filesystems/romfs.txt or http://lxr.linux.no/linux/Documentation/filesystems/romfs.txt

SquashFS
SquashFS is a (more) compressed read-only file system for Linux. This file system has better compression than JFFS2 or CRAMFS. After spending a long time outside of the mainline kernel, Squashfs was finally merged and released with kernel 2.6.29.

It is possible to tune the amount of compression when running mksquashfs. The -b option allows you to specify the block size. A smaller block size generally gives less compression and a larger block size gives more compression. However, there is a downside to this. Data is read from the flash in blocks. So if you use a block size of 128k and need a single 4k page, the compressed equivalent of 128k of data is still read from flash. As 128k comprises 32 pages, 32 pages are read into the buffer cache even though only one is needed at that moment. Often the other 31 pages will be needed as well, but if not, you have wasted some time reading and decompressing unused data. You also have unneeded data in the buffer cache (and the system may even have had to evict in-use pages from the cache to make room for these 31 pages).

If you care most about the smallest filesystem, you probably want the largest block size. However, if your primary concern is performance, you might want to experiment a little to see what works best for you (and that could even mean applying no compression at all; mksquashfs has the options -noInodeCompression, -noDataCompression and -noFragmentCompression to control this). If you have also applied function reordering (see Boot Time), a large block size will probably work out well for you.
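The read amplification described above is simple arithmetic, sketched here for a few block sizes. The 4 KiB page size and the list of block sizes are assumptions chosen for illustration, and the paths in the trailing comment are placeholders.

```shell
#!/bin/sh
# Number of page-sized units that one Squashfs block decompresses into.
pages_per_block() {
    block_bytes=$1
    page_bytes=$2
    echo $(( block_bytes / page_bytes ))
}

# Compare a few candidate block sizes against a 4 KiB page (assumed).
for kib in 4 32 128; do
    echo "block ${kib}k -> $(pages_per_block $(( kib * 1024 )) 4096) pages per read"
done

# Building an image with a given block size (paths are placeholders):
#   mksquashfs /path/to/rootfs rootfs.squashfs -b 32768
```

A single-page read from a 128k block therefore pulls in 32 pages, while a 32k block pulls in 8. Which trade-off is better depends on how often the neighbouring pages are actually used.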

The table below gives an idea of the amount of compression that is achieved by the various block sizes. Input was a root filesystem of an embedded device.

A presentation on Squash FS by Phillip Lougher at ELC Europe 2008: slides and video.

UBIFS
UBIFS is a filesystem that works on top of UBI volumes.


 * UBIFS presentation slides: ubifs.odp

UBIFS vs. YAFFS2 comparisons for 2.6.31.1
First, see our Flash_Filesystem_Benchmarks for more recent benchmarks.

Hardware: MIPS, 403MHz CPU, 1GB Nand Flash

IOZone results: 4M, 8M & 16M file sizes in 980MB partition.


 * Mount time
 * "1st mount": time for mounting just after "flash_eraseall".
 * "Empty": time for mounting after "1st mount" (there are no files in the partition).
 * "Full": time for mounting after creating files until the partition is full (file sizes are random).
 * "Ubiattach": time for attaching the 980MB partition to the UBI layer using the ubiattach utility.


 * IOZone results

Creating UBI Image
This is easiest if you have access to the device and can run ubinfo and dmesg; otherwise you will need to determine the volume size, Logical Erase Block (LEB) size, and so on by other means. UBI has some per-block overhead, and I found the documentation inconsistent with my particular application, so your results may vary. If your device uses one UBI image for the entire NAND, this should be easier, and the values could probably be determined by just mounting a copy of the UBI image from the device, if available.

To create the image from a rootfs you have built, first create the ubi.ini file that describes your UBI image. It is a regular text file; example contents follow (for more information, run ubinize -h):

[ubi_rfs]
mode=ubi
image=ubifs.img
vol_id=0
vol_size=87349248
vol_type=dynamic
vol_name=ubi_rfs
vol_alignment=1
vol_flags=autoresize

Next you'll run the commands that actually build it. Here ubi.ini is the file you just created, ubifs.img is a temp file you can delete once you are done, and your_erootfs.ubi is the name of the rootfs image that will be created.

sudo /usr/sbin/mkfs.ubifs -m 2048 -e 129024 -c 677 -r /path/to/rootfs ubifs.img
sudo /usr/sbin/ubinize -o your_erootfs.ubi -p 131072 -m 2048 -s 512 -O 512 ubi.ini

To determine these values and the ubi.ini settings, use ubinfo -a and dmesg on the device if possible; both give plenty of information about the values needed. The size and vol_name are listed under "Present volumes", in the second half of that UBI device's description, when you run ubinfo -a on the device. The NAND parameters such as the PEB and LEB sizes are in dmesg.

mkfs.ubifs
 * -m - Minimum I/O unit size.
 * -e - Logical Erase Block (LEB) size.
 * -c - Max LEB count. (vol_size/LEB)
 * -r - Path to root filesystem.
 * ubifs.img - Temporary image file.

ubinize
 * -o - Output file.
 * -p - Physical Erase Block (PEB) size.
 * -m - Minimum I/O unit size.
 * -s - Minimum I/O size for UBI headers, eg. sub-page size.
 * -O - VID header offset from start of PEB.
 * ubi.ini - UBI image configuration file.
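The example values above are consistent with each other, which can be checked with a little arithmetic. The sketch below assumes that the LEB size equals the PEB size minus the data offset, where the data offset is the end of the VID header rounded up to the minimum I/O unit, and that the VID header is 64 bytes long. Both are assumptions about the UBI on-flash layout; the authoritative values are the ones reported by dmesg and ubinfo on the device.

```shell
#!/bin/sh
# Size of the UBI VID header in bytes (assumed; verify against your kernel).
VID_HDR_SIZE=64

# leb_size PEB_SIZE MIN_IO VID_HDR_OFFSET
# LEB size = PEB size - data offset, where the data offset is the end of
# the VID header rounded up to a multiple of the minimum I/O unit.
leb_size() {
    peb=$1; min_io=$2; vid_off=$3
    data_off=$(( (vid_off + VID_HDR_SIZE + min_io - 1) / min_io * min_io ))
    echo $(( peb - data_off ))
}

# max_leb_count VOL_SIZE LEB_SIZE (rounded down to whole LEBs)
max_leb_count() {
    echo $(( $1 / $2 ))
}

# Values from the example: -p 131072 -m 2048 -O 512, vol_size=87349248
leb=$(leb_size 131072 2048 512)
echo "LEB size (-e): $leb"
echo "Max LEB count (-c): $(max_leb_count 87349248 "$leb")"
```

With the example parameters this yields 129024 and 677, matching the -e and -c arguments passed to mkfs.ubifs above. On NAND without sub-page support the VID header offset would typically equal the minimum I/O size, which gives a smaller LEB.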

YAFFS2

 * YAFFS - Yet Another Flash File System - a file system designed specifically for NAND flash.

YAFFS2 is simple, portable, reliable and self-contained. It is widely used in embedded OSes other than Linux, and can also be used stand-alone without an OS, e.g. in bootloaders. When used with Linux it can use MTD or its own flash driver. Similarly, it can use the VFS or its own POSIX layer. It is log-structured and single-threaded. It does not do compression itself; either compress the data yourself or use squashfs on top of YAFFS2.

YAFFS2 is designed to boot quickly (insofar as a log-structured FS that has to scan the flash can). It uses checkpointing so that if a partition was unmounted cleanly then there is no need to rescan the flash on power-up. All the features of the FS are configurable, so you can trade off things like maximum file/partition size, flash block size, file granularity, etc. Data is written straight through to the flash, except for caching to ensure efficient use of blocks. YAFFS2 normally uses the OOB area of the flash for its metadata, allowing faster booting as only the OOB needs to be read for the flash scan. It can keep its metadata inside the main page area at the expense of some speed.

Despite having been in use on Linux in real products since 2004 it has not yet made it to the mainline.


 * Presentation on YAFFS2 by Wookey at ELC Europe 2007: yaffs.pdf
 * Presentation from CELF Jamboree 17 comparing YAFFS and JFFS2 on 2.6.10: celf_flash.pdf

YAFFS2 is GPLed, but is also available under dual-licensing terms for use in non-free contexts from Aleph One Ltd.

= Mounting the root filesystem =
The root filesystem is mounted by the kernel, using a kernel command line option. Other file systems are mounted from user space, usually by init scripts or an init program, using the 'mount' command.

The following are examples of command lines used for mounting a root filesystem with Linux:


 * Use the first partition on the first IDE hard drive:
 * root=/dev/hda1
 * or in later kernels:
 * root=/dev/sda1


 * Use NFS root filesystem (kernel config must support this)
 * root=/dev/nfs

(Usually you need to add some other arguments to make sure the kernel IP address gets configured, or to specify the host NFS path.)


 * Use flash device partition 2:
 * root=/dev/mtdblock2
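When debugging root mount problems it helps to confirm which root= value the kernel actually received. The helper below is a small sketch that extracts it from a command line string. The sample command lines, including the console setting and the UBI volume name, are hypothetical.

```shell
#!/bin/sh
# Print the value of the root= option from a kernel command line string.
root_arg() {
    for word in $1; do
        case $word in
            root=*) echo "${word#root=}" ;;
        esac
    done
}

# Sample command lines (hypothetical values):
root_arg "console=ttyS0,115200 root=/dev/mtdblock2 rootfstype=jffs2"
root_arg "ubi.mtd=2 root=ubi0:rootfs rootfstype=ubifs"

# On a running target, inspect the real command line:
#   root_arg "$(cat /proc/cmdline)"
```

The rootfstype= option shown in the samples tells the kernel which filesystem driver to use for the root device, which is usually needed for flash filesystems.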

[FIXTHIS - should probably mention initrd's here somewhere]

Mounting JFFS2 image on PC using mtdram
Since it is not possible to use the loopback device to mount JFFS2 images, mtdram needs to be used instead. Usually three modules are needed to get it working:


 * mtdram: Provides an MTD partition in RAM. The size can be defined with the total_size parameter in kilobytes.


 * mtdblock: This will create a block device for access to the partition.


 * jffs2: Since JFFS2 is usually not used as a filesystem on a PC, support needs to be loaded manually.

modprobe mtdram total_size=16384
modprobe mtdblock
modprobe jffs2

Depending on the target's endianness, the image file might need conversion to the PC's endianness. jffs2dump from the MTD tools can be used to achieve this.

jffs2dump -b -c -e <converted-image> <original-image>

The final image can be copied to the block device using dd.

dd if=<converted-image> of=/dev/mtdblock0

Mounting is done in the usual way.

mount /dev/mtdblock0 /tmp/jffs2 -t jffs2
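The mtdram device has to be at least as large as the image. The following sketch computes a suitable total_size value in kilobytes by rounding the image size up to whole erase blocks. The 128 KiB default erase block size is an assumption; it should match the erase block size the image was built for.

```shell
#!/bin/sh
# mtdram_total_kib IMAGE_BYTES [ERASE_KIB]
# Round the image size up to a whole number of erase blocks and print
# the result in KiB, suitable for the mtdram total_size= parameter.
mtdram_total_kib() {
    image_bytes=$1
    erase_kib=${2:-128}   # assumed erase block size in KiB
    erase_bytes=$(( erase_kib * 1024 ))
    blocks=$(( (image_bytes + erase_bytes - 1) / erase_bytes ))
    echo $(( blocks * erase_kib ))
}

# A 10,000,000-byte image needs 77 erase blocks of 128 KiB each:
mtdram_total_kib 10000000
```

The printed value (9856 for this example) can be passed as total_size= when loading mtdram. The image size in bytes can be obtained with a tool such as stat or wc -c.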

Mounting UBI Image on PC using nandsim
First create a simulated NAND device (this one is 256MB, with a 2048-byte page size). The first_id_byte= through fourth_id_byte= parameters correspond to the ID bytes sent back from the NAND.

$ sudo modprobe nandsim first_id_byte=0x20 second_id_byte=0xaa third_id_byte=0x00 fourth_id_byte=0x15

Check it was created.

$ cat /proc/mtd
dev:    size   erasesize  name
mtd0: 10000000 00020000 "NAND simulator partition 0"

Next, attach it to an MTD device.

$ sudo modprobe ubi mtd=0

I had to detach it prior to formatting it.

$ sudo ubidetach /dev/ubi_ctrl -m 0

If that ubidetach step fails when you enter it, just proceed to the next step to format the MTD device.

$ sudo ubiformat /dev/mtd0 -f <image>.ubi
ubiformat: mtd0 (nand), size 268435456 bytes (256.0 MiB), 2048 eraseblocks of 131072 bytes (128.0 KiB), min. I/O size 2048 bytes
libscan: scanning eraseblock 2047 -- 100 % complete
ubiformat: 2048 eraseblocks have valid erase counter, mean value is 1
ubiformat: flashing eraseblock 455 -- 100 % complete
ubiformat: formatting eraseblock 2047 -- 100 % complete

Then, attach it.

$ sudo ubiattach /dev/ubi_ctrl -m 0
UBI device number 0, total 2048 LEBs (264241152 bytes, 252.0 MiB), available 0 LEBs (0 bytes), LEB size 129024 bytes (126.0 KiB)

Make a target directory, and mount the device.

$ mkdir temp
$ sudo mount -t ubifs ubi0 temp
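The hexadecimal values in /proc/mtd can be cross-checked against the sizes that ubiformat reports. This sketch decodes one line of /proc/mtd into decimal bytes and an erase block count; it assumes the four-column layout shown in the sample output above.

```shell
#!/bin/sh
# mtd_geometry "LINE": decode one data line of /proc/mtd into decimal values.
mtd_geometry() {
    # Fields after splitting: dev, size (hex), erasesize (hex), name...
    set -- $1
    size=$(( 0x$2 ))
    erase=$(( 0x$3 ))
    echo "${1%:} size=$size erasesize=$erase eraseblocks=$(( size / erase ))"
}

mtd_geometry 'mtd0: 10000000 00020000 "NAND simulator partition 0"'
```

For the simulated device this gives 268435456 bytes in 2048 erase blocks of 131072 bytes, which agrees with the ubiformat output.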

= Issues with General Purpose filesystems used in embedded =

MMC/SD card characteristics
MMCs and SD cards are flash devices which present a block-oriented interface to their host computer. Often, these devices are used in embedded devices and have characteristics that are tuned for block access using a FAT filesystem. But they are presented as "black boxes", with internal logic and algorithms that are not exposed to the host computer.

Some work is in progress to survey and characterize these attributes, and to adapt Linux to be able to use these devices more efficiently.

See https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Projects/FlashCardSurvey

and https://wiki.linaro.org/WorkingGroups/KernelConsolidation/Projects/FlashDeviceMapper (These projects appear to be the work of Arnd Bergmann)

= Special-purpose Filesystems =

ABISS
The Active Block I/O Scheduling System is a file system designed to be able to provide real-time features for file system I/O activities.

See ABISS

Layered Filesystems
Layered filesystems enable you to mount read-only media and still be able to write to it. The written data actually ends up somewhere else, which is handled transparently by the layered filesystem. The concept has been around for quite some time, and below are some examples of filesystems already usable on (embedded) Linux systems out of the box.

UnionFS
Sometimes it is handy to be able to overlay file systems on top of each other. For example, it can be useful in embedded products to use a compressed read-only file system, mounted "underneath" a read/write file system. This gives the appearance of a full read-write file system, while still retaining the space savings of the compressed file system for those files that won't change during the life of the product.

UnionFS is a project to provide such a system (providing a "union" of multiple file systems).

See http://www.filesystems.org/project-unionfs.html

See also union mounts, which are described at http://lkml.org/lkml/2007/6/20/18 (and also in Documentation/union-mounts.txt in the kernel source tree - or will be, when this feature is merged.)

aufs
Another UnionFS. Go to http://aufs.sourceforge.net for more details.

mini_fo
mini_fo = mini fanout overlay file system.

Go to http://www.denx.de/wiki/Know.MiniFOHome for more details.

Apparently this is not maintained any more. Last information is from 2005.

= Performance and benchmarks =

Tools to measure performance
You can use IOZone to measure the performance of a Linux filesystem.

See http://www.iozone.org/

Some benchmark tools that are commonly used with desktop Linux are:
 * bonnie
 * dbench
 * Portable, fully-threaded I/O benchmark program (tiobench)
 * Flexible File System Benchmark (ffsb)

Comparison of flash filesystems
See Flash_Filesystem_Benchmarks

= Other projects =

Multi-media file systems

 * XPRESS file system - [See OLS 2006 proceedings, presentation by Joo-Young Hwang]
 * I found out at ELC 2007 that this FS project was recently suspended internally at Samsung

WikipediaFS
A mountable virtual filesystem that allows accessing MediaWiki-based sites as regular files using a regular editor. Currently this filesystem is unmaintained. See http://wikipediafs.sourceforge.net/ for more info.

wikifs
This one seems similar to WikipediaFS, but is aimed at Plan 9 and Inferno. See http://www.cs.bell-labs.com/magic/man2html/4/wikifs for more info.