DM14 Required Assignment 3: Log-Structured File Systems
In the future RAM may become so cheap that the entire contents of hard drives is kept in main memory such that only writes requires actually transferring data to the hard drive; after the initial read of the contents of the hard drive, everything else is done using only main memory.
Looking at the evolution of hard drives the only thing for which there has been almost no improvement is the disk seek time, i.e., the time required to reposition the hard disk head. Data transfer rates have and will probably continue to increase during the next many years.
A natural consequence of the above is to look at the entire hard drive as a circular log. All writes are initially buffered in memory. Later, either periodically or when enough data is available, all the buffered data is written to the disk in a single segment at the end of the log.
For more details, you can have a look at a few pages in the following book. You can get a copy of these pages in the DM14 paper slot at IMADA opposite the notice board and next to place where you can find the grades for written exams.
Andrew S. Tanenbaum and Albert S. Woodhull. Operating Systems: Design and Implementation. Prentice Hall, 1997. Section 5.3.6, pages 432-4.
In this assignment, your task is to implement the log-structured file system described by Tanenbaum and Woodhull. By doing this you will hopefully learn a lot about how file systems are implemented.
Requirements and Hints for Design and Implementation
Even though the entire philosophy of LFS is rooted in the idea of having the entire contents of the file system in main memory, you do not need to implement it like this. In general your main focus should be on actually making your file system work, rather than optimizing it for efficiency.
Your file system should implement the following parts of a file system using some appropriate API.
- Directories
- List, create, and delete files. Create and delete directories. Omit rename and link. You are allowed to restrict the length of filenames, if this would make the implementation easier.
- Files
- Open, close. Sequential read/write of streams of bytes.
- Inodes
- Size, access and modification time stamps. Omit owner, permissions, reference count, etc. You only have to implement one level of indirect data pointers.
To ease the implementation we will not use User-mode Linux as for the previous two assignments. Instead, you should implement LFS as a normal user program using either C or Java depending on what you prefer. The actual file system should be implemented using a normal semi-huge file in the file system of the computer you use. Everything must be able to run on the Linux machines at IMADA.
In the LFS implementation of Tanenbaum and Woodhull they use a cleaner thread which runs periodically. If necessary, you do not need to use a special thread. Instead you can do the cleaning in connection with file system operations.
Note that in the description above and in the pages from Tanenbaum and Woodhull, a lot of details are left out. You have to decide for yourself what is appropriate and what to do.
Report
The report must be short and precise (maximum 10 pages plus source code). Remember that you must demonstrate that you have understood the problems and solutions.
It must include the following:
- A description of what you have done and the choices you have made.
- Discuss how your LFS implementation could recover from a power failure and how you would guarantee that recovery is always possible, even if yet a power failure happens while you are recovering from the previous failure. Note: You do not have to implement this.
- A description of the tests you have made and the motivation for these. The tests must be able to verify that your solution works correctly, but you do not have to test all possible error scenarios. Remember it is better to document a bug rather than just ignoring it.
Further References
If you want more information, you can have a look at the following. Note: this is not necessary, if you only want to do the assignment.
- FUSE (Filesystem in USErspace) is an easy way to create a file system running in userspace for Linux.
- Mendel Rosenblum and John K. Ousterhout. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems volume 10:1, February 1992, pages 26 - 52, February 1992. The paper is available from the ACM digital library. Note: to get the actual PDF-file may require you to be at the university. The paper describes LFS in more detail.
- JFFS2: The Journalling Flash File System, version 2 is a variant of LFS used for flash devices. This file system is included in any newer standard Linux kernel.
Frequently Asked Questions (FAQ)
- Are we allowed to work in groups?
- For everything except the report, you are allowed to work in groups of at most three people. Write the members of your group on the front page of your report. The report you have to write on your own. Remember that the three required assignments are part of your final exam in June, i.e., you have to fully understand all parts of what you have done.
- How do I actually do this? It seems very complicated
- Have a look at some of the links at the top of this page. You can also try to send an email to the DM14 mailing list; maybe some other group has been able to solve your particular problem.