About TDFS

TDFS stands for "Trivially distributed file system", and is a proof-of-concept implementation of a distributed file system built as a ("stacked") layer above normal file systems. It uses the FUSE libraries and subsystem to implement this operation in userland.

NOTE: This is currently a proof-of-concept implementation, not ready for production use! I would appreciate any feedback about this project - whether it works or doesn't work for you. Since I'm doing this in my spare time, it will take a long time for me to catch all the bugs alone; if you want to speed this project up, consider posting in the forums and/or submitting patches. You can contact me either personally or, better, via the SourceForge forums for the project.


See the SourceForge page for downloads, support forums and other information. See the provided README file for the most up-to-date information about the software!

Goals

The goal of TDFS is to solve the single-writer-multiple-readers (also called single-master-multiple-slaves) distribution of file system data. In this scenario, all writes happen (or originate) on one computer and are propagated to the others. Read requests can go to either the master or the slaves and are always served locally; since reads never go over the network, the system doesn't offer strict synchronization. Some usages for this scenario are:

For example: in a scenario with one master and two slaves, the data is stored three times, once on each machine.

Immediate architecture goals

These goals are implemented:

Additional goals

The following features would be nice to have, but none of them are currently implemented.

Expected usability

Usability and performance of TDFS are constrained by what can be done with the FUSE system. In particular, there is no particular support for file locking, and system calls such as mmap() and sendfile() could have unexpected behavior. In practice, the system should behave similarly to NFS in that applications that depend on speed or file system locks should not be run on it.
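
One quick (though by no means exhaustive) way to see how advisory locking behaves on a mounted TDFS directory is FreeBSD's lockf(1) utility; the file name below is only a placeholder, and the mount point is the one used in the examples further down:

# lockf -t 0 /mnt/data/locktest sleep 30 &
# lockf -t 0 /mnt/data/locktest echo "lock was granted"
(if the second command succeeds while the first one is still running and
holding the lock, advisory locks are not being honored on this mount)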

Performance is influenced by the fact that this is a userland-implemented file system, and thus a very large number of context switches is made during its operation. For example, a single write() request goes roughly like this (during kernel calls the application is sleeping):

1. The application calls write() and enters the kernel.
2. The FUSE kernel module packages the request and wakes the userland tdfs daemon (a context switch).
3. The daemon reads the request from the FUSE device, writes the data to the underlying local file system (another system call) and transmits it over the network to the slaves.
4. The daemon writes the reply back to the FUSE device (another system call), causing a context switch back into the kernel.
5. The kernel completes the original write() and wakes the application.

A "normal" course of events for a kernel-implemented local file system is much shorter:

1. The application calls write() and enters the kernel.
2. The kernel file system code performs the write (often just copying the data into the buffer cache).
3. Control returns directly to the application, with no extra context switches.

Additionally, all I/O requests in the current implementation of the FUSE kernel module seem to be broken down into page-sized pieces (usually 4 KB), so the userland daemon receives and processes them in 4 KB pieces, which is terribly inefficient as each piece gets separately transmitted and confirmed.
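
As a rough way to observe this (the file name is only a placeholder), compare write throughput with small and large application-side block sizes; since the requests reaching the daemon are split into page-sized pieces either way, the reported speeds should not differ much:

# dd if=/dev/zero of=/mnt/data/ddtest bs=4k count=4096
# dd if=/dev/zero of=/mnt/data/ddtest bs=64k count=256
(both commands write 16 MB; dd prints the achieved throughput when it finishes)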

Multiple-master mode

In theory, using a setup where both master and slave daemons are configured and running on each of several machines (where the slave daemons directly use their masters' local-copy directories for writing) can result in a multi-master scenario, in which each of the machines can both read and write the shared file system data. This mode of operation has not been tested, but there are several obvious issues that arise:

While for some workloads these problems might be ignorable (for example, when write operations are infrequent and don't happen on the same file at the same time on different machines), this is not an adequate general solution.

Usage

Prerequisites

To compile and run TDFS, the FUSE libraries and kernel module must be installed on the system being used as the master. The easiest way to do this is to install the sysutils/fusefs-libs and sysutils/fusefs-kmod ports. Beware that the port's kernel module can be older than what is currently in the development branch of Fuse4BSD, so if you start getting weird errors during operation, please try again with a fresh and current kernel module from the Fuse4BSD site.
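
For example, installing from the ports collection (as root) typically looks like this:

# cd /usr/ports/sysutils/fusefs-libs && make install clean
# cd /usr/ports/sysutils/fusefs-kmod && make install clean
# kldload fuse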

The TDFS daemons are compiled with the help of the provided Makefile. If you only want to build the slave daemon, run make tdfs_slave.
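
For example (assuming a plain make with no arguments builds both daemons, which is how the provided Makefile is meant to be used):

# make
(builds both the tdfs and tdfs_slave daemons)
# make tdfs_slave
(builds only the slave daemon)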

Usage

The TDFS system consists of two daemons, tdfs and tdfs_slave. The tdfs daemon runs on the master server and provides a mount point whose operations are mirrored over the network to tdfs_slave daemons. Among the command-line arguments it supports, these are the most important:

Note: there is an error in the `README` supplied with the `tdfs-r1` release, which says that the `-n` switch enables `TCP_NODELAY`. This is the opposite of what the switch does in that version: `TCP_NODELAY` is enabled by default and `-n` disables it.

Once the tdfs daemon is properly started, it will provide a device entry /dev/fuseX, where X is a small integer incremented every time a FUSE daemon is (re)started. When started for the first time, the device entry will be /dev/fuse0, and this is the value that will be used in the examples. Note that old and inactive entries are not removed and will remain even after the tdfs daemon exits (this is a peculiarity of FreeBSD and currently cannot be solved). This device entry must be passed to the mount_fusefs utility to mount the file system on a desired directory.
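
Since stale entries accumulate, it can help to check which /dev/fuse* entry is the newest before mounting; for example (the mount point is the same placeholder used in the examples below):

# ls -l /dev/fuse*
# mount_fusefs /dev/fuse0 /mnt/data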

The tdfs_slave daemon is simpler to start, and the only really important arguments it accepts are -m and -n, with the same meaning as in the tdfs daemon. See the message printed by the -h argument for more information.

Both daemons must be started as the root user and must run on machines of the same architecture (e.g. i386). TDFS has currently been tested only on i386.

Examples

Here's an annotated example session with TDFS utilities:

(on slave):

# ./tdfs_slave -m /slavedata

(on master):

# kldload fuse
# ./tdfs -m /storage/data -z 1 -c slave.mynet.org
# mount_fusefs /dev/fuse0 /mnt/data
# cd /mnt/data ; do_interesting_file_system_operations ; cd -
# umount /mnt/data
(at this point the master daemon should automagically terminate; if it
doesn't, send it SIGTERM, or as a last resort, SIGKILL)

(on slave):

# killall tdfs_slave ; observe_mirrored_operations_on_/slavedata

TODO: I'm accepting suggestions for a nicer interface to terminate the slave daemon :)

Both master and slave daemons can be started on the same machine, as long as this doesn't create cycles in the file system structure. It's possible that mounting a FUSE device of this type on a first-level directory of a file system while exporting another first-level directory via the FUSE system will create a deadlock in the kernel module. If this happens to you, avoid using top-level directories (i.e. use /mnt/data instead of /data). This kind of lockup isn't serious and can usually be resolved by killing the process that caused the lockup (i.e. the slave), forcibly unmounting the FUSE file system and killing the daemon.
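
For example, a single-machine setup that avoids top-level directories might look like this (all paths are only illustrative, the fuse module is assumed to be already loaded, and the -z/-c arguments are used just as in the example session above):

# ./tdfs_slave -m /mnt/slavedata
# ./tdfs -m /mnt/masterdata -z 1 -c localhost
# mount_fusefs /dev/fuse0 /mnt/data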

Further information

TDFS is Copyright (c) 2006 Ivan Voras <ivoras@gmail.com>, released under the BSD license. Note that the FUSE library itself is released under the LGPL, so take care with binaries.

TDFS has been developed and tested under FreeBSD. Patches to port it to other operating systems will be gladly accepted, provided they don't introduce more than 5 #ifdefs into a single .c file :) (introduction of additional header files is encouraged).

Only FreeBSD 6-STABLE is currently supported.

Information about TDFS and the newest source is available at its SourceForge page.
