TDFS stands for "Trivially distributed file system" and is a proof-of-concept implementation of a distributed file system as a ("stacked") layer above normal file systems. It uses the FUSE libraries and subsystem to implement this operation in userland.
NOTE: This is currently a proof-of-concept implementation, not ready for production use! I would appreciate any feedback about this project, whether it works for you or not. Since I'm doing this in my spare time, it will take a long time for me to catch all the bugs alone; if you want to speed this project up, consider posting in the forums and/or submitting patches. You can contact me either personally or, better, via the SourceForge forums for the project.
See the SourceForge page for downloads, support forums and other information. See the provided README file for the most up-to-date information about the software!
The goal of TDFS is to solve single-writer-multiple-readers distribution of file system data (also called single-master-multiple-slaves). In this scenario all writes happen (or originate) on one computer and are propagated to the others. Read requests can go to either the master or the slaves and are always served locally. Since read requests never go over the network, the system does not offer strict synchronization. Some usages for this scenario are:
Note that the data is fully replicated: in a scenario with one master and two slaves, for example, the data is stored three times, once on each machine.
The following goals are implemented:
The following features would be nice to have, but none of them are currently implemented.
Usability and performance of TDFS are constrained by what can be done with the FUSE system. In particular, there is no support for file locking, and system calls such as `mmap()` and `sendfile()` could behave unexpectedly. In practice, the system behaves similarly to NFS in that applications that depend on speed or on file system locks should not be run on it.
Performance is influenced by the fact that this is a userland-implemented file system, so a very large number of context switches occur during its operation. For example, a single `write()` request goes roughly like this (during kernel calls the application is sleeping):

1. The application calls `write()` and the kernel receives the system call.
2. The FUSE kernel module queues the request and wakes the userland daemon.
3. The daemon reads the request from the FUSE device, processes it (in TDFS: writes to the local directory and mirrors the operation to the slaves), and writes the reply back to the device.
4. The FUSE kernel module completes the system call and wakes the application.

A "normal" course of events for a kernel-implemented local file system is much shorter:

1. The application calls `write()` and the kernel passes it directly to the file system driver.
2. The driver completes the request and the application is woken, with no extra context switches into userland.
Additionally, all I/O requests in the current implementation of the FUSE kernel module seem to be broken down into page-sized pieces (usually 4 KB), so the userland daemon receives and processes them in 4 KB pieces. This is terribly inefficient, as each piece is separately transmitted and confirmed.
In theory, a setup where both master and slave daemons are configured and running on each of several machines (with the slave daemons writing directly into their masters' local-copy directories) can result in a multi-master scenario, where each of the machines can both read and write the shared file system data. This mode of operation has not been tested, and several obvious issues arise:
While for some workloads these problems might be tolerable (for example, when write operations are infrequent and never touch the same file at the same time on different machines), this is not an adequate general solution.
To compile and run TDFS, the FUSE libraries and kernel module must be installed on the system used as the master. The easiest way to do this is to install the `sysutils/fusefs-libs` and `sysutils/fusefs-kmod` ports. Beware that the kernel module from the port can be older than the current development branch of Fuse4BSD, so if you start getting weird errors during operation, try again with a fresh, current kernel module from the Fuse4BSD site.
The TDFS daemons are compiled with the help of the provided `Makefile`. If you only want to build the slave daemon, run `make tdfs_slave`.
The TDFS system consists of two daemons, `tdfs` and `tdfs_slave`. The `tdfs` daemon runs on the master server and provides a mount point whose operations are mirrored over the network to the `tdfs_slave` daemons. Among the command-line arguments it supports, these are the most important:
- `-m <directory>`: Specify the local directory that will be distributed. The local directory will be used for all read-only operations, and all write operations will be mirrored to the slave daemons.
- `-c <client_host>`: Add a client (slave) host to the list of slaves. At least one slave must be specified, and the slave daemons must be running before the master is started.
- `-z <0|1>`: Specify the compression option to use. `0` (the default) means no compression and `1` means `liblzf` is used. `liblzf` brings between 50% and 100% compression with very small overhead, so in theory enabling it could make the difference between 100 Mbit/s and 200 Mbit/s operation (in practice, network latency will dominate the throughput in either case).
- `-h`: Show a help message listing additional options.

Note: there is an error in the `README` supplied with the `tdfs-r1` release, which says that the `-n` switch enables `TCP_NODELAY`. This is the opposite of what the `-n` switch does in that version (`TCP_NODELAY` is now enabled by default and `-n` disables it).
Once the `tdfs` daemon is properly started, it provides a device entry `/dev/fuseX`, where `X` is a small integer incremented every time a FUSE daemon is (re)started. When started for the first time, the device entry will be `/dev/fuse0`, and this is the value used in the examples. Note that old, inactive entries are not removed and will remain even after the `tdfs` daemon exits (this is a peculiarity of FreeBSD and currently cannot be avoided). This device entry must be passed to the `mount_fusefs` utility to mount it on the desired directory.
The `tdfs_slave` daemon is simpler to start; the only really important arguments it accepts are `-m` and `-n`, with the same meanings as in the `tdfs` daemon. See the message printed by the `-h` argument for more information.
Both daemons must be started as the `root` user and must run on machines of the same type (e.g. i386). TDFS has currently only been tested on i386.
Here's an annotated example session with TDFS utilities:
(on slave):
# ./tdfs_slave -m /slavedata
(on master):
# kldload fuse
# ./tdfs -m /storage/data -z 1 -c slave.mynet.org
# mount_fusefs /dev/fuse0 /mnt/data
# cd /mnt/data ; do_interesting_file_system_operations ; cd -
# umount /mnt/data
(at this point the master daemon should automagically terminate; if it
doesn't, send it SIGTERM, or as a last resort, SIGKILL)
(on slave):
# killall tdfs_slave ; observe_mirrored_operations_on_/slavedata
TODO: I'm accepting suggestions for a nicer interface to terminate the slave daemon :)
Both the master and slave daemons can be started on the same machine, as long as this doesn't create cycles in the file system structure. Mounting a FUSE device of this type into a first-level directory of a file system while exporting another first-level directory via the FUSE system may create a deadlock in the kernel module. If this happens to you, avoid using top-level directories (i.e. use /mnt/data instead of /data). This kind of lockup isn't serious and can usually be resolved by killing the process that caused the lockup (i.e. the slave), forcibly `umount`ing the FUSE file system, and killing the daemon.
TDFS is Copyright (c) 2006 Ivan Voras (ivoras@gmail.com), released under the BSD license. Note that the FUSE library itself is released under the LGPL, so take care when distributing binaries.
TDFS has been developed and tested under FreeBSD. Patches to port it to other operating systems will be gladly accepted, provided they don't introduce more than 5 `#ifdef`s into a single `.c` file :) (introducing additional header files is encouraged).
Only FreeBSD 6-STABLE is currently supported.
Information about TDFS and the newest source is available at its SourceForge page.