BeeFS: A Cheaper and Naturally Scalable
Distributed File System for Corporate Environments
Thiago Emmanuel Pereira, Jonhnny Weslley Silva, Alexandro Soares, Francisco Brasileiro
{thiagoepdc,jonhnny,alexandro}@lsd.ufcg.edu.br, fubica@dsc.ufcg.edu.br
Federal University of Campina Grande
Distributed Systems Lab
Av. Aprígio Veloso, s/n - Bodocongó
58.429-900, Campina Grande - PB, Brazil
Abstract
BeeFS is a distributed file system that harnesses the free
disk space of desktop machines already deployed in the cor-
poration. Like some special-purpose rack-aware file sys-
tems, it uses a hybrid architecture that follows a client-
server approach for serving metadata and managing file
replicas, and a peer-to-peer one for serving data. This char-
acteristic allows BeeFS to aggregate the spare space of
desktop disks into a single logical volume on top of which a
general-purpose, fully POSIX-compliant file system is im-
plemented. BeeFS is not only more efficient than the state-
of-the-practice based on a dedicated server (e.g. NFS), but
also cheaper and naturally scalable. Experimental bench-
marking results show that, on average, BeeFS outperforms
NFS execution time by 74% for write operations and 30%
for read operations in the best case. In the worst case, BeeFS
yields a 56% improvement for write operations and 20% for
read operations when compared with NFS. Moreover, in all
cases, BeeFS improves all metadata operations by at least
30%. Reduced ownership cost is achieved by increasing the
utilisation of desktop disks, while sudden increases in the
demand for storage, normally caused by the arrival of new
users, are usually matched by the extra disk space available
in the machines allocated to those users.
1. Introduction
File system services for desktop users have been com-
monly provided by either a local file system such as
ext3 [22] and NTFS [18] or a distributed file system such
as NFS [15] and Coda [19]. Local file systems cater well
for the needs of individual users, but are less suitable for
addressing the needs of corporations, since they are not fit
to handle file sharing and user mobility¹ in a seamless way.
On the other hand, distributed file systems offer a more ad-
equate solution and have been widely adopted in the corpo-
rate world. However, state-of-the-practice distributed file
systems also have some drawbacks. NFS uses a client-
server architecture which inherently limits its scalability
and availability; more importantly, dealing with a sudden
increase in the capacity demand of the file system, normally
triggered by the arrival of new users, is both costly
and cumbersome, since it involves the acquisition of more
disks, downtime for data migration, and possibly changes in
administration activities, for instance to accommodate
new backup procedures. On the other hand, Coda allows
multiple servers to share the load and offers a simplified proce-
dure for capacity growth; nevertheless, this comes with a
substantial increase in the administrative cost of maintain-
ing the system and no reduction in the cost of growing the
system storage capacity.
This paper presents BeeFS, a distributed file system
which focuses on the specific setting of corporate desktop
networks. It provides a global file namespace and location-
transparent access to files which are stored in the disks of
the participating machines. Such an approach to storage is
desirable and feasible since the capacity of modern
hard disks has outgrown the needs of many users, thus leav-
ing them with much idle storage space in their desktops [7].
Moreover, a notable fraction of corporate desktops shows
low CPU and disk loads [6].
BeeFS is designed following a hybrid architecture that
employs a centralised server to store metadata and man-
age file replication – facilitating system design and admin-
istration – and distributed data storage servers that collab-
oratively store data, reducing the bottleneck on the central
¹User mobility in this case refers to the possibility of a user accessing
the system from one desktop at some time and from another desktop at
another time.
server, and allowing the incremental growth of the storage
capacity. Data storage servers may be deployed on every
desktop machine of the corporation to harness the spare disk
space in these machines, increasing the utilisation of these
disks at no noticeable extra cost. Access to files in this dis-
tributed file system is mediated by a client component that
may also run at each desktop. Since the desktops may run
both a client and a server component, this part of the system
resembles a peer-to-peer system. A data placement strategy
tries to keep data as close as possible to the clients that ac-
cess them, allowing the scalable growth of the file system.
We have implemented BeeFS and evaluated its perfor-
mance using standard file system benchmarks, covering a
range of typical file system operations for corporate work-
loads. As said before, a data placement strategy tries to keep
data as close as possible to the clients that access them. To
better understand the BeeFS performance capabilities, two
experimental scenarios were set up: in the first one, all data
is stored in the same machine where the client executes,
while in the second scenario all data has to be fetched over
the network. In order to qualify these results, the same ex-
perimental load was applied to NFS, which currently repre-
sents the state-of-the-practice solution for corporate distributed
file systems. The comparison shows that BeeFS
outperforms NFS execution time for write, read and meta-
data operations for both experimental scenarios. Therefore,
the qualitative gains of BeeFS come with no performance
penalty; on the contrary, performance is actually increased.
The remainder of the paper is organised as follows. Work
related to ours is surveyed in the next section. Then, in Sec-
tion 3, the design of BeeFS is presented. In this section it
becomes evident that judicious placement of file data in
the data storage servers is a crucial performance-enabling
factor in BeeFS. This issue is carefully studied in Section
4. This is followed by the description of the experiments
and analysis that have been conducted to compare the per-
formance of BeeFS against that of NFS. Finally, Section 6
concludes the paper with our final remarks.
2. Related work
Farsite [4] builds a collaborative, unified file system
over a set of non-trusted desktops. It provides file pri-
vacy and reliability through cryptography and replication,
respectively, and uses file caching to improve system’s per-
formance. OceanStore [12] aggregates a network of non-
trusted servers to make possible both global and persistent
storage. It also uses cryptographic and redundancy tech-
niques to provide private deep storage for archival purposes.
The lack of trust in these systems implies the use of
costly techniques such as Byzantine agreement, large repli-
cation groups and cryptography. These techniques are un-
necessary, or adopted in a relaxed fashion (in the case of repli-
cation), in our approach, because it is designed for a stricter
environment, namely a set of trusted LAN-connected en-
environment, namely a set of trusted LAN-connected en-
terprise desktops. BeeFS takes advantage of the fact that,
in the corporate environment, it is possible to consider that
peers are trustworthy, since they are centrally managed by a
single authority. Moreover, in such a setting, the probability
of a node being connected to the system can be biased by
administrative rules, a property that is absent in most peer-
to-peer systems, in which peers are anonymous, and high
levels of churn are the norm. Furthermore, when dealing
only with well-known peers, it is plausible to assume that
they rarely leave the system permanently in an unannounced
manner, allowing easy differentiation between temporary
and permanent failures.
AFS [8] is a distributed file system composed of a set of
trusted servers, called Vice. AFS presents a homogeneous,
location-transparent file namespace to all the clients. The
operating system on each client machine intercepts file sys-
tem calls and forwards them to a user-level process on that
workstation. This process, called Venus, caches files from
Vice and stores modified copies of files back on the servers
they came from. Venus contacts Vice only when a file is
opened or closed; reading and writing of individual bytes of
a file are performed directly on the cached copy and bypass
Venus. This file system architecture was motivated primar-
ily by considerations of scale. To maximise the number of
clients that can be supported by a server, as much of the
work as possible is performed by Venus rather than by Vice.
Only functions essential to the integrity, availability or secu-
rity of the file system are retained in Vice. AFS implements
a callback based cache coherence process; a server remem-
bers what files have been cached by a client, and notifies it
when another client tries to update one of those files. This
eliminates the need for clients to validate cached files before
using them. The whole-file caching plus the callback based
cache coherence are the basis for the performance and scal-
ability gains in AFS. Considering the resources available
in 1991, AFS supported a ratio of 200 clients for every
server machine [10], while NFS provided only a 20:1 ratio.
Bayou [16] and Coda [19] – an evolution of AFS – are
file systems that are also based on multiple servers. They
use replication to improve availability at the expense of
relaxing consistency. This brings the need to introduce
specialised conflict resolution procedures to address
issues caused by write operations executed over stale data.
Sprite [14] also uses replication and caching to improve
availability and performance, but guarantees consistency.
This, in turn, comes with a performance penalty in
the presence of multiple writers.
All these distributed file systems use a set of storage
elements in order to balance the service load. In spite of
solving the inherent scalability bottleneck of simpler client-
server approaches such as NFS, they increase administra-
tive effort by multiplying the number of servers that require
special attention for tasks such as directory exportation, disk
partitioning, quota assignment, backup, deployment of soft-
ware patches, efficient cooling capacity, service monitoring,
etc. Our approach is to use a hybrid architecture that mixes
peer-to-peer characteristics to achieve scalability and a sin-
gle server to ease administration and access to key data (e.g.
file metadata). A judicious data placement strategy favours
storing the data accessed by a client at the data storage
server located in the machine where that client runs, mini-
mising remote data access and diminishing the dependency
on caching strategies to achieve suitable performance. In
particular, as we will show shortly, BeeFS performs well
without the need to implement a caching layer, which gives
it a much stronger consistency guarantee when compared
to other approaches.
Google File System (GFS) [9], Hadoop Distributed File
System (HDFS) [2] and Freeloader [23] are distributed file
systems designed for data-intensive applications. GFS and
HDFS provide efficient, reliable access to data using large
clusters of commodity hardware. Freeloader harnesses the
unused disk space of LAN-connected desktops to build a
site-wide unified temporary storage space exploited by data-
intensive scientific applications. All these systems are de-
signed for specific workloads; GFS and HDFS applications
usually perform append operations (rather than random
writes), while Freeloader has an even simpler use case:
it only supports immutable data sets. BeeFS uses an ar-
chitecture that is similar to these systems, but provides sup-
port for the POSIX interface, so as to support a wider range
of applications, including those that want to profit from the
data-intensive processing features of a data grid.
Table 1 summarises the above discussion, indicating the
features provided by the several solutions discussed.
3. Architecture
3.1. Components Overview
A BeeFS installation consists of a single queen-bee
server that handles naming, metadata and replica manage-
ment operations and a number of honeycomb servers that
store the actual files. The queen-bee and the honeycombs
provide service to many honeybee clients as shown in Fig-
ure 1. These components are arranged following a hybrid
architecture that mixes aspects of client-server and peer-to-
peer systems in a fashion that simplifies the design and fa-
cilitates the administration of the system.
The queen-bee server is deployed in a dedicated ma-
chine. It is responsible for providing a global file namespace
with location-transparent access for files, access control, re-
source discovery and placement coordination services. On
the other hand, it is not involved in data storage at all. Hon-
eybee clients contact it in order to obtain the location of
the honeycomb servers that store the files. After that, they
fetch/send data directly from/to the appropriate honeycomb
server.

Figure 1. BeeFS Components Overview
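To make this data path concrete, the sketch below shows how a honeybee client might resolve a path through the queen-bee server and then read the data directly from a honeycomb server. The QueenBee and Honeycomb interfaces and their method names are illustrative assumptions, not the actual BeeFS RPC interfaces.

```java
// Minimal sketch of the BeeFS data path; QueenBee and Honeycomb are
// hypothetical stubs standing in for the real RPC interfaces.
interface QueenBee {
    // Resolves a path to the address of the honeycomb server that holds
    // the primary replica of the file.
    String lookupPrimary(String path);
}

interface Honeycomb {
    byte[] read(String fileId, long offset, int length);

    void write(String fileId, long offset, byte[] data);
}

class HoneybeeClient {
    private final QueenBee queenBee;
    private final java.util.function.Function<String, Honeycomb> connect;

    HoneybeeClient(QueenBee queenBee,
                   java.util.function.Function<String, Honeycomb> connect) {
        this.queenBee = queenBee;
        this.connect = connect;
    }

    byte[] read(String path, long offset, int length) {
        // Metadata step: ask the queen-bee server where the file lives.
        String address = queenBee.lookupPrimary(path);
        // Data step: fetch the bytes directly from that honeycomb server;
        // the queen-bee server is not involved in the transfer.
        return connect.apply(address).read(path, offset, length);
    }
}
```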
The role of the honeycomb servers is to collaboratively
store files, providing basic read and write primitives. Hon-
eycomb servers are conceived to be deployed over a set of
desktop machines belonging to the corporate LAN.
The honeybee client component normally coexists with
a honeycomb server that runs on the same machine. A data
placement strategy tries to keep data as close as possible
to its users, allowing the scalable growth of the file sys-
tem.
3.2. Data Operations
BeeFS does not rely on client side data caching. Caching
is only performed at the honeycomb servers. As a result,
there will be a single copy of a data file in the system at any
time, avoiding situations in which concurrent writes update
different copies of the data. This allows a strong consistency
model: one-copy serializable [21]. In this model, any se-
quence of data operations has a result that is equivalent to
the execution of the same sequence in a local system. The
absence of caching at the file system level can impose
severe performance penalties. This can only be mitigated if
a suitable data placement mechanism ensures that only a tiny
fraction of data accesses are performed on remote honeycomb
servers. In the next section we discuss strategies that can be
used to accomplish this goal.
Table 1. File system comparison

Feature                     NFS  AFS/Coda/Bayou  Farsite/OceanStore  GFS/HDFS/FreeLoader  BeeFS
POSIX compliant interface   yes  yes             no                  no                   yes
No server bottleneck        no   yes             yes                 yes                  yes
Locality transparency       yes  yes             yes                 yes                  yes
Easy to add volumes         no   no              yes                 yes                  yes
One-copy semantic           no   -               no                  no                   yes
Simple administration       yes  no              yes                 -                    yes

A system workload measurement [17] has demonstrated
that 98% of the stat system calls issued in general purpose
file systems are followed by another stat system call. Be-
cause of that, metadata caching is performed by BeeFS
at the honeybee client side. In particular, fetching the
metadata associated with a directory causes the caching of the
metadata of all the files that appear in the first level of this
directory. Flushing of the metadata cache is done whenever
operations that modify the file (e.g. write, truncate, rmdir,
etc.) are called or when a timeout is reached. This meta-
data caching approach is similar to the one implemented in
NFS [15].
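The sketch below illustrates the kind of client-side metadata cache just described: fetching a directory also caches the metadata of its first-level children, and entries are invalidated by modifying operations or by a timeout. The class name, the TTL value and the use of Object as the metadata type are assumptions made for the sketch.

```java
// Illustrative client-side metadata cache; the TTL value and the use of
// Object as the metadata type are assumptions for the sketch.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class MetadataCache {
    private static final long TTL_MILLIS = 3_000; // assumed timeout

    private record Entry(Object metadata, long cachedAt) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    // Returns cached metadata, or null when the caller must ask the queen-bee.
    Object get(String path) {
        Entry e = cache.get(path);
        if (e == null || System.currentTimeMillis() - e.cachedAt() > TTL_MILLIS) {
            cache.remove(path);
            return null;
        }
        return e.metadata();
    }

    // Fetching a directory also caches the metadata of its first-level files.
    void putDirectory(String dirPath, Map<String, Object> children) {
        long now = System.currentTimeMillis();
        children.forEach((name, md) ->
                cache.put(dirPath + "/" + name, new Entry(md, now)));
    }

    // Called by operations that modify the file (write, truncate, rmdir, ...).
    void invalidate(String path) {
        cache.remove(path);
    }
}
```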
3.3. Fault Tolerance
To tolerate faults of a honeycomb server, BeeFS em-
ploys a non-blocking primary-backup replication model.
Following this model, summarised in Figure 2, a honey-
bee client always performs data operations on the primary
honeycomb server and the data updates are eventually for-
warded to the secondary ones [5]. Hence, the primary copy
is always consistent, while the secondary ones are only
eventually consistent. There are two main reasons to apply
this replication model in the case of file systems: i) service
operations are not delayed waiting for the data to be committed
to secondary honeycomb servers; and ii) most data blocks have a
short lifetime [17], so there is a high probability that such
commits would be unnecessary.
The queen-bee server is responsible for orchestrating the
update of secondary replicas. It keeps a view of the state
of the replica groups associated with each file in the system.
This view contains the version of each replica stored in the
honeycomb servers. Following the POSIX semantics, the
honeybee client sends a close call to the queen-bee server,
which updates its view of the replicas. At the end of the
close call, the queen-bee server schedules a process to prop-
agate the content from the primary replica to the secondary
ones. The time at which this propagation takes place is defined
by a configurable coherence parameter.
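The sketch below illustrates how such non-blocking propagation might be scheduled at the queen-bee server after a close call, using the configurable coherence parameter as a delay. Class and method names are illustrative and do not correspond to the actual BeeFS code.

```java
// Sketch of non-blocking propagation scheduling at the queen-bee server;
// class and method names are illustrative.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class ReplicaCoordinator {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // timeToCoherenceMillis is the configurable coherence parameter.
    void onClose(String fileId, long newVersion, long timeToCoherenceMillis) {
        // The close call returns immediately; secondaries are brought
        // up to date asynchronously, after the coherence delay.
        scheduler.schedule(() -> propagate(fileId, newVersion),
                timeToCoherenceMillis, TimeUnit.MILLISECONDS);
    }

    private void propagate(String fileId, long version) {
        // Copy the primary replica's content to each secondary honeycomb
        // server and update the replica versions kept in the queen-bee view.
    }
}
```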
Figure 2. BeeFS Replication Model

The queen-bee server is responsible for monitoring the
honeycomb servers. When a honeycomb server failure is
detected, the replication group must be reorganised. A new
honeycomb server must replace the faulty one in the repli-
cation group, in order to keep the desired replication level.
If the faulty honeycomb server was a primary one, the repli-
cation group can be recomposed only if there is at least one
secondary honeycomb server whose content is consistent
with that of the faulty primary. Access to stale repli-
cas is always denied until an updated primary replica is
brought back into operation, possibly from a previous consis-
tent backup.
Regarding the queen-bee server, we distinguish two
types of faults. For transient faults, BeeFS assumes a crash-
recovery failure model. A transient failure of the queen-
bee server will impact all operations that have changed file
metadata and that had not been made permanent in the
queen-bee server disk. Like most file systems (both cen-
tralised and distributed), BeeFS periodically flushes data to
disk in order to reduce this window of vulnerability. Upon
recovery, a consistency check is applied to identify incon-
sistencies, and operation restarts after these are healed. The
redundant copies of the file attributes (one in the queen-bee
server disk and another in the honeycomb server that stores
its primary replica) are used to identify inconsistencies.
Differently from other file systems, BeeFS leverages
the redundant storage of file attributes to also recover from
permanent queen-bee server failures (e.g. due to a crash of
the queen-bee server disk). In this case, a fresh instance
of the queen-bee server is started in recovery mode. This
instance receives as input the list of honeycomb servers and
contacts them in order to rebuild its state [20].
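The following sketch illustrates the recovery-mode rebuild described above, assuming a hypothetical interface through which each honeycomb server reports the attributes it stores redundantly.

```java
// Sketch of queen-bee recovery from a permanent failure: a fresh instance
// rebuilds its metadata from the attributes stored redundantly by the
// honeycomb servers. The HoneycombStub interface is illustrative.
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class QueenBeeRecovery {
    interface HoneycombStub {
        // Reports, for every primary replica stored locally, its file attributes.
        Map<String, Object> reportPrimaryAttributes();
    }

    Map<String, Object> rebuild(List<HoneycombStub> honeycombs) {
        Map<String, Object> metadata = new HashMap<>();
        for (HoneycombStub hc : honeycombs) {
            metadata.putAll(hc.reportPrimaryAttributes());
        }
        return metadata; // becomes the new in-memory state of the queen-bee
    }
}
```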
3.4. File Attributes
The metadata stored by the queen-bee server includes at-
tributes related to the files stored by the honeycomb servers.
They are subdivided into common and extended attributes. Common
attributes keep basic information defined in the POSIX
standard, such as size, owner, group, access permissions,
change and access times, and so on. Extended attributes al-
low BeeFS to store additional information in the metadata
associated with a file. Such information is useful for some
applications as well as for BeeFS itself.
BeeFS makes use of extended attributes to keep infor-
mation about the replication level of each individual file.
Also, since modifications to primary data files are not im-
mediately propagated to their replicas, the concept of version
was introduced in order to control replica consistency. A
file is said to be consistent if all its replicas share the same
version number. The time elapsed from when a modification is
made to a file until the propagation process is triggered is
called time-to-coherence and is also stored as an extended
attribute. In addition, a type descriptor is stored to distin-
guish between primary and secondary copies. For fault tol-
erance reasons, whenever the extended attributes of a file
are changed, this information is sent to the primary replica
to be redundantly stored. Note that the other attributes are
already redundantly stored by the underlying local file sys-
tem used by the honeycomb servers.
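A possible in-memory representation of these extended attributes is sketched below; the field names follow the text, but the concrete layout used by BeeFS is an assumption.

```java
// Illustrative holder for the extended attributes described in the text;
// the concrete representation used by BeeFS may differ.
class ExtendedAttributes {
    enum ReplicaType { PRIMARY, SECONDARY }

    int replicationLevel;    // desired number of replicas for the file
    long version;            // replicas sharing this value are consistent
    long timeToCoherenceMs;  // delay before updates are propagated
    ReplicaType type;        // distinguishes primary from secondary copies
}
```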
The POSIX file attribute operations are implemented by
the queen-bee server. The amount of storage space required
to maintain file attributes is remarkably small, as suggested
by a study on file system contents [7]. Taking into con-
sideration that a typical user stores about 6,000 files and
that 100 bytes per file are enough for storing attributes, only
600 Kbytes are required per user. Thus, the queen-bee
server adopts a policy similar to that of GFS, keeping this informa-
tion in memory and periodically committing updates
to disk for fault-tolerance reasons.
3.5. Security
BeeFS scatters files over the corporation's desktops. The
files stored on these machines must be protected from inside
abusers. Both privacy and integrity may be compromised by
such malicious users.
Farsite [4], another filesystem which harnesses spare
disk space in networks of corporate desktops, solves these
problems by using cryptography. BeeFS avoids this tech-
nique for performance reasons. Instead, it leverages
the access control mechanisms (e.g. Access Control Lists)
of the underlying local file system that is used by the
honeycomb server. This way, privacy and integrity can be
easily guaranteed by assigning a permission mode to every
file stored on a honeycomb server, in such a way that only
the BeeFS components can access them.
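The idea can be illustrated as follows: each file stored by a honeycomb server receives a permission mode that restricts access to the user under which the BeeFS daemon runs. The ChunkStore class and the chosen mode are illustrative assumptions, not the actual implementation.

```java
// Sketch of the idea: files written by a honeycomb server get a permission
// mode (here the equivalent of chmod 600) so that only the user running the
// BeeFS daemon can read or write them. The ChunkStore class is illustrative.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

class ChunkStore {
    static Path storeChunk(Path storageDir, String fileId, byte[] data)
            throws IOException {
        Path chunk = storageDir.resolve(fileId);
        Files.write(chunk, data);
        // Owner-only access shields the file from other local users.
        Files.setPosixFilePermissions(chunk,
                PosixFilePermissions.fromString("rw-------"));
        return chunk;
    }
}
```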
4. Data Placement
Data placement strategies play a major role in the per-
formance of a file system that stores data in desktops, as
is the case of BeeFS. Unbalanced loads caused by poor
data allocation decisions have an immediate negative impact
both on the latency of the file system operations and
on the user's experience at the desktop that stores remote
data. In this scenario, an ideal data placement strategy must
exploit locality as much as possible.
We have designed a placement strategy that tries to re-
duce the number of remote accesses to files. It works as fol-
lows: whenever a file is created, the primary copy is al-
located to the honeycomb server located on the machine
where the honeybee client runs. If the local honeycomb
server does not have space to store the file, then a reloca-
tion procedure tries to free space, allowing the new file to
be allocated to the local honeycomb server (this relocation
procedure is also executed when a write/append operation
is performed and the free space left is not enough to accom-
modate the operation demand). If there is no collocated
honeycomb server, then the remote honeycomb server with
the largest amount of free space is chosen.
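The placement decision can be summarised by the sketch below, in which the Honeycomb record, its fields and the relocate() call are illustrative placeholders rather than the actual BeeFS implementation.

```java
// Sketch of the placement decision; the Honeycomb record and the relocate()
// call are illustrative placeholders.
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

class PlacementStrategy {
    record Honeycomb(String address, long freeSpace, boolean collocatedWithClient) {}

    Optional<Honeycomb> choosePrimary(List<Honeycomb> servers, long fileSize) {
        Optional<Honeycomb> local = servers.stream()
                .filter(Honeycomb::collocatedWithClient)
                .findFirst();

        if (local.isPresent()) {
            Honeycomb hc = local.get();
            // Prefer the collocated server, relocating replicas if needed.
            if (hc.freeSpace() >= fileSize || relocate(hc, fileSize)) {
                return local;
            }
            // Falling back to a remote server here is our assumption; the
            // text does not specify what happens if relocation fails.
        }
        // No collocated server: pick the remote one with the most free space.
        return servers.stream()
                .filter(hc -> !hc.collocatedWithClient() && hc.freeSpace() >= fileSize)
                .max(Comparator.comparingLong(Honeycomb::freeSpace));
    }

    private boolean relocate(Honeycomb server, long neededBytes) {
        // Remove secondary (and, if necessary, primary) replicas until
        // neededBytes are free; the heuristics are discussed in Section 4.1.
        return false; // placeholder
    }
}
```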
The reasoning for this strategy is the following. It is
known that file sharing is a rare event in the typical work-
load of a corporate file system [13]; moreover, users fre-
quently access the file system using the same desktop.
Therefore, it is expected that the proposed placement strat-
egy will lead to a small number of remote accesses.
In fact, our results show that this strategy is effective even
when users frequently change the desktop from which they
access the system.
4.1. Simulation Model
We have evaluated the effectiveness of this placement
strategy through simulations. We have used an 8-month
trace of the file system accesses in a corporate deployment
(available at http://iotta.snia.org/),
over a hypothetical environment composed of 50 homoge-
neous desktops. We considered that all desktops had a 10
Gbytes spare disk capacity that was used by the BeeFS hon-
eycomb server. At the beginning of the simulation, honey-
comb servers are filled with data that consumes half of their
capacity. This distribution of space usage is in accordance
with an earlier study [7]. However, in order to understand
how the placement strategy works in a more critical scenario
when disks are full, we have selected 5 honeycomb servers
(i.e. 10% of them) to be completely filled with files. In or-
der to fill desktop’s disks, primary and secondary replicas
of files are evenly distributed on the 50 honeycomb servers.
This distribution continues until the first 45 data servers
reach at least half of their capacity. After that, a second
round of the file distribution procedure starts on the remain-
ing 5 machines and continues until they have been filled.
At this point, around 1,650,000 (primary and secondary)
replicas are stored in the BeeFS file system, for a total of
approximately 275 Gbytes stored. The sizes of the files used
to fill the disks follow a log-normal distribution with pa-
rameters mean = 8.46 and variance = 2.38, as suggested
by the characterisation study described in [7]. Each file has
three replicas (one primary and two secondary).
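Assuming that the quoted mean and variance parameterise the underlying normal distribution of the logarithm of the file size, as in the cited characterisation study [7], file sizes for the synthetic population could be drawn as in the sketch below.

```java
// Drawing synthetic file sizes, assuming mean = 8.46 and variance = 2.38
// parameterise the normal distribution of ln(size).
import java.util.Random;

class FileSizeSampler {
    private static final double MEAN = 8.46;
    private static final double VARIANCE = 2.38;

    private final Random rng = new Random();

    long nextFileSizeBytes() {
        double logSize = MEAN + Math.sqrt(VARIANCE) * rng.nextGaussian();
        return Math.round(Math.exp(logSize));
    }
}
```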
After the file system is populated, the simulator starts
to process the trace. The trace has been filtered, such that
any operation to a file that is not preceded by an open or
a create call is purged. Also, if the first reference to a file
is not a create, but an open³, then the simulator forces the
creation of the file and processes the open accordingly.
The trace used has the information about the file accesses
performed at a single machine, which we map to the be-
haviour of a single user. The simulator starts processing
the trace assuming that the user is logged in one of the ma-
chines of the corporation (called its base machine) and, af-
ter a period of time⁴, the user may change machine with a
given probability p. In other words, starting from the sec-
ond login in the trace, at each login session the user has a
probability 1 − p of logging in at its base machine and p of
logging in at a different machine. In each simulation, the base
machine is randomly chosen among the 5 machines whose
honeycomb servers are full, while the machines to which a
user can temporarily migrate are randomly chosen from any
of the other 49 machines in the network.
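The machine chosen at each login session can be sketched as follows; representing machines as integers is an illustrative simplification of the simulator.

```java
// Choosing the login machine for each session; machines are represented as
// integers purely for illustration.
import java.util.List;
import java.util.Random;

class SessionSimulator {
    private final Random rng = new Random();

    int nextLoginMachine(int baseMachine, List<Integer> otherMachines, double p) {
        // With probability p the user logs in at a randomly chosen machine
        // other than its base machine; otherwise it stays at the base machine.
        if (rng.nextDouble() < p) {
            return otherMachines.get(rng.nextInt(otherMachines.size()));
        }
        return baseMachine;
    }
}
```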
We have considered two possibilities for the relocation
procedure that is performed when the data storage server is
full. Both start in the same way, by removing secondary
replicas. If after removing all secondary replicas, more
space is needed, then a second phase starts using the fol-
lowing heuristics:
• Random, which removes primary replicas randomly; and

• WorkingSet, which removes primary replicas using an
LRU policy.

³Files can be created using an open call with suitable parameters; to
make the reading more fluid, when we refer to create calls we mean both
create and open calls that would create a file, while when we refer to open
calls we mean only the open calls that do not attempt to create a file.

⁴We have used the trace usage pattern to infer a log-off/login behaviour.
After an inactivity period longer than 1 hour the user is assumed to start a
new login session.
It is important to note that, for dependability reasons, pri-
mary replica removal is only possible if there is an up-to-date
secondary replica; in this case, the primary is removed and a
secondary is promoted to primary. After a successful replica
removal (in both the primary and secondary cases), a new replica
will eventually be created to keep the desired replication
level.
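The two heuristics can be sketched as below; the Replica record and its fields are illustrative simplifications of the simulator's state, and the eligibility rule for primaries follows the dependability constraint just described.

```java
// Sketch of the two second-phase relocation heuristics; the Replica record
// is an illustrative simplification of the simulator's state.
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class RelocationHeuristics {
    record Replica(String fileId, boolean primary,
                   boolean hasUpToDateSecondary, long lastAccessMillis) {}

    // A primary is only eligible for removal if an up-to-date secondary
    // exists, so that the secondary can be promoted to primary.
    static List<Replica> removablePrimaries(List<Replica> stored) {
        return stored.stream()
                .filter(r -> r.primary() && r.hasUpToDateSecondary())
                .toList();
    }

    // Random: evict a primary replica chosen uniformly at random.
    static Replica pickRandom(List<Replica> candidates, Random rng) {
        return candidates.get(rng.nextInt(candidates.size()));
    }

    // WorkingSet: evict the least recently used (LRU) primary replica.
    static Replica pickWorkingSet(List<Replica> candidates) {
        return candidates.stream()
                .min(Comparator.comparingLong(Replica::lastAccessMillis))
                .orElseThrow();
    }
}
```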
4.2. Results and Analysis
Figure 3 gives the cumulative percentage of the number
of local read accesses for three different values of the mi-
gration probability (p = {0.2, 0.5, 0.8}) for both Random
and WorkingSet procedures.
Figure 3. Cumulative percentage of the number of local read accesses: (a) Random relocation; (b) WorkingSet relocation
From the plots in Figures 3(a) and 3(b) we can see that, for
both procedures, even in the worst case scenario in which
the honeycomb server of the base machine is full and the
migration probability is very high (80%), no less than 75%
of the file accesses are collocated.
A result that might look surprising is that the migration
probability does not have a strong impact on the collocation
hit ratio. This is explained by the fact that most of the ac-
cesses to a given file are made in the context of a short ses-
sion time – the time elapsed between the first and last open
on a given file. A more detailed scrutiny of the simula-
tion trace revealed that the session times of most files
(86%) are not larger than one millisecond. This result is in
conformance with previous workload characterisation stud-
ies [13]. Therefore, the impact of the migration probability
is low in the simulated workload because the probability of
accessing a file created before the migration event is very
low.
A visual inspection of the local read access percentage box
plots shown in Figures 4(a), 4(b) and 4(c) gives a hint
about the similarity between the Random and WorkingSet heuristics.
A more rigorous analysis, using a t-test with a 90% con-
fidence interval, indicates that the two heuristics are not statis-
tically different.
Our hypothesis to explain the similarity between the relocation
algorithms is based on the dynamics of replication-
based data placement. A honeycomb server naturally stores
both primary and secondary replicas; as a result, the Random
and WorkingSet heuristics behave in the same way in many
cases. Recall that the two relocation heuristics only differ
when there are no secondary replicas left to remove.
A direct consequence of these results is that the Random
heuristic is the better choice to implement, due to its sim-
plicity and comparable performance.
5. Performance Evaluation
In this section we analyse the performance of BeeFS,
comparing it against NFS. NFS represents the state-of-the-
practice for file system service in corporate environments,
and is, therefore, a suitable choice for comparison. How-
ever, it is important to note that the advantages of BeeFS, when
compared to NFS and also to other file system solutions
for corporate environments, go far beyond any performance
gain that it may bring. These advantages have already been
discussed in the previous sections and the performance eval-
uation reported in this section aims at showing that they
come with no penalty in performance. In fact, our results
show that performance gains constitute an extra advantage
of BeeFS when compared to NFS.
We have developed an implementation of BeeFS that
runs on Linux machines. As discussed before, BeeFS ex-
poses the POSIX API for file system service; this is espe-
cially important for application compatibility.
Figure 4. Relocation procedure comparison: (a) migration probability = 0.2; (b) migration probability = 0.5; (c) migration probability = 0.8
Programming a POSIX file system service on Linux usu-
ally requires coding at the VFS (Virtual File System) level.
Instead, we have implemented BeeFS at the user level us-
ing the Java programming language. The coupling between
the user level application and the Linux kernel file system
modules was done via FUSE [1].
Our implementation has been checked against the ver-
sion of Pawel Jakub Dawidek’s POSIX file system test suite
maintained by Tuxera [3] and has successfully executed all
the 3,061 tests that comprise the suite, giving us confidence
that it is, indeed, fully POSIX-compliant.
In the following, we describe the benchmark used to run
the performance tests, present the results attained, and anal-
yse these results.
5.1. The Andrew Benchmark
In order to measure the file system performance in a wide
range of typical operations, we ran the well-known Andrew
benchmark [11]. This benchmark emulates a software de-
velopment workload. Although most users do not compile
programs frequently, the Andrew benchmark can be viewed
as a good match for a generic small-file operations work-
load, which represents the typical file system demand of
corporate users. The benchmark receives as input the root of
a source tree located on a file system different from the one
being tested. It is composed of five phases: mkdir, copy,
stat, grep, and compile.
1. The Mkdir phase creates the directory hierarchy of the
original source tree on the system being tested;
2. The Copy phase copies all files from the original
source tree into the created directory hierarchy;
3. The Stat phase fetches the status of all the files in the
tree without examining their data;
4. The Grep phase reads the whole content of all files
created during the copy phase;
5. The Compile phase compiles and links the files.
We ran the Andrew benchmark using as input the Na-
gios 3.0.6 source tree (available at http://www.nagios.org/),
which contains 21 directories, 515 files and roughly 9
Mbytes of data. As discussed in the previous section, the BeeFS
data placement strategy leads most of the operations of
the benchmark, if not all, to be executed in a collocated way.
To have an idea of worst-case performance, we have artifi-
cially set up scenarios in which all file data has to be fetched
over the network. This behaviour can be forced by simply
running the honeybee client component in a desktop that
does not run a honeycomb server. In the following we con-
sider two configurations for the experiments with BeeFS.
The first configuration deploys the honeybee client and the
honeycomb server on the same machine, while in the sec-
ond, these components are deployed on different machines.
On both configurations, the queen-bee server is deployed
alone in another machine. In addition, BeeFS honeybee
clients used metadata caching so that most system calls in-
volving file attributes, e.g. getattr and getdir, are answered
without contacting the queen-bee server.
The third configuration uses NFS (version 3). The NFS
configuration consists of an NFS server and a standard NFS
client implementation, which was utilised with the default
options. The most relevant of these options for the bench-
mark are: TCP transport, 8192-byte read and write buffers,
asynchronous client writes and attribute caching allowed.
In all experiments, the NFS server and the queen-bee server
were run in the same machine. Likewise, the clients in
both classes of experiments (NFS and BeeFS) were run in
the same machines. Henceforth, these configurations are
named as follows:
• Collocated, which consists of a queen-bee server run-
ning on a machine, and a honeybee client and a honey-
comb server collocated on another machine;
• Non-collocated, which consists of a queen-bee server,
a honeybee client and a honeycomb server, all running
on different machines;
• NFS, which consists of an NFS server running on a
machine and an NFS client running on another.
All of the performance measurements were made on 3.2
GHz Intel machines with two cores and 2 Gbytes of RAM
each, running Ubuntu 9.04 with kernel 2.6.28-11-vserver. All
nodes were connected via a 100 Mbit Ethernet network that
was isolated from heavy traffic during the execution of the
experiments⁵.
A sufficient number of experiments were performed to
achieve a 95% confidence level on the mean value of the
performance metrics measured, with an error not larger than
5%. Table 2 shows the results of the benchmark execution
for the three configurations described.
Table 2 also shows the percentage improvement in the
execution time for both BeeFS Collocated and BeeFS Non-
Collocated in relation to NFS. For all BeeFS configurations,
both the Mkdir and Stat phases improve by similar amounts
(about 40% for Mkdir and 26% for Stat), because these
phases are only one stream of synchronous requests to the
queen-bee server.
⁵Some distributed services such as NTP and IMAP were running dur-
ing the execution of the experiment; however, the network traffic generated
by these services is not substantial and does not significantly impact the re-
sults of the experiment.
Table 2. Mean execution times (in ms) for individual benchmark phases for each configuration: NFS, BeeFS Collocated and BeeFS Non-collocated.

Phase     NFS       BeeFS Collocated   BeeFS Non-collocated
Mkdir     29.4      17.8 (40%)         18.4 (37%)
Copy      9,118.5   2,362.3 (74%)      3,954.8 (56%)
Stat      2,434.4   1,794.9 (26%)      1,712.9 (29%)
Grep      3,073.2   2,190.6 (29%)      2,481.5 (20%)
Compile   43,075.0  38,537.3 (10%)     43,472.5 (-1%)
On the other hand, the disk-intensive phases (Copy and
Grep) improve by different amounts. This behaviour re-
flects the benefits of collocation: while collocated machines
perform I/O operations directly on the local hard disk,
non-collocated machines and NFS transfer data blocks
through the network. Thus, BeeFS in a collocated
configuration achieves better performance and still saves net-
work bandwidth.
The CPU-intensive Compile phase strongly dominates the
total execution time, which explains the smaller improvement of
10% for the Collocated configuration. On the other hand, the
Non-collocated configuration is practically equivalent to NFS,
showing a slowdown of 1% in its execution time in this
case.
All the percentage improvements were statistically con-
firmed by a t-test with a 90% confidence interval. In spite
of the 1% slowdown, the NFS and BeeFS Non-collocated
execution times in the Compile phase are equivalent using the
same t-test with a 90% confidence interval.
5.2. Scalability evaluation
The scalability properties of BeeFS were evaluated by
measuring the effects of an increasing service load on the
execution time of the Andrew benchmark. We have set up
an experimental scenario in which a number of clients de-
ployed on several machines simultaneously run the bench-
mark on different sub-trees of the file system namespace.
This experiment used the NFS and Collocated configurations
on the same machines described in the previous sec-
tion.
Figure 5 illustrates the comparison of mean execution
time of the Andrew benchmark, as seen by the clients, for
both NFS and Collocated as a function of the number of
concurrent clients. The NFS results show a higher growth
than the BeeFS ones. This result confirms the assump-
tion that a hybrid architecture that uses distributed storage
servers reduces the bottleneck on the central server. As a
result, it improves performance at the client side. For ex-
ample, when exposed to 20 concurrent honeybee clients,
BeeFS outperforms the equivalent NFS configuration
by approximately 60%. Assuming that both NFS and BeeFS
behaviour can be predicted by a linear model based on this ex-
periment, an extrapolation of this experimental setting indicates
that a 50-client NFS environment will perform comparably to a
225-client BeeFS setting. In other words, when compared
to NFS, BeeFS can deliver the same performance even if
exposed to a load that is more than 4 times higher.
Figure 5. Means of total execution time (in
seconds) for NFS and BeeFS Collocated as a
function of the number of clients.
6. Conclusion
We have presented a distributed file system, called
BeeFS, that harnesses the free disk space of desktop ma-
chines already deployed in a corporation to build a cheap,
efficient and scalable storage solution. The reduction on the
total cost of ownership of the system is attained by better
utilising the spare storage capacity available in the corpora-
tion’s desktops, which allows delaying investments on sys-
tem expansion to cope with increased demand. A hybrid
architecture that uses a centralised server to handle meta-
data operations and replica management, and a peer-to-peer
approach for serving data, associated with a clever data place-
ment mechanism that promotes the collocation of client and
server in the same machine, leads to the high efficiency
of the system. Finally, the use of desktop disks to store
data allows data storage capacity to grow naturally as new
machines (and associated users) are added to the system.
Moreover, the division of responsibilities between the cen-
tral metadata and replica management server and the dis-
tributed data storage servers allows the system to simul-
taneously serve a large number of users with no perfor-
mance degradation, when compared to the state-of-practice
approaches based only on centralised servers.
BeeFS provides a general-purpose, fully POSIX-
compliant file system interface. This feature allows legacy
applications to transparently use the BeeFS service. BeeFS's
POSIX compliance is checked by running more than 3,000 regres-
sion tests; these tests are available in a test suite [3] also
used for checking FreeBSD, Solaris, and Linux with UFS,
ZFS, ext3, and NTFS-3G file systems.
Currently, a stable version of BeeFS is in use in a produc-
tion environment in our laboratory, providing a file system
with more than 1 Tbyte of capacity, serving around 40 peo-
ple. The service has been operational for over three months
with no major incidents. BeeFS is open-source software and
available at http://www.lsd.ufcg.edu.br/beefs.
References
[1] Fuse: Filesystem in userspace (http://fuse.sourceforge.net).
2008.
[2] Hadoop (http://hadoop.apache.org/core/). 2008.
[3] Posix test suite (http://www.tuxera.com/community/posix-
test-suite/). 2009.
[4] A. Adya, W. Bolosky, M. Castro, R. Chaiken, G. Cermak,
J. Douceur, J. Howell, J. Lorch, M. Theimer, and R. Watten-
hofer. Farsite: Federated, available, and reliable storage for
an incompletely trusted environment, 2002.
[5] P. A. Alsberg and J. D. Day. A principle for resilient shar-
ing of distributed resources. In ICSE ’76: Proceedings of
the 2nd international conference on Software engineering,
pages 562–570, Los Alamitos, CA, USA, 1976. IEEE Com-
puter Society Press.
[6] W. J. Bolosky, J. R. Douceur, D. Ely, and M. Theimer. Fea-
sibility of a serverless distributed file system deployed on
an existing set of desktop pcs. In SIGMETRICS ’00: Pro-
ceedings of the 2000 ACM SIGMETRICS international con-
ference on Measurement and modeling of computer systems,
pages 34–43, New York, NY, USA, 2000. ACM.
[7] J. R. Douceur and W. J. Bolosky. A large-scale study of
file-system contents. SIGMETRICS Perform. Eval. Rev.,
27(1):59–70, 1999.
[8] E. R. Zayas. AFS-3 programmer's reference: Architectural
overview, 1991.
[9] S. Ghemawat, H. Gobioff, and S.-T. Leung. The google file
system. In SOSP ’03: Proceedings of the nineteenth ACM
symposium on Operating systems principles, pages 29–43,
New York, NY, USA, 2003. ACM Press.
[10] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols,
M. Satyanarayanan, R. N. Sidebotham, and M. J. West.
Scale and performance in a distributed file system. ACM
Trans. Comput. Syst., 6(1):51–81, February 1988.
[11] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols,
M. Satyanarayanan, R. N. Sidebotham, and M. J. West.
Scale and performance in a distributed file system. ACM
Trans. Comput. Syst., 6(1):51–81, February 1988.
[12] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels,
R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer,
C. Wells, and B. Zhao. Oceanstore: An architecture for
global-scale persistent storage. In Proceedings of ACM AS-
PLOS. ACM, November 2000.
[13] A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller.
Measurement and analysis of large-scale network file system
workloads. In Proceedings of the 2008 USENIX Annual
Technical Conference, 2008.
[14] M. N. Nelson, B. B. Welch, and J. K. Ousterhout. Caching
in the sprite network file system. ACM Trans. Comput. Syst.,
6(1):134–154, February 1988.
[15] B. Pawlowski, C. Juszczak, P. Staubach, C. Smith, D. Lebel,
and D. Hitz. NFS version 3 design and implementation.
In Proceedings of the Summer USENIX Conference, pages
137–152, 1994.
[16] K. Petersen, M. Spreitzer, D. Terry, and M. Theimer. Bayou:
replicated database services for world-wide applications. In
EW 7: Proceedings of the 7th workshop on ACM SIGOPS
European workshop, pages 275–280, New York, NY, USA,
1996. ACM.
[17] D. Roselli, J. R. Lorch, and T. E. Anderson. A comparison
of file system workloads. In Proceedings of the 2000
USENIX Annual Technical Conference, pages 41–54, 2000.
[18] M. Russinovich. Inside Win2K NTFS. Windows & .NET
Magazine, November–December, 2000.
[19] M. Satyanarayanan, J. Kistler, P. Kumar, M. Okasaki,
E. Siegel, and D. Steere. Coda: a highly available file sys-
tem for a distributed workstation environment. IEEE Trans-
actions on Computers, 39(4):447–459, 1990.
[20] A. S. Soares, T. E. Pereira, J. Silva, and F. V. Brasileiro.
Um modelo de armazenamento de metadados tolerante a
falhas para o ddgfs (in Portuguese). In WSCAD-SSC 2009:
Proceedings of the 10th Computational Systems Symposium,
October 2009.
[21] P. A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency
Control and Recovery in Database Systems. Addison-Wesley, 1987.
[22] S. Tweedie. EXT3, Journaling File System. In Ottawa Linux
Symposium, pages 24–29, 2000.
[23] S. Vazhkudai, X. Ma, V. Freeh, J. Strickland, N. Tammi-
needi, and S. Scott. Freeloader: Scavenging desktop storage
resources for scientific data. In SuperComputing’05, 2005.