Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure
Arnab K. Paul
Virginia Tech
Ryan Chard
Argonne National Laboratory
Kyle Chard
University of Chicago
Steven Tuecke
University of Chicago
Ali R. Butt
Virginia Tech
Ian Foster
Argonne and University of Chicago
ABSTRACT
As research processes become yet more collaborative and increasingly data-oriented, new techniques are needed to efficiently manage and automate the crucial, yet tedious, aspects of the data lifecycle. Researchers now spend considerable time replicating, cataloging, sharing, analyzing, and purging large amounts of data, distributed over vast storage networks. Software Defined Cyberinfrastructure (SDCI) provides a solution to this problem by enhancing existing storage systems to enable the automated execution of actions based on the specification of high-level data management policies. Our SDCI implementation, called Ripple, relies on agents being deployed on storage resources to detect and act on data events. However, current monitoring technologies, such as inotify, are not generally available on large or parallel file systems, such as Lustre. We describe here an approach for scalable, lightweight event detection on large (multi-petabyte) Lustre file systems. Together, Ripple and the Lustre monitor enable new types of lifecycle automation across both personal devices and leadership computing platforms.
ACM Reference format:
Arnab K. Paul, Ryan Chard, Kyle Chard, Steven Tuecke, Ali R. Butt, and Ian Foster. 2017. Toward Scalable Monitoring on Large-Scale Storage for Software Defined Cyberinfrastructure. In Proceedings of PDSW-DISCS'17: Second Joint International Workshop on Parallel Data Storage & Data Intensive Scalable Computing Systems, Denver, CO, USA, November 12–17, 2017 (PDSW-DISCS'17), 6 pages.
DOI: 10.1145/3149393.3149402
1 INTRODUCTION
The data-driven and distributed nature of modern research means scientists must manage complex data lifecycles across large-scale and distributed storage networks. As data scales increase, so too does the overhead of data management: a collection of tasks and processes that are often tedious and repetitive, such as replicating, cataloging, sharing, and purging data. Software Defined Cyberinfrastructure (SDCI) [5] can drastically lower the cost of performing many of these tasks by transforming humble storage devices into
“active” environments in which such tasks are automatically executed in response to data events. SDCI enables high-level policies to be defined and applied to storage systems, thereby facilitating automation throughout the end-to-end data lifecycle. We have previously presented a prototype SDCI implementation, called Ripple [4], capable of performing various actions in response to file system events.
Ripple empowers scientists to express and automate mundane data management tasks. Using a simple If-Trigger-Then-Action rule notation, users program their storage devices to respond to specific events and invoke custom actions. For example, one can express that when files appear in a specific directory of their laboratory machine, they are automatically analyzed and the results replicated to their personal device. Ripple supports inotify-enabled storage devices (such as personal laptops); however, inotify is not often supported on large-scale or parallel file systems. To support large-scale file systems we have developed a scalable monitoring solution for the Lustre [13] file system. Our monitor exploits Lustre's internal metadata capabilities and uses a hierarchical approach to collect, aggregate, and broadcast data events for even the largest storage devices. Using this monitor, Ripple agents can consume site-wide events in real time, enabling SDCI over leadership-class computing platforms.
In this paper we present our scalable Lustre monitor. We analyze the performance of our monitor using two Lustre file systems: an Amazon Web Services deployment and a high performance deployment at Argonne National Laboratory's (ANL) Leadership Computing Facility (ALCF). We show that our monitor is a scalable, reliable, and lightweight solution for collecting and aggregating file system events such that SDCI can be applied to multi-petabyte storage devices.
The rest of this paper is organized as follows: Section 2 presents related work. Section 3 discusses the SDCI concept and our implementation, Ripple. Section 4 describes our scalable monitor. We evaluate our monitor in Section 5 before presenting concluding remarks and future research directions in Section 6.
2 RELATED WORK
SDCI and data-driven policy engines are essential for reliably performing data management tasks at scale. A common requirement for these tools is the reliable detection of trigger events. Prior efforts in this space have applied various techniques, including implementing data management abstraction layers and relying on applications to raise events. For example, the integrated Rule-Oriented Data System [11] works by ingesting data into a closed data grid such that it can manage the data and monitor events throughout the data
lifecycle. Other SDCI-like implementations rely on applications to
raise trigger events [1].
Monitoring distributed systems is crucial to their effective operation. Tools such as MonALISA [9] and Nagios [2] have been developed to provide insight into the health of resources and provide the necessary information to debug, optimize, and effectively operate large computing platforms. Although such tools generally expose file system status, utilization, and performance statistics, they do not capture and report individual file events. Thus, these tools cannot be used to enable fine-grained data-driven rule engines, such as Ripple.
Other data-driven policy engines, such as IOBox [12], also require individual data events. IOBox is an extract, transform, and load (ETL) system, designed to crawl and monitor local file systems to detect file events, apply pattern matching, and invoke actions. Like the initial implementation of Ripple, IOBox is restricted to using either inotify or a polling mechanism to detect trigger events. It therefore cannot be applied at scale to large or parallel file systems, such as Lustre.
Monitoring of large Lustre file systems requires explicitly designed tools [8]. One policy engine that leverages a custom Lustre monitor is the Robinhood Policy Engine [7]. Robinhood facilitates the bulk execution of data management actions over HPC file systems. Administrators can configure, for example, policies to migrate and purge stale data. Robinhood maintains a database of file system events, using it to provide various routines and utilities for Lustre, such as tools to efficiently find files and produce usage reports. Robinhood employs a centralized approach to collecting and aggregating data events from Lustre file systems, where metadata is sequentially extracted from each metadata server by a single client. Our approach employs a distributed method of collecting, processing, and aggregating these data. In addition, our monitor publishes events to any subscribed listener, allowing external services to utilize the data.
3 BACKGROUND: RIPPLE
SDCI relies on programmable agents being deployed across storage and compute devices. Together, these agents create a fabric of smart, programmable resources. These agents can be employed to monitor the underlying infrastructure, detecting and reporting data events of interest, while also facilitating the remote execution of actions on behalf of users. SDCI is underpinned by the same concepts as Software Defined Networking (SDN). A separation of data and control planes enables the definition of high-level, abstract rules that can then be distributed to, and enforced by, the storage and compute devices comprising the system.
Ripple [4] enables users to define custom data management policies which are then automatically enforced by participating resources. Management policies are expressed as If-Trigger-Then-Action style rules. Ripple's implementation is based on a deployable agent that captures events and a cloud service that manages the reliable evaluation of rules and execution of actions. An overview of Ripple's architecture is depicted in Figure 1.
Architecture: Ripple comprises a cloud-based service plus a lightweight agent that is deployed on target storage systems. The agent is responsible for detecting data events, filtering them against active rules, and reporting events to the cloud service. The agent also provides an execution component, capable of performing local actions on a user's behalf, for example running a container or performing a data transfer with Globus [3].
A scalable cloud service processes events and orchestrates the execution of actions. Ripple emphasizes reliability, employing multiple strategies to ensure events are not lost and that actions are successfully completed. For example, agents repeatedly try to report events to the service. Once an event is reported it is immediately placed in a reliable Simple Queue Service (SQS) queue. Serverless Amazon Lambda functions act on entries in this queue and remove them once successfully processed. A cleanup function periodically iterates through the queue and initiates additional processing for events that were unsuccessfully processed.
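To make the reporting path concrete, the following is a minimal sketch of how an agent might push an event into an SQS queue. The queue URL, region, and event schema are hypothetical; Ripple's actual service interface is not reproduced here.

```python
# Hypothetical agent-side event reporting (illustrative queue URL and schema).
import json
import time
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/ripple-events"

def report_event(event):
    """Repeatedly try to report an event until the queue accepts it."""
    while True:
        try:
            sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(event))
            return
        except Exception:
            time.sleep(1)  # simple retry; the real agent's policy may differ
```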
Rules: Ripple rules are distributed to agents to inform the event filtering process and ensure relevant events are reported. A Ripple rule consists of a trigger and an action. The trigger specifies the conditions under which the action will be invoked. For example, a user may set a rule to trigger when an image file is created in a specific directory of their laptop. An action specifies the type of execution to perform (such as initiating a transfer, sending an email, running a Docker container, or executing a local bash command, to name a few), the agent on which to perform the action, and any necessary parameters. These simple rules can be used to implement complex pipelines whereby the output of one rule triggers a subsequent action.
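For illustration, the image-file example above might be written as follows. The schema and field names are hypothetical, not Ripple's exact notation.

```python
# Hypothetical If-Trigger-Then-Action rule: when a .jpg appears under
# ~/microscope/ on the laptop, transfer it to a lab server with Globus.
rule = {
    "trigger": {
        "agent": "laptop",
        "event": "FileCreated",
        "path": "~/microscope/",
        "match": r".*\.jpg$",
    },
    "action": {
        "agent": "lab-server",
        "type": "globus_transfer",
        "parameters": {"destination": "/archive/images/"},
    },
}
```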
Event Detection: Ripple uses the Python Watchdog module to detect events on local file systems. Using tools such as inotify and kqueue, Watchdog enables Ripple to function over a wide range of operating systems. As rules are registered with an agent, users also specify the path to be monitored. The agent employs “Watchers” on each directory relevant to a rule. As events occur in a monitored directory, the agent processes them against the active rules to determine whether the event is relevant and warrants reporting to the cloud service.
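A minimal Watchdog sketch of this pattern is shown below; the handler, path, and filter are illustrative stand-ins for the agent's rule-matching logic.

```python
# Minimal directory watcher in the spirit of Ripple's agent (illustrative).
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class RuleFilter(FileSystemEventHandler):
    def on_created(self, event):
        # Filter events against a rule; a real agent would report matches
        # to the cloud service rather than print them.
        if not event.is_directory and event.src_path.endswith(".jpg"):
            print("relevant event:", event.src_path)

observer = Observer()
observer.schedule(RuleFilter(), path="/home/user/microscope", recursive=True)
observer.start()  # places watchers on the monitored directory tree
```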
Limitations: A key limitation of Ripple is its inability to be applied, at scale, to large storage devices (i.e., those that are not supported by Watchdog). Further, our approach primarily relies on targeted monitoring techniques, such as inotify, where specific directories are monitored. Thus, Ripple cannot enforce rules which are applied to many directories, such as site-wide purging policies.
Relying on targeted monitors presents a number of limitations. For example, inotify has a large setup cost due to its need to crawl the file system to place watchers on each monitored directory. This is both time consuming and resource intensive, often consuming a significant amount of unswappable kernel memory. Each watcher requires 1 KB of memory on a 64-bit machine, meaning over 512 MB of memory is required to concurrently monitor the default maximum (524,288) directories.
We have explored an alternative approach using a polling technique to detect file system changes. However, crawling and recording file system data is prohibitively expensive over large storage systems.
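To illustrate why, a naive polling monitor must repeatedly walk the entire tree and stat every file, as in the sketch below (illustrative only); each pass costs time and I/O proportional to the total file count, not to the number of changes.

```python
# Naive polling sketch: snapshot the tree, then diff snapshots by mtime.
import os

def snapshot(root):
    state = {}
    for dirpath, _, filenames in os.walk(root):  # walks every directory
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                state[path] = os.stat(path).st_mtime
            except OSError:
                pass  # file vanished between listing and stat
    return state

def diff(old, new):
    created = new.keys() - old.keys()
    deleted = old.keys() - new.keys()
    modified = {p for p in old.keys() & new.keys() if old[p] != new[p]}
    return created, modified, deleted
```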
Figure 1: Ripple architecture. A local agent captures and filters file events before reporting them to the cloud service for processing. Actions are routed to agents for execution.
4 SCALABLE MONITORING
Ripple requires scalable monitoring techniques in order to be applied to leadership-class storage systems. To address this need we have developed a lightweight, scalable monitor to detect and report data events for Lustre file systems. The monitor leverages Lustre's internal metadata catalog to detect events in a distributed manner and aggregates them for evaluation. The monitor produces a complete stream of all file system events to any subscribed device, such as a Ripple agent. The monitor also maintains a rotating catalog of events and an API to retrieve recent events in order to provide fault tolerance.
Like other parallel file systems, Lustre does not support inotify; however, it does maintain an internal metadata catalog, called “ChangeLog.” An example ChangeLog is depicted in Table 1. Every entry in a ChangeLog consists of the record number, type of file event, timestamp, date, flags, target File Identifier (FID), parent FID, and the target name. Lustre's ChangeLog is distributed across a set of Metadata Servers (MDS). Actions which cause changes in the file system namespace or metadata are recorded in a single MDS ChangeLog. Thus, to capture all changes made on a file system, our monitor must be applied to all MDS servers.
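For reference, ChangeLog records like those in Table 1 can be pulled with Lustre's lfs changelog command. The sketch below assumes a registered ChangeLog user, and the MDT device name is illustrative.

```python
# Sketch: read raw ChangeLog records via the lfs CLI (illustrative MDT name).
import subprocess

MDT = "testfs-MDT0000"  # hypothetical MDT device name

def read_changelog(start_rec=0):
    out = subprocess.run(["lfs", "changelog", MDT, str(start_rec)],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        # Fields: record number, event type, timestamp, date, flags,
        # target FID, parent FID, target name (as in Table 1).
        yield line.split()
```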
Our Lustre monitor, depicted in Figure 2, employs a hierarchical publisher-subscriber model to collect events from each MDS ChangeLog and report them for aggregation. This model has been proven to enable scalable data collection solutions, such as those that monitor performance statistics from distributed Lustre storage servers [10]. One Collector service is deployed for each MDS. The Collector is responsible for interacting with the local ChangeLog to extract new events before processing and reporting them. Events are reported to a single Aggregator for persistence and publication to consumers.
File events, such as creation, deletion, renaming, and attribute changes, are recorded in the ChangeLog as a tuple containing a timestamp, event type, parent directory identifier, and file name. Our monitor collects, aggregates, and publishes these events using three key steps (a condensed sketch follows the list):
(1) Detection: Events are initially extracted from the ChangeLog by a Collector. The monitor deploys multiple Collectors such that each MDS can be monitored concurrently. Each event detected by a Collector must be processed before being reported.
(2) Processing: Lustre's ChangeLog uses parent and target file identifiers (FIDs) to uniquely represent files and directories. These FIDs are not useful to external services, such as Ripple agents, and must be resolved to absolute path names. Therefore, once a new event is retrieved by a Collector, it uses the Lustre fid2path tool to resolve FIDs and establish absolute path names. The raw event tuples are then refactored to include the user-friendly paths in place of the FIDs before being reported.
(3) Aggregation: A publisher-subscriber message queue (ZeroMQ [6]) is used to pass messages between the Collectors and the Aggregator. Once an event is reported to the Aggregator it is immediately placed in a queue to be processed. The Aggregator is multi-threaded, enabling it to both publish events to subscribed consumers and store the events in a local database with minimal overhead. The Aggregator maintains this database and exposes an API to enable consumers to retrieve historic events.
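The sketch below condenses the three steps into a minimal Collector and Aggregator pair. The lfs fid2path invocation and the ZeroMQ publisher-subscriber pattern follow the description above; the file system name, ports, record layout, and database schema are illustrative.

```python
# Condensed Collector/Aggregator sketch (names, ports, schema illustrative).
import sqlite3
import subprocess
import zmq

FSNAME = "testfs"  # hypothetical Lustre file system name

def fid2path(fid):
    """Step 2: resolve a FID such as [0x200000402:0xa046:0x0] to a path."""
    out = subprocess.run(["lfs", "fid2path", FSNAME, fid],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def collector(pub_addr="tcp://*:5556"):
    sock = zmq.Context().socket(zmq.PUB)
    sock.bind(pub_addr)
    # read_changelog() is the ChangeLog reader sketched earlier (step 1).
    for rec in read_changelog():
        target_fid = rec[5].split("=", 1)[1]  # the "t=[...]" field
        event = {"id": rec[0], "type": rec[1], "time": rec[2],
                 "path": fid2path(target_fid)}
        sock.send_json(event)  # step 3: report for aggregation

def aggregator(sub_addr="tcp://localhost:5556"):
    sock = zmq.Context().socket(zmq.SUB)
    sock.connect(sub_addr)
    sock.setsockopt_string(zmq.SUBSCRIBE, "")
    db = sqlite3.connect("events.db")
    db.execute("CREATE TABLE IF NOT EXISTS events"
               " (id TEXT, type TEXT, time TEXT, path TEXT)")
    while True:  # store and republish; the consumer-facing API is elided
        e = sock.recv_json()
        db.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
                   (e["id"], e["type"], e["time"], e["path"]))
        db.commit()
```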
Collectors are also responsible for purging their respective ChangeLogs. Each Collector maintains a pointer to the most recently extracted event and can therefore clear the ChangeLog of previously processed events. This ensures that events are not missed and also means the ChangeLog will not become overburdened with stale events.
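Purging maps onto Lustre's changelog_clear operation; in the sketch below, the ChangeLog user ID and record pointer are illustrative.

```python
# Sketch: clear records up to the most recently processed one (illustrative).
import subprocess

def clear_processed(mdt, user_id, last_record):
    # Removes all records up to and including last_record for this user.
    subprocess.run(["lfs", "changelog_clear", mdt, user_id, str(last_record)],
                   check=True)

clear_processed("testfs-MDT0000", "cl1", 13108)
```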
5 EVALUATION
We have deployed our monitor over two Lustre testbeds to analyze the performance, overheads, and bottlenecks of our solution. Before investigating the monitor's performance we first characterize the capabilities of the testbeds to determine the rate at which events are generated.
Table 1: A Sample ChangeLog Record.
Event ID Type Timestamp Datestamp Flags Target FID Parent FID Target Name
13106 01CREAT 20:15:37.1138 2017.09.06 0x0 t=[0x200000402:0xa046:0x0] p=[0x200000007:0x1:0x0] data1.txt
13107 02MKDIR 20:15:37.5097 2017.09.06 0x0 t=[0x200000420:0x3:0x0] p=[0x61b4:0xca2c7dde:0x0] DataDir
13108 06UNLNK 20:15:37.8869 2017.09.06 0x1 t=[0x200000402:0xa048:0x0] p=[0x200000007:0x1:0x0] data1.txt
Figure 2: The scalable Lustre monitor used to collect, aggregate, and publish events to Ripple agents.
Using a specifically built event generation script, we apply the monitor under high load to determine maximum throughput and identify bottlenecks. Finally, we use file system dumps from a production 7PB storage system to evaluate whether the monitor is capable of supporting very large-scale storage systems.
5.1 Testbeds
We employ two testbeds to evaluate the monitor's performance. The first testbed, referred to as AWS, is a cloud deployment of Lustre using five Amazon Web Services EC2 instances. The deployment uses Lustre Intel Cloud Edition, version 1.4, to construct a 20GB Lustre file system over five low-performance t2.micro instances and an unoptimized EBS volume. The configuration includes two compute nodes, a single Object Storage Server (OSS), an MGS, and one MDS.
The second testbed provides a larger, production-quality storage system. This testbed, referred to as Iota, uses Argonne National Laboratory's Iota cluster's file system. Iota is one of two pre-exascale systems at Argonne and comprises 44 compute nodes, each with 72 cores and 128GB of memory. Iota's 897TB Lustre store leverages the same high performance hardware and configuration (including four MDS) as the 150PB store planned for deployment with the Aurora supercomputer. However, it is important to note that at present, the file system is not yet configured to load balance metadata across all four MDS; thus these tests were performed with just one MDS.
As a baseline analysis we first compare operation throughput on each file system. We use a Python script to record the time taken to create, modify, or delete 10,000 files on each file system. The performance of these two parallel file systems differs substantially, as is shown in Table 2. Due to the low-performance nature of the instances comprising the AWS testbed (t2.micro), just 352 files could be written per second, and a total of 1366 events can be generated per second. As expected, the performance of the Iota testbed significantly exceeded this rate: it is able to create over 1300 files per second and generate more than 9500 total events per second.
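The measurement script is not reproduced here, but its structure is roughly the following sketch; the mount point is hypothetical, and the loop body matches the create/modify/delete operations described above.

```python
# Sketch of the baseline throughput measurement (illustrative paths).
import os
import time

N = 10_000
ROOT = "/mnt/lustre/bench"  # hypothetical mount point

def timed(op, label):
    start = time.time()
    for i in range(N):
        op(os.path.join(ROOT, f"file{i}"))
    print(f"{label}: {N / (time.time() - start):.0f} events/s")

os.makedirs(ROOT, exist_ok=True)
timed(lambda p: open(p, "w").close(), "created")      # file creation
timed(lambda p: open(p, "a").write("x"), "modified")  # file modification
timed(os.remove, "deleted")                           # file deletion
```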
Table 2: Testbed Performance Characteristics.

                           AWS    Iota
Storage Size               20GB   897TB
Files Created (events/s)   352    1389
Files Modified (events/s)  534    2538
Files Deleted (events/s)   832    3442
Total Events (events/s)    1366   9593
5.2 Results
To investigate the performance of our monitor we use the Python script to generate file system events while our monitor extracts them from an MDS ChangeLog, processes them, and reports them to a listening Ripple agent. To minimize the overhead caused by passing messages over the network, we have conducted these tests on a single node. The node is also instrumented to collect memory and CPU counters during the tests to determine the resource utilization of the collection and aggregation processes.
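One way to collect such counters is with psutil, as in this sketch; the sampling loop and interval are illustrative rather than the exact instrumentation used.

```python
# Sketch: sample peak CPU and memory for one monitor process (illustrative).
import psutil

def peak_usage(pid, samples=60, interval=1.0):
    proc = psutil.Process(pid)
    peak_cpu = peak_mem_mb = 0.0
    for _ in range(samples):
        peak_cpu = max(peak_cpu, proc.cpu_percent(interval=interval))
        peak_mem_mb = max(peak_mem_mb, proc.memory_info().rss / 2**20)
    return peak_cpu, peak_mem_mb
```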
Event Throughput: Our event generation script combines file creation, modification, and deletion to generate multiple events for each file. Using this technique we are able to generate over 1300 events per second on AWS and more than 9500 events per second on Iota.
When generating 1366 events per second, the AWS-based monitor is capable of detecting, processing, and reporting just 1053 events per second to the consuming Ripple agent. Analysis of the monitor's pipeline shows that the throughput is primarily limited by the preprocessing step following events being extracted from a ChangeLog. This is due in part to the low-performance t2.micro instance types used in the testbed. When experimenting on the Iota testbed we found the monitor is able to process and report, on average, 8162 events per second. This is 14.91% lower than the maximum event generation rate achieved on the testbed. Although this is an improvement over the AWS testbed, we found the overhead to be caused by the repetitive use of the fid2path tool when resolving an event's absolute path. To alleviate this problem we plan to process events in batches, rather than independently, and temporarily cache path mappings
to minimize the number of invocations. Another limitation of this experimental configuration is the use of a single MDS. If the fid2path resolutions were distributed across multiple MDS, the throughput of the monitor would surpass the event generation rate. It is important to note that there is no loss of events once they have been processed, meaning the aggregation and reporting steps introduce no additional overhead.
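The planned caching could be as simple as memoizing the resolution call, as in this sketch; the cache size is an illustrative choice. Note that a rename or unlink can invalidate a cached mapping, so a production cache would also need to evict entries when such events are observed.

```python
# Sketch: memoize FID-to-path resolution to avoid repeated lfs fid2path calls.
import functools
import subprocess

@functools.lru_cache(maxsize=100_000)  # illustrative cache size
def fid2path_cached(fid, fsname="testfs"):
    out = subprocess.run(["lfs", "fid2path", fsname, fid],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()
```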
Monitor Overhead: We have captured the CPU and memory utilization of the Collector, Aggregator, and Ripple agent consumer processes. Table 3 shows the peak resource utilization during the Iota throughput experiments. These results show the CPU cost of operating the monitor is small. The memory footprint is due to the use of a local store that records a list of every event captured by the monitor. In a production setting we could further limit the size of this local store, which would in turn reduce the overall resource usage. We conclude that when using an appropriate maximum store size, deploying these components on the physical MDS and MGS servers would induce negligible overhead on their performance.
Table 3: Maximum Monitor Resource Utilization.
CPU (%) Memory (MB)
Collector 6.667 281.6
Aggregator 0.059 217.6
Consumer 0.02 12.8
5.3 Scaling Performance
Understanding the throughput of the monitor only provides value when put in the context of real-world requirements. Thus, we analyzed NERSC's production 7.1PB GPFS file system, called tlproject2. This system has 16,506 users and over 850 million files. We analyzed file system dumps from a 36 day period and compared consecutive days to establish the number of files that are created or changed each day. It is important to note that this method does not represent an accurate value for the number of times a file is modified, as only the most recent file modification is detectable, and it also does not account for short-lived files.
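The comparison amounts to diffing consecutive daily listings keyed by path, roughly as sketched below; the dump format (one tab-separated path and modification time per line) is an illustrative assumption.

```python
# Sketch: count created and modified files between two daily dumps
# (assumes an illustrative "<path>\t<mtime>" line format).
def load_dump(filename):
    state = {}
    with open(filename) as f:
        for line in f:
            path, mtime = line.rstrip("\n").split("\t")
            state[path] = mtime
    return state

day1 = load_dump("dump_day01.tsv")
day2 = load_dump("dump_day02.tsv")
created = day2.keys() - day1.keys()
modified = {p for p in day1.keys() & day2.keys() if day1[p] != day2[p]}
print(len(created), "created,", len(modified), "modified")
```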
As shown in Figure 3, we found a peak of over 3.6 million differences between two consecutive days. When distributed over a 24 hour period this equates to just 42 events per second. Assuming a worst-case scenario where all of these events occur within an eight hour period results in approximately 127 events per second, still well within the monitor's performance range. Although only hypothetical, if we assume events scale linearly with storage size, we can extrapolate and expect Aurora's 150PB to generate 25 times as many events, or 3,178 events per second, which is also well within the capabilities of the monitor. It should be noted that this estimate could significantly underestimate the peak generation of file events. Further online monitoring of such devices is necessary to account for short-lived files, file modifications, and the sporadic nature of data generation.

Figure 3: The number of files created and modified on NERSC's 7.1PB GPFS file system, tlproject2, over a 35 day period.
6 CONCLUSION
SDCI can resolve many of the challenges associated with routine data management processes, enabling researchers to automate many
of the tedious tasks they must perform. In prior work we presented a system for enabling such automation; however, it was designed using libraries commonly available on personal computers but not often available on large-scale storage systems. Our scalable Lustre monitor addresses this shortcoming and enables Ripple to be used on some of the world's largest storage systems. Our results show that the Lustre monitor is able to detect, process, and report thousands of events per second, a rate sufficient to meet the predicted needs of the forthcoming 150PB Aurora file system.
Our future research focuses on investigating monitor performance when using multiple distributed MDS, exploring and evaluating different message passing techniques between the collection and aggregation points, and comparing performance against Robinhood in production settings. We will also further investigate the behavior of large file systems to more accurately characterize the requirements of our monitor. Finally, we are actively working to deploy Ripple on production systems and in real scientific data management scenarios, thereby demonstrating the value of SDCI concepts in scientific computing platforms.
ACKNOWLEDGMENTS
This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. We also acknowledge generous research credits provided by Amazon Web Services. This work is also sponsored in part by the NSF under grants CNS-1565314, CNS-1405697, and CNS-1615411.
REFERENCES
[1] M. AbdelBaky, J. Diaz-Montes, and M. Parashar. Software-defined environments for science and engineering. The International Journal of High Performance Computing Applications, page 1094342017710706, 2017.
[2] W. Barth. Nagios: System and network monitoring. No Starch Press, 2008.
[3] K. Chard, S. Tuecke, and I. Foster. Efficient and secure transfer, synchronization, and sharing of big data. IEEE Cloud Computing, 1(3):46–55, 2014.
[4] R. Chard, K. Chard, J. Alt, D. Y. Parkinson, S. Tuecke, and I. Foster. RIPPLE: Home automation for research data management. In 37th IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.
[5] I. Foster, B. Blaiszik, K. Chard, and R. Chard. Software defined cyberinfrastructure. In 37th IEEE International Conference on Distributed Computing Systems (ICDCS), 2017.
[6] P. Hintjens. ZeroMQ: Messaging for many applications. O'Reilly Media, Inc., 2013.
[7] T. Leibovici. Taking back control of HPC file systems with Robinhood Policy Engine. arXiv preprint arXiv:1505.01448, 2015.
[8] R. Miller, J. Hill, D. A. Dillow, R. Gunasekaran, G. M. Shipman, and D. Maxwell. Monitoring tools for large scale systems. In Proceedings of the Cray User Group Conference (CUG 2010), 2010.
[9] H. B. Newman, I. C. Legrand, P. Galvez, R. Voicu, and C. Cirstoiu. MonALISA: A distributed monitoring service architecture. arXiv preprint cs/0306096, 2003.
[10] A. K. Paul, A. Goyal, F. Wang, S. Oral, A. R. Butt, M. J. Brim, and S. B. Srinivasa. I/O load balancing for big data HPC applications. In 5th IEEE International Conference on Big Data (Big Data), 2017.
[11] A. Rajasekar, R. Moore, C.-y. Hou, C. A. Lee, R. Marciano, A. de Torcy, M. Wan, W. Schroeder, S.-Y. Chen, L. Gilbert, P. Tooby, and B. Zhu. iRODS Primer: Integrated Rule-Oriented Data System. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2(1):1–143, 2010.
[12] R. Schuler, C. Kesselman, and K. Czajkowski. Data centric discovery with a data-oriented architecture. In 1st Workshop on The Science of Cyberinfrastructure: Research, Experience, Applications and Models (SCREAM '15), pages 37–44, New York, NY, USA, 2015. ACM.
[13] P. Schwan et al. Lustre: Building a file system for 1000-node clusters. In Proceedings of the 2003 Linux Symposium, volume 2003, pages 380–386, 2003.