Conference Paper

C-Grid: Enabling iRODS-based Grid Technology for Community Health Research


Abstract

A Community Grid web portal, C-Grid, has been developed in this study for storing, managing, and sharing large amounts of distributed community-health data in a data grid, thereby facilitating further analysis of these datasets by health researchers in a collaborative environment. Remote management of the data grid is performed using the middleware iRODS, the Integrated Rule-Oriented Data System. A PHP-based wrapper, ez-iRODS, has been created as a component of C-Grid to interact with this middleware through PRODS, a client application programming interface (API). C-Grid serves as a gateway to XSEDE resources and, via ez-iRODS, helps users create and manage 'virtual data collections' that can be stored on heterogeneous data resources across the distributed network. This web-based system has been developed with the objectives of long-term data preservation, unified data access, and sharing of domain-specific data among the scientific research collaborators of the myCHOIS project.
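
The 'virtual data collection' idea above can be sketched as a logical namespace whose entries map to replicas on different physical resources. The class and all names below are illustrative only, not the ez-iRODS/PRODS API:

```python
# Toy model of an iRODS-style "virtual data collection": one logical
# namespace whose entries may live on different physical resources.
# All names are illustrative, not the C-Grid/ez-iRODS API.

class VirtualCollection:
    def __init__(self, name):
        self.name = name
        self._entries = {}  # logical path -> list of physical replicas

    def register(self, logical_path, resource, physical_path):
        # record one physical copy under the logical name
        self._entries.setdefault(logical_path, []).append(
            {"resource": resource, "physical_path": physical_path}
        )

    def replicas(self, logical_path):
        return list(self._entries.get(logical_path, []))

    def logical_paths(self):
        return sorted(self._entries)

coll = VirtualCollection("myCHOIS_surveys")
coll.register("/cgrid/survey_2011.csv", "xsede-storage", "/tape/a1/survey_2011.csv")
coll.register("/cgrid/survey_2011.csv", "local-disk", "/data/survey_2011.csv")
# one logical name, two physical replicas
replica_count = len(coll.replicas("/cgrid/survey_2011.csv"))
```

The point of the sketch is the indirection: users address data by logical path, while placement across heterogeneous resources stays an implementation detail.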


... Grid is a service-oriented architecture dedicated to supporting large organizations in a reliable way [3]. In contrast to prevailing technologies that intensify communication amongst resources, grid computing makes use of the idle cycles of all desktops within a network to solve complicated problems that would be difficult for any stand-alone machine [4][5][6]. Scheduling in a diversified network of resources is an NP-hard problem. ...
Conference Paper
Full-text available
Grid computing federates heterogeneous resources distributed over different geographical domains with the aim of providing high computational power in the most seamless way. The grid environment is a decisive and cost-effective paradigm that facilitates solving complex parallel applications by coordinating resources (storage space, software applications, computers, sensors) and sharing data. The heterogeneity of resources in the grid paradigm raises major scheduling issues. This paper suggests a hybridized algorithm for scheduling parallel jobs. The algorithm puts together features of particle swarm optimization, cuckoo search, and a genetic algorithm to resolve various scheduling issues.
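
The scheduling objective such hybrids optimize is typically the makespan. As a point of reference, a greedy minimum-completion-time baseline (not the paper's algorithm; the job lengths and speeds are invented) can be sketched as:

```python
# Greedy baseline for the grid job-scheduling objective the hybrid
# metaheuristic optimizes: assign each job to the resource that would
# finish it earliest, then report the makespan. Illustrative only.

def greedy_schedule(job_lengths, resource_speeds):
    finish = [0.0] * len(resource_speeds)  # per-resource completion time
    assignment = []
    for length in job_lengths:
        # completion time of this job on each resource
        times = [finish[r] + length / s for r, s in enumerate(resource_speeds)]
        best = min(range(len(times)), key=times.__getitem__)
        finish[best] = times[best]
        assignment.append(best)
    return assignment, max(finish)  # makespan = latest finish time

jobs = [4, 2, 8, 6]      # abstract job lengths (invented)
speeds = [1.0, 2.0]      # resource 1 is twice as fast (invented)
plan, makespan = greedy_schedule(jobs, speeds)
```

Metaheuristics such as PSO, cuckoo search, and GAs search over many such assignments to push the makespan below what the greedy rule achieves.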
... Grid computing has been successfully implemented to solve various real-life problems. For instance, finding Protein binding sites using DNA@Home volunteer computing project [7], grid computing for disaster mitigation implemented by Universiti Sains Malaysia [8], C-Grid based on Integrated Rule-Oriented Data System for health care community [9], and ANSYS® Commercial Suite on the EGI Grid Platform [10]. ...
Conference Paper
Full-text available
Exploitation and exploration mechanisms are the main components in metaheuristics algorithms. These mechanisms are implemented explicitly in ant colony system algorithm. The rate between the exploitation and exploration mechanisms is controlled using a parameter set by the users of the algorithm. However, the rate remains unchanged during the algorithm iterations, which makes the algorithm either bias toward exploitation or exploration. Hence, this study proposes a strategic oscillation rate to control the exploitation and exploration in ant colony system. The proposed algorithm was evaluated with job scheduling problem benchmarks on grid computing. Experimental results show that the proposed algorithm outperforms other metaheuristics algorithms in terms of makespan and flowtime. The strategic oscillation has improved the exploration and exploitation in ant colony system.
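
The exploitation/exploration balance described above hinges on the ant-colony-system parameter q0. A sketch of a choice rule with an oscillating q0 follows; the sinusoidal schedule and all constants are assumptions, not the authors' formula:

```python
# Ant-colony-system edge choice with an oscillating exploitation rate
# q0 (in the spirit of the paper's "strategic oscillation"). The
# schedule and constants below are invented for illustration.
import math
import random

def q0(iteration, period=50, low=0.4, high=0.9):
    # q0 swings between `low` and `high` over each `period` iterations
    phase = math.sin(2 * math.pi * iteration / period)
    return low + (high - low) * (phase + 1) / 2

def choose_next(pheromone, heuristic, iteration, rng=random.random):
    scores = [p * h for p, h in zip(pheromone, heuristic)]
    if rng() < q0(iteration):              # exploit: take the best edge
        return max(range(len(scores)), key=scores.__getitem__)
    total = sum(scores)                    # explore: roulette-wheel pick
    r, acc = rng() * total, 0.0
    for i, s in enumerate(scores):
        acc += s
        if acc >= r:
            return i
    return len(scores) - 1
```

Because q0 itself varies over iterations, the algorithm drifts between exploitation-heavy and exploration-heavy phases instead of staying biased toward one.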
Article
HydroShare (https://www.hydroshare.org) is an online collaborative system to support the open sharing of hydrologic data, analytical tools, and computer models. Hydrologic data and models are often large, extending to multi-gigabyte or terabyte scale, and as a result, the scalability of centralized data management poses challenges for a system such as HydroShare. A distributed data management framework that enables distributed physical data storage and management in multiple locations thus becomes a necessity. We use the iRODS (Integrated Rule-Oriented Data System) data grid middleware as the distributed data storage and management back end in HydroShare. iRODS provides a unified virtual file system for distributed physical storages in multiple locations and enables data federation across geographically dispersed institutions around the world. In this paper, we describe the iRODS-based distributed data management approaches implemented in HydroShare to provide a practical demonstration of a production system for supporting big data in the environmental sciences.
Conference Paper
In this paper, we present our health-IT solution that addresses these challenges: developing strategies not only to store such a vast amount of data but also to make it available to researchers for further analysis that can measure outcomes on participants' health. The community health grid (C-Grid) solution developed to store, manage, and share large amounts of these instruction materials and participants' health-related data is discussed and presented in this paper; remote management and analysis of this data grid is performed using iRODS, the Integrated Rule-Oriented Data System.
Conference Paper
While communication was the major focus of traditional networks, grid computing focuses on solving, with otherwise unused CPU cycles, problems that cannot be resolved by stand-alone computers. The grid, being a geographically distributed network of computers, provides a clear, coordinated, consistent, and reliable computing medium to various applications. Owing to the heterogeneity of resources in a grid environment, job scheduling is problematic and therefore needs competent schedulers. The paper provides a comparative analysis of various ant colony optimization (ACO) variants based on their effectiveness in determining optimal or near-optimal solutions. The conclusion drawn from the survey supports ACO's effectiveness in solving various scheduling problems. However, uncertainty in convergence time and the early designation of the initial and extreme points hinder optimal scheduling. With the support of the literature survey, not only are assured guidelines extracted for the ACO algorithm, but promising directions are also provided for future work.
Chapter
Grid provides a clear, coordinated, consistent, and reliable computing medium for solving complex sequential and parallel applications through the use of idle CPU cycles. Scheduling optimizes the objective function(s) by mapping parallel jobs to the available resources. Owing to the heterogeneity of resources in the grid, scheduling belongs to the class of NP-hard problems, for which reaching the optimal solution exceeds practical time constraints. Metaheuristic algorithms take polynomial time to reach near-optimal solutions for NP-hard problems. Major research issues in metaheuristic algorithms are solution quality and convergence speed, which have been addressed here using a consolidation approach. This paper proposes a hybrid PSACGA algorithm that consolidates the features of Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and operators of the Genetic Algorithm to solve the parallel job scheduling problem. Experimental results of the proposed technique are compared with existing deterministic and metaheuristic job scheduling algorithms and indicate that the proposed hybrid PSACGA algorithm provides better performance than existing contemporary algorithms.
Conference Paper
Grid computing proposes a dynamic, geographically distributed organization of resources that harvests idle CPU cycles to meet advancing computing demands and accommodate users' requirements. Heterogeneous grids demand efficient allocation and scheduling strategies to cope with expanding grid automation. In order to obtain optimal scheduling solutions, the primary focus of research has shifted towards metaheuristic techniques. The paper uses different parameters to provide an analytical study of variants of Ant Colony Optimization for scheduling sequential jobs in grid systems. Based on the literature analysis, one can conclude that ACO is the most convincing technique for scheduling problems. However, ACO's inability to fix a systematized startup and its poor scattering capability lower its efficiency. To overcome these constraints, researchers have proposed different hybridizations of ACO that manage to sustain more effective results than standalone ACO.
Article
Full-text available
CHOIS, the Childhood Obesity Informatics System, supported by high-performance grid computing, has been developed following the Open Grid Services Architecture, an accepted standard for accessing grid computing and other services under Open Grid Collaborating Environments (OGCE). For this work in progress, we are developing various web-based tools for data input and data visualization, data integration, an SMS alert system, a content management system (CMS), and a decision support system (DSS), to name a few. A mobile application has also been developed for interacting with its remote database using a smartphone at the point of care. This system is now used by the Illinois Department of Human Services (IDHS) for obesity surveillance among school-going children.
Article
Full-text available
This paper describes practical applications using resources on different kinds of Grid middleware. At the KEK Computing Research Center, many jobs must be submitted for physics simulations involving large numbers of data files from physics experiments. The available resources of the various Grids should be used in a cooperative way, but specialized knowledge is currently required to use each Grid. Our solution is to use SAGA (Simple API for Grid Applications), which provides a unified interface that conceals the differences among the different kinds of Grid middleware. We developed SAGA adaptors for job execution, file management, and catalog services. The job adaptors we created are applied to each kind of Grid middleware: NAREGI, PBSPro, and Torque. The file adaptors support the Data Grids iRODS and Gfarm. The replica adaptor currently in use is for the catalog services RNS (Resource Namespace Service) and iRODS. SAGA with the adaptors allows us to utilize the various Grid resources along with local resources without any concerns about the underlying middleware. Technical details and sample applications are described in this paper.
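
The unified-interface idea behind SAGA can be illustrated with a minimal adaptor pattern; the class and method names below are invented stand-ins, not SAGA's actual API:

```python
# The SAGA idea in miniature: one job-submission interface, pluggable
# middleware backends. All names are illustrative stand-ins.

class JobAdaptor:
    def submit(self, command):
        raise NotImplementedError

class TorqueAdaptor(JobAdaptor):
    def submit(self, command):
        # a real adaptor would build and run a qsub script here
        return f"torque: qsub wrapped [{command}]"

class NaregiAdaptor(JobAdaptor):
    def submit(self, command):
        # a real adaptor would emit NAREGI workflow description here
        return f"naregi: workflow job for [{command}]"

def run_everywhere(adaptors, command):
    # caller code is middleware-agnostic: same call on every backend
    return [a.submit(command) for a in adaptors]

results = run_everywhere([TorqueAdaptor(), NaregiAdaptor()], "simulate.sh")
```

The calling code never branches on the middleware type; adding a new Grid means adding one adaptor class, which is the portability property the paper relies on.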
Conference Paper
Full-text available
The integrated Rule-Oriented Data System (iRODS) is a Grid data management system that organizes distributed data and their metadata. A Rule Engine allows a flexible definition of data storage, data access, and data processing. This paper presents scenarios implemented in a benchmark tool to measure the performance of an iRODS environment, as well as results of measurements with large datasets. The scenarios concentrate on data transfers, metadata transfers, and stress tests. Users have the possibility of adapting the scenarios to their own use cases. The results show how to find bottlenecks and identify potential optimizations of the settings of an iRODS environment.
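
A benchmark scenario of this kind boils down to timing a repeated operation and reporting throughput. A minimal harness is sketched below; the measured operation is a CPU-bound stand-in, where a real run would issue iRODS data or metadata operations:

```python
# Minimal harness in the spirit of the paper's benchmark scenarios:
# time a repeated operation and report throughput. The operation is a
# stand-in; a real scenario would call iRODS put/get or metadata ops.
import time

def run_scenario(operation, repetitions):
    start = time.perf_counter()
    for _ in range(repetitions):
        operation()
    elapsed = time.perf_counter() - start
    return {
        "repetitions": repetitions,
        "seconds": elapsed,
        "ops_per_sec": repetitions / elapsed if elapsed > 0 else float("inf"),
    }

stats = run_scenario(lambda: sum(range(1000)), 200)
```

Varying the operation (bulk transfer vs. many small metadata calls) and the repetition count is what lets such a tool expose bottlenecks in a given iRODS deployment.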
Conference Paper
Full-text available
UNICORE (Uniform Interface to Computer Resources) provides seamless and secure access to distributed supercomputer resources. This paper gives an overview of its architecture, security features, user functions, and mechanisms for the integration of existing applications into UNICORE. The Car-Parrinello Molecular Dynamics (CPMD) application is used as an example to demonstrate the capabilities of UNICORE.
Article
Full-text available
New mobile computing devices, including smartphones and tablet computers, have emerged to facilitate data collection in real time at the point of care. Earlier, we developed a web-based Childhood Obesity Informatics System (CHOIS) and deployed it for obesity surveillance by the Illinois Department of Public Health (IDPH). In this process, a school nurse collects data on an individual's height and weight for determining Body Mass Index (BMI), which is conventionally used for identifying at-risk and obese patients. However, this process is often limited by internet access at the site. This paper describes a solution by demonstrating a smartphone-based mobile application, mCHOIS. The application developed in this project enables a field worker to input or modify data and store it locally on the phone. Once an internet connection is available, either through broadband or through the built-in Wi-Fi, data can be sent to the remote database of CHOIS. Updating the data and visualizing reports are also available through the phone's browser. This application has been successfully field tested and is now under deployment for use by the Illinois Department of Human Services (IDHS) for its School Health Program.
Article
Full-text available
Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data, so a good data management system needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data. We have chosen a data grid system, iRODS (the integrated Rule-Oriented Data System), to act as the data management system for the WTSI. iRODS provides a rule-based management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application-level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data. The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced. iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata.
Article
Full-text available
CHOIS, the Child Health and Obesity Informatics System, is developed using open-source portal technology with a three-tiered Open Grid Services Architecture, an accepted standard for accessing grid computing and other services under Open Grid Collaborating Environments (OGCE). Its web application provides web-based forms with 112 different fields to enter data ranging from demographics and height and weight for BMI to genomic information. Automatic computation of BMI and BMI percentile and an obesity-risk alert are embedded in this system. After successful testing of the prototype, CHOIS is now ready to be used by the Illinois Department of Human Services (DHS) for obesity surveillance. This HIPAA- and FERPA-compliant secure system, integrating large databases in a high-performance grid computing environment, enables school nurses to collect data on school children and report statistical and surveillance information on BMI, identifying at-risk and obese children for obesity prevention and intervention programs.
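
The automatic BMI computation and risk alert can be sketched as below; the percentile cutoffs follow the common CDC convention (85th/95th percentiles), and bmi_percentile is treated as a plain input since the real system derives it from age- and sex-specific lookup tables:

```python
# BMI arithmetic as a system like CHOIS automates it. Category cutoffs
# follow the usual CDC 85th/95th-percentile convention; the percentile
# itself would come from age/sex lookup tables in a real system.

def bmi(weight_kg, height_m):
    # BMI = weight (kg) / height (m) squared
    return weight_kg / (height_m ** 2)

def risk_category(bmi_percentile):
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "at-risk (overweight)"
    if bmi_percentile >= 5:
        return "healthy range"
    return "underweight"

value = bmi(40.0, 1.40)   # e.g. a 40 kg child, 1.40 m tall
```

Embedding this computation in the data-entry form is what lets the system raise the at-risk alert at the moment a nurse records height and weight.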
Article
Full-text available
We review the work carried out within the eMinerals project to develop eScience solutions that facilitate a new generation of molecular-scale simulation work. Technological developments include integration of compute and data systems, developing of collaborative frameworks and new researcher-friendly tools for grid job submission, XML data representation, information delivery, metadata harvesting and metadata management. A number of diverse science applications will illustrate how these tools are being used for large parameter-sweep studies, an emerging type of study for which the integration of computing, data and collaboration is essential.
Article
Full-text available
To broadly examine the potential health and financial benefits of health information technology (HIT), this paper compares health care with the use of IT in other industries. It estimates potential savings and costs of widespread adoption of electronic medical record (EMR) systems, models important health and safety benefits, and concludes that effective EMR implementation and networking could eventually save more than $81 billion annually--by improving health care efficiency and safety--and that HIT-enabled prevention and management of chronic disease could eventually double those savings while increasing health and other social benefits. However, this is unlikely to be realized without related changes to the health care system.
Conference Paper
Full-text available
In wide area computing systems, it is often desirable to create remote read-only copies (replicas) of files. Replication can be used to reduce access latency, improve data locality, and/or increase robustness, scalability and performance for distributed applications. We define a replica location service (RLS) as a system that maintains and provides access to information about the physical locations of copies. An RLS typically functions as one component of a data grid architecture. This paper makes the following contributions. First, we characterize RLS requirements. Next, we describe a parameterized architectural framework, which we name Giggle (for GIGa-scale Global Location Engine), within which a wide range of RLSs can be defined. We define several concrete instantiations of this framework with different performance characteristics. Finally, we present initial performance results for an RLS prototype, demonstrating that RLS systems can be constructed that meet performance goals.
Article
Full-text available
The integration of grid, data grid, digital library, and preservation technology has resulted in software infrastructure that is uniquely suited to the generation and management of data. Grids provide support for the organization, management, and application of processes. Data grids manage the resulting digital entities. Digital libraries provide support for the management of information associated with the digital entities. Persistent archives provide long-term preservation. We examine the synergies between these data management systems and the future evolution that is required for the generation and management of information.
Article
Full-text available
This paper describes the architecture of the SDSC Storage Resource Broker (SRB). The SRB is middleware that provides applications a uniform API to access heterogeneous distributed storage resources, including filesystems, database systems, and archival storage systems. The SRB utilizes a metadata catalog service, MCAT, to provide a "collection"-oriented view of data. Thus, data items that belong to a single collection may, in fact, be stored on heterogeneous storage systems. The SRB infrastructure is being used to support digital library projects at SDSC. This paper describes the architecture and various features of the SDSC SRB. The San Diego Supercomputer Center (SDSC) is involved in developing infrastructure for a high-performance distributed computing environment as part of its National Partnership for Advanced Computational Infrastructure (NPACI) project funded by the NSF. The NSF program in Partnerships for Advanced Computational Infrastructure (PACI), which fund...
Article
The World Wide Web has succeeded in large part because its software architecture has been designed to meet the needs of an Internet-scale distributed hypermedia application. The modern Web architecture emphasizes scalability of component interactions, generality of interfaces, independent deployment of components, and intermediary components to reduce interaction latency, enforce security, and encapsulate legacy systems. In this article we introduce the Representational State Transfer (REST) architectural style, developed as an abstract model of the Web architecture and used to guide our redesign and definition of the Hypertext Transfer Protocol and Uniform Resource Identifiers. We describe the software engineering principles guiding REST and the interaction constraints chosen to retain those principles, contrasting them to the constraints of other architectural styles. We then compare the abstract model to the currently deployed Web architecture in order to elicit mismatches between the existing protocols and the applications they are intended to support.
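
REST's uniform interface can be miniaturized as a resource store manipulated only through generic methods applied to URIs. The sketch below is illustrative, not a real HTTP server:

```python
# REST's uniform interface in miniature: resources are named by URI
# and manipulated only through generic methods with fixed semantics.
# Purely illustrative; no networking involved.

class TinyRestServer:
    def __init__(self):
        self._resources = {}

    def handle(self, method, uri, body=None):
        if method == "GET":
            if uri not in self._resources:
                return 404, None
            return 200, self._resources[uri]
        if method == "PUT":
            created = uri not in self._resources
            self._resources[uri] = body  # PUT is idempotent: full replace
            return (201 if created else 200), body
        if method == "DELETE":
            if self._resources.pop(uri, None) is None:
                return 404, None
            return 204, None
        return 405, None  # method not part of the uniform interface

srv = TinyRestServer()
srv.handle("PUT", "/patients/7", {"bmi": 21.4})
status, doc = srv.handle("GET", "/patients/7")
```

Because every resource is driven through the same four verbs, intermediaries (caches, proxies) can reason about requests without knowing anything about the application, which is the scalability argument REST makes.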
Article
Large scientific projects need collaborative data sharing environments. For projects like the Ocean Observations Initiative (OOI), the Temporal Dynamics of Learning Center (TDLC) and Large-scale Synoptic Survey Telescope (LSST) the amount of data collected will be on the order of Petabytes, stored across distributed heterogeneous resources under multiple administrative organizations. Policy-oriented data management is essential in such collaborations. The integrated Rule-Oriented Data System (iRODS) is a peer-to-peer, federated server-client architecture that uses a distributed rule engine for data management to apply policies encoded as rules. The rules are triggered on data management events (ingestion, access, modifications, annotations, format conversion, etc) as well as periodically (to check integrity of the data collections, intelligent data archiving and placement, load balancing, etc). Rules are applied by system administrators (e.g. for resource creation, user management, etc.) and by individual users, groups and data providers to tailor the sharing and access of data for their own needs. In this paper, we will discuss the architecture of the iRODS middleware system and discuss some of the applications of the software.
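
The event-triggered rule mechanism described above can be sketched as a registry of rules fired at policy enforcement points; the event names and rules here are invented examples, not the iRODS rule language:

```python
# Sketch of iRODS-style policy enforcement: rules registered against
# data management events fire when those events occur. Event names and
# rules are invented examples, not the iRODS rule language.

class RuleEngine:
    def __init__(self):
        self._rules = {}  # event name -> list of rule functions

    def on(self, event, rule):
        self._rules.setdefault(event, []).append(rule)

    def fire(self, event, context):
        # apply every rule registered for this event, in order
        return [rule(context) for rule in self._rules.get(event, [])]

engine = RuleEngine()
engine.on("ingest", lambda ctx: f"checksum {ctx['object']}")
engine.on("ingest", lambda ctx: f"replicate {ctx['object']} to archive")
actions = engine.fire("ingest", {"object": "survey.csv"})
```

Encoding policy as data-triggered rules is what lets collection owners change integrity or replication behavior without touching client applications.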
Chapter
This paper previews the imminent flood of scientific data expected from the next generation of experiments, simulations, sensors and satellites. In order to be exploited by search engines and data mining software tools, such experimental data needs to be annotated with relevant metadata giving information as to provenance, content, conditions and so on. The need to automate the process of going from raw data to information to knowledge is briefly discussed. The paper argues the case for creating new types of digital libraries for scientific data with the same sort of management services as conventional digital libraries in addition to other data-specific services. Some likely implications of both the Open Archives Initiative and e-Science data for the future role for university libraries are briefly mentioned. A substantial subset of this e-Science data needs to archived and curated for long-term preservation. Some of the issues involved in the digital preservation of both scientific data and of the programs needed to interpret the data are reviewed. Finally, the implications of this wealth of e-Science data for the Grid middleware infrastructure are highlighted.
Chapter
Introduction / The Need for Grid Technologies / Background / An Open Grid Services Architecture / Application Example / Technical Details / Network Protocol Bindings / Higher-level Services / Related Work / Summary / Acknowledgments / References
Conference Paper
In this paper we report on preliminary work and architectural design carried out in the “Data Management” work package in the International Data Grid project. Our aim within a time scale of three years is to provide Grid middleware services supporting the I/O-intensive world-wide distributed next generation experiments in High-Energy Physics, Earth Observation and Bioinformatics. The goal is to specify, develop, integrate and test tools and middleware infrastructure to coherently manage and share Petabyte-range information volumes in high-throughput production-quality Grid environments. The middleware will allow secure access to massive amounts of data in a universal name-space, to move and replicate data at high speed from one geographical site to another, and to manage synchronisation of remote copies. We put much attention on clearly specifying and categorising existing work on the Grid, especially in data management in Grid related projects. Challenging use cases are described and how they map to architectural decisions concerning data access, replication, meta data management, security and query optimisation.
Article
Background: Because physical activity researchers are increasingly using objective portable devices, this review describes the current state of the technology to assess physical activity, with a focus on specific sensors and sensor properties currently used in monitors and their strengths and weaknesses. Additional sensors and sensor properties desirable for activity measurement and best practices for users and developers also are discussed. Best practices: We grouped current sensors into three broad categories for objectively measuring physical activity: associated body movement, physiology, and context. Desirable sensor properties for measuring physical activity and the importance of these properties in relationship to specific applications are addressed, and the specific roles of transducers and data acquisition systems within the monitoring devices are defined. Technical advancements in sensors, microcomputer processors, memory storage, batteries, wireless communication, and digital filters have made monitors more usable for subjects (smaller, more stable, and longer running time) and for researchers (less costly, higher time resolution and memory storage, shorter download time, and user-defined data features). Future directions: Users and developers of physical activity monitors should learn about the basic properties of their sensors, such as range, accuracy, and precision, while considering the data acquisition/filtering steps that may be critical to data quality and may influence the desirable measurement outcome(s).
Book
Policy-based data management enables the creation of community-specific collections. Every collection is created for a purpose. The purpose defines the set of properties that will be associated with the collection. The properties are enforced by management policies that control the execution of procedures that are applied whenever data are ingested or accessed. The procedures generate state information that defines the outcome of enforcing the management policy. The state information can be queried to validate assessment criteria and verify that the required collection properties have been conserved. The integrated Rule-Oriented Data System implements the data management framework required to support policy-based data management. Policies are turned into computer actionable Rules. Procedures are composed from a Micro-service-oriented architecture. The result is a highly extensible and tunable system that can enforce management policies, automate administrative tasks, and periodically validate assessment criteria. Table of Contents: Introduction / Integrated Rule-Oriented Data System / iRODS Architecture / Rule-Oriented Programming / The iRODS Rule System / iRODS Micro-services / Example Rules / Extending iRODS / Appendix A: iRODS Shell Commands / Appendix B: Rulegen Grammar / Appendix C: Exercises / Author Biographies
Article
Despite a number of challenges, patients' medical records are slowly making the transition to the digital age.
Article
Research is generating large quantities of digital material, much of it irreplaceable, and there is a pressing need to maintain long-term access to it. Not only is the quantity of data growing in size, it is becoming much more diverse and complex, significantly complicating the issues around its curation. Automation of curation is key if a scalable solution is to be found. We describe an approach to automation in which digital curation policies and strategies are represented as rules, which are implemented in data grids based on the iRODS middleware.
Article
The obesity epidemic has grown rapidly into a major public health challenge, in the United States and worldwide. The scope and scale of the obesity epidemic motivate an urgent need for well-crafted policy interventions to prevent further spread and (potentially) to reverse the epidemic. Yet several attributes of the epidemic make it an especially challenging problem both to study and to combat. This article shows that these attributes--the great breadth in levels of scale involved, the substantial diversity of relevant actors, and the multiplicity of mechanisms implicated--are characteristic of a complex adaptive system. It argues that the obesity epidemic is driven by such a system and that lessons and techniques from the field of complexity science can help inform both scientific study of obesity and effective policies to combat obesity. The article gives an overview of modeling techniques especially well suited to study the rich and complex dynamics of obesity and to inform policy design.
Article
We projected future prevalence and BMI distribution based on national survey data (National Health and Nutrition Examination Survey) collected between the 1970s and 2004. Future obesity-related health-care costs for adults were estimated using projected prevalence, Census population projections, and published national estimates of per capita excess health-care costs of obesity/overweight. The objective was to illustrate the potential burden of obesity prevalence and the health-care costs of obesity and overweight in the United States that would occur if current trends continue. Overweight and obesity prevalence have increased steadily among all US population groups, but with notable differences between groups in annual increase rates. The increase (percentage points) in obesity and overweight in adults was faster than in children (0.77 vs. 0.46-0.49), and in women than in men (0.91 vs. 0.65). If these trends continue, by 2030, 86.3% of adults will be overweight or obese, and 51.1% obese. Black women (96.9%) and Mexican-American men (91.1%) would be the most affected. By 2048, all American adults would become overweight or obese, while black women will reach that state by 2034. In children, the prevalence of overweight (BMI ≥ 95th percentile, 30%) will nearly double by 2030. Total health-care costs attributable to obesity/overweight would double every decade, reaching 860.7-956.9 billion US dollars by 2030 and accounting for 16-18% of total US health-care costs. We continue to move away from the Healthy People 2010 objectives. Timely, dramatic, and effective development and implementation of corrective programs/policies are needed to avoid the otherwise inevitable health and societal consequences implied by our projections.
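
The "double every decade" projection is plain exponential growth with base 2 measured in decades; the sketch below uses an arbitrary base figure, not the paper's estimates:

```python
# "Costs double every decade" is exponential growth with base 2 per
# decade. The base figure here is arbitrary, not the paper's estimate.

def projected_cost(base_cost, decades):
    return base_cost * (2 ** decades)

# a cost that doubles each decade grows 4x over two decades
growth = projected_cost(100.0, 2) / projected_cost(100.0, 0)
```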
Article
The emergence of systems biology is bringing forth a new set of challenges for advancing science and technology. Defining ways of studying biological systems on a global level, integrating large and disparate data types, and dealing with the infrastructural changes necessary to carry out systems biology, are just a few of the extraordinary tasks of this growing discipline. Despite these challenges, the impact of systems biology will be far-reaching, and significant progress has already been made. Moving forward, the issue of how to use systems biology to improve the health of individuals must be a priority. It is becoming increasingly apparent that the field of systems biology and one of its important disciplines, proteomics, will have a major role in creating a predictive, preventative, and personalized approach to medicine. In this review, we define systems biology, discuss the current capabilities of proteomics and highlight some of the necessary milestones for moving systems biology and proteomics into mainstream health care.
Article
Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.
Article
This Health Policy Report explores the risks and benefits of health information technology, how policymakers are encouraging and managing its dissemination, and what the future holds for health information technology in U.S. medicine.
Article
GPFS is IBM's parallel, shared-disk file system for cluster computers, available on the RS/6000 SP parallel supercomputer and on Linux clusters. GPFS is used on many of the largest supercomputers in the world. GPFS was built on many of the ideas that were developed in the academic community over the last several years, particularly distributed locking and recovery technology. To date it has been a matter of conjecture how well these ideas scale. We have had the opportunity to test those limits in the context of a product that runs on the largest systems in existence. While in many cases existing ideas scaled well, new approaches were necessary in many key areas. This paper describes GPFS, and discusses how distributed locking and recovery techniques were extended to scale to large clusters.
Article
Emerging high-performance applications require the ability to exploit diverse, geographically distributed resources. These applications use high-speed networks to integrate supercomputers, large databases, archival storage devices, advanced visualization devices, and/or scientific instruments to form networked virtual supercomputers or metacomputers. While the physical infrastructure to build such systems is becoming widespread, the heterogeneous and dynamic nature of the metacomputing environment poses new challenges for developers of system software, parallel tools, and applications. In this article, we introduce Globus, a system that we are developing to address these challenges. The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network. A low-level toolkit provides basic mechanisms such as communication, authentication, network information, and data access. These mechanisms are used to construct various higher-level ...
Informatics challenges and opportunities for childhood obesity research
  • A K Datta
  • V Jackson
  • J Rimmer
Introduction to HASTAC. Cyberinfrastructure for Humanities, Arts & Social Sciences Summer Institute
  • K Franklin
  • A K Datta
Managing large datasets with iRODS-a performance analysis
  • D Hünich
  • R Müller-Pfefferkorn
eScience for molecular-scale simulations and the eMinerals project
  • E Salje
  • E Artacho
  • K Austen
  • R Bruin
  • M Calleja
  • H Chappell
  • G T Chiang
  • M Dove
  • I Frame
  • A Goodwin
Clinical decision support systems: State of the art. AHRQ
  • E S Berner