-
K. Hasham,
A. Delgado Peris, A. Anjum,
D. Evans,
S. Gowdy,
J.M. Hernandez,
E. Huedo,
D. Hufnagel,
F. van Lingen,
R. McClatchey,
S. Metson
ABSTRACT: Complex scientific workflows can process large amounts of data using thousands of tasks. The turnaround times of these workflows are often affected by various latencies such as the resource discovery, scheduling and data access latencies for the individual workflow processes or actors. Minimizing these latencies will improve the overall execution time of a workflow and thus lead to a more efficient and robust processing environment. In this paper, we propose a pilot job concept that has intelligent data reuse and job execution strategies to minimize the scheduling, queuing, execution and data access latencies. The results have shown that significant improvements in the overall turnaround time of a workflow can be achieved with this approach. The proposed approach has been evaluated, first using the CMS Tier0 data processing workflow, and then simulating the workflows to evaluate its effectiveness in a controlled environment.
IEEE Transactions on Nuclear Science 07/2011; · 1.45 Impact Factor
-
ABSTRACT: Large-scale neuroscience research projects are necessary in order to make significant progress in the study of degenerative brain diseases. At present the effectiveness of such efforts is being somewhat restricted by the absence of specifically tailored computing infrastructures. The neuGRID project aims to address this through the provision of a high-level service oriented infrastructure that enables complex neuroscience research. One of the principal aims of this work is to develop portable services that can be re-used in a larger set of related medical applications to access distributed computing resources. These services will provide high-level functionality that will support workflow authoring and planning, provenance storage and retrieval, querying against heterogeneous data sources as well as security and data anonymization amongst others. This paper introduces the neuGRID service architecture and outlines the design of two specific services, namely the Pipeline Service and the Glueing Service. A proof of concept implementation to evaluate the neuGRID design approach has been developed.
Computer-Based Medical Systems, 2009. CBMS 2009. 22nd IEEE International Symposium on; 09/2009
-
The CMS Collaboration,
S Chatrchyan,
G Hmayakyan,
V Khachatryan,
A M Sirunyan,
W Adam,
T Bauer,
T Bergauer,
H Bergauer,
M Dragicevic, [......],
G Abdullaeva,
A Avezov,
M I Fazylov,
E M Gasanov,
A Khugaev,
Y N Koblik,
M Nishonov,
K Olimov,
A Umaraliev,
B S Yuldashev
ABSTRACT: The Compact Muon Solenoid (CMS) detector is described. The detector operates at the Large Hadron Collider (LHC) at CERN. It was conceived to study proton-proton (and lead-lead) collisions at a centre-of-mass energy of 14 TeV (5.5 TeV nucleon-nucleon) and at luminosities up to 10^34 cm^−2 s^−1 (10^27 cm^−2 s^−1). At the core of the CMS detector sits a high-magnetic-field and large-bore superconducting solenoid surrounding an all-silicon pixel and strip tracker, a lead-tungstate scintillating-crystals electromagnetic calorimeter, and a brass-scintillator sampling hadron calorimeter. The iron yoke of the flux-return is instrumented with four stations of muon detectors covering most of the 4π solid angle. Forward sampling calorimeters extend the pseudorapidity coverage to high values (|η| ≤ 5) assuring very good hermeticity. The overall dimensions of the CMS detector are a length of 21.6 m, a diameter of 14.6 m and a total weight of 12500 t.
Journal of Instrumentation 08/2008; 3(08):S08004. · 1.87 Impact Factor
-
K Skaburskas,
F Estrella,
J Shade,
D Manset,
J Revillard,
A Rios, A Anjum,
A Branson,
P Bloodsworth,
T Hauer,
R McClatchey,
D Rogulin
ABSTRACT: The Health-e-Child (HeC) project [1], [2] is an EC Framework Programme 6 Integrated Project that aims to develop a grid-based integrated healthcare platform for paediatrics. Using this platform biomedical informaticians will integrate heterogeneous data and perform epidemiological studies across Europe. The resulting Grid-enabled biomedical information platform will be supported by robust search, optimization and matching techniques for information collected in hospitals across Europe. In particular, paediatricians will be provided with decision support, knowledge discovery and disease modelling applications that will access data in hospitals in the UK, Italy and France, integrated via the Grid. For economy of scale, reusability, extensibility, and maintainability, HeC is being developed on top of an EGEE/gLite [3] based infrastructure that provides all the common data and computation management services required by the applications. This paper discusses some of the major challenges in biomedical data integration and indicates how these will be resolved in the HeC system. HeC is presented as an example of how computer science (and, in particular, Grid infrastructures) originating from high energy physics can be adapted for use by biomedical informaticians to deliver tangible real-world benefits.
Journal of Physics Conference Series 07/2008; 119(8):082011.
-
ABSTRACT: The concepts, design and evaluation of the Data Intensive and Network Aware (DIANA) meta-scheduling approach for solving the challenges of data analysis being faced by CERN experiments are discussed in this paper. Our results suggest that data analysis can be made robust by employing the fault tolerant and decentralized meta-scheduling algorithms supported in our DIANA meta-scheduler. The DIANA meta-scheduler supports data intensive bulk scheduling, is network aware and follows a policy-centric meta-scheduling approach. In this paper, we demonstrate that a decentralized and dynamic meta-scheduling approach is an effective strategy to cope with increasing numbers of users, jobs and datasets. We present 'quality of service' related statistics for physics analysis through the application of a policy-centric fair-share scheduling model. The DIANA meta-schedulers create a peer-to-peer hierarchy of schedulers to accomplish resource management that is dynamic, changes with evolving loads and adapts to the volatile nature of the resources.
Journal of Physics Conference Series 07/2008; 119(7):072004.
-
ABSTRACT: Evidence-based medicine is critically dependent on three sources of information: a medical knowledge base, the patient's medical record and knowledge of available resources, including where appropriate, clinical protocols. Patient data is often scattered in a variety of databases and may, in a distributed model, be held across several disparate repositories. Consequently addressing the needs of an evidence-based medicine community presents issues of biomedical data integration, clinical interpretation and knowledge management. This paper outlines how the Health-e-Child project has approached the challenge of requirements specification for (bio-) medical data integration, from the level of cellular data, through disease to that of patient and population. The approach is illuminated through the requirements elicitation and analysis of Juvenile Idiopathic Arthritis (JIA), one of three diseases being studied in the EC-funded Health-e-Child project.
Database Engineering and Applications Symposium, 2007. IDEAS 2007. 11th International; 10/2007
-
ABSTRACT: Discovery Systems (DS) can be considered as entry points for global loosely coupled distributed systems. An efficient Discovery System in essence increases the performance, reliability and decision making capability of distributed systems. With the rapid increase in scale of distributed applications, existing solutions for discovery systems are fast becoming either obsolete or incapable of handling such complexity. They are particularly ineffective when handling service lifetimes and providing up-to-date information, poor at enabling dynamic service access and they can also impose unwanted restrictions on interfaces to widely available information repositories. In this paper we present the essential design characteristics, an implementation and a performance analysis for a discovery system capable of overcoming these deficiencies in large, globally distributed environments.
08/2007;
-
ABSTRACT: This paper presents the design and implementation of a Grid-enabled physics analysis environment for handheld and other resource-limited computing devices as one example of the use of mobile devices in eScience. Handheld devices offer great potential because they provide ubiquitous access to data and round-the-clock connectivity over wireless links. Our solution aims to provide users of handheld devices the capability to launch heavy computational tasks on computational and data Grids, monitor the status of their jobs during execution, and retrieve results after job completion. Users carry their jobs on their handheld devices in the form of executables (and associated libraries). Users can transparently view the status of their jobs and get back their outputs without having to know where they are being executed. In this way, our system is able to act as a high-throughput computing environment where devices ranging from powerful desktop machines to small handhelds can employ the power of the Grid. The results shown in this paper are readily applicable to the wider eScience community.
08/2007;
-
ABSTRACT: The use of meta-schedulers for resource management in large-scale distributed systems often leads to a hierarchy of schedulers. In this paper, we discuss why existing meta-scheduling hierarchies are sometimes not sufficient for Grid systems due to their inability to re-organise jobs already scheduled locally. Such a job re-organisation is required to adapt to evolving loads which are common in heavily used Grid infrastructures. We propose a peer-to-peer scheduling model and evaluate it using case studies and mathematical modelling. We detail the DIANA (Data Intensive and Network Aware) scheduling algorithm and its queue management system for coping with the load distribution and for supporting bulk job scheduling. We demonstrate that such a system is beneficial for dynamic, distributed and self-organizing resource management and can assist in optimizing load or job distribution in complex Grid infrastructures.
e-Science and Grid Computing, 2006. e-Science '06. Second IEEE International Conference on; 01/2007
-
ABSTRACT: Results from the research and development of a Data Intensive and Network Aware (DIANA) scheduling engine, to be used primarily for data intensive sciences such as physics analysis, are described. In Grid analyses, tasks can involve thousands of computing, data handling, and network resources. The central problem in the scheduling of these resources is the coordinated management of computation and data at multiple locations and not just data replication or movement. However, this can prove to be a rather costly operation and efficient scheduling can be a challenge if compute and data resources are mapped without considering network costs. We have implemented an adaptive algorithm within the so-called DIANA Scheduler which takes into account data location and size, network performance and computation capability in order to enable efficient global scheduling. DIANA is a performance-aware and economy-guided Meta Scheduler. It iteratively allocates each job to the site that is most likely to produce the best performance, while also optimizing the global queue for any remaining jobs. Therefore, it is equally suitable whether a single job is being submitted or bulk scheduling is being performed. Results indicate that considerable performance improvements can be gained by adopting the DIANA scheduling approach.
IEEE Transactions on Nuclear Science 01/2007; · 1.45 Impact Factor
-
J. Andreeva, A. Anjum,
T. Barrass,
D. Bonacorsi,
J. Bunn,
P. Capiluppi,
M. Corvo,
N. Darmenov,
N. DeFilippis,
F. Donno, [......],
D. Newbold,
A. Pierro,
L. Silvestris,
C. Steenberg,
H. Stockinger,
L. Taylor,
M. Thomas,
L. Tuura,
T. Wildish,
F. VanLingen
ABSTRACT: The CMS experiment is currently developing a computing system capable of serving, processing and archiving the large number of events that will be generated when the CMS detector starts taking data. During 2004 CMS undertook a large scale data challenge to demonstrate the ability of the CMS computing system to cope with a sustained data-taking rate equivalent to 25% of startup rate. Its goals were: to run CMS event reconstruction at CERN for a sustained period at 25 Hz input rate; to distribute the data to several regional centers; and enable data access at those centers for analysis. Grid middleware was utilized to help complete all aspects of the challenge. To continue to provide scalable access from anywhere in the world to the data, CMS is developing a layer of software that uses Grid tools to gain access to data and resources, and that aims to provide physicists with a user friendly interface for submitting their analysis jobs. This paper describes the data challenge experience with Grid infrastructure and the current development of the CMS analysis system.
IEEE Transactions on Nuclear Science 09/2005; · 1.45 Impact Factor
-
J. Bunn,
F. van Lingen,
H Newman,
C. Steenberg,
M Thomas,
A Ali, A. Anjum,
T. Azim,
F Khan,
W. ur Rehman,
R McClatchey,
Jang Uk In
ABSTRACT: High energy physics (HEP) and other scientific communities have adopted service oriented architectures (SOA) as part of a larger grid computing effort. This effort involves the integration of many legacy applications and programming libraries into a SOA framework. The grid analysis environment (GAE) (Lingen et al., 2004) is such a service oriented architecture based on the Clarens grid services framework (Steenberg et al., 2004) and is being developed as part of the compact muon solenoid (CMS) experiment at the large hadron collider (LHC) at the European Laboratory for Particle Physics (CERN). Clarens provides a set of authorization, access control, and discovery services, as well as XML-RPC and SOAP access to all deployed services. Two implementations of the Clarens Web services framework (Python and Java) offer integration possibilities for a wide range of programming languages. This paper describes the Java implementation of the Clarens Web services framework called 'JClarens' and several Web services of interest to the scientific and grid community that have been deployed using JClarens.
Web Services, 2005. ICWS 2005. Proceedings. 2005 IEEE International Conference on; 08/2005
-
ABSTRACT: Large scientific collaborations are moving towards service oriented architectures for implementation and deployment of globally distributed systems. Clarens is a high performance, easy to deploy Web service framework that supports the construction of such globally distributed systems. This paper discusses some of the core functionality of Clarens that the authors believe is important for building distributed systems based on Web services that support scientific analysis.
Parallel Processing, 2005. ICPP 2005 Workshops. International Conference Workshops on; 07/2005
-
A Ali, A. Anjum,
Tahir Azim,
J. Bunn,
S Iqbal,
R McClatchey,
H Newman,
S.Y. Shah,
T Solomonides,
C. Steenberg,
M Thomas,
F. van Lingen,
I. Willers
ABSTRACT: Grid-based systems require a database access mechanism that can provide seamless homogeneous access to the requested data through a virtual data access system, i.e. a system which can take care of tracking the data that is stored in geographically distributed heterogeneous databases. This system should provide an integrated view of the data that is stored in the different repositories by using a virtual data access mechanism, i.e. a mechanism which can hide the heterogeneity of the backend databases from the client applications. This paper focuses on accessing data stored in disparate relational databases through a Web service interface, and exploits the features of a data warehouse and data marts. We present a middleware that enables applications to access data stored in geographically distributed relational databases without being aware of their physical locations and underlying schema. A Web service interface is provided to enable applications to access this middleware in a language and platform independent way. A prototype implementation was created based on Clarens (C. Steenberg et al., 2003), Unity (R. Lawrence and K. Barker, 2000) and POOL (http://pool.cern.ch). This ability to access the data stored in the distributed relational databases transparently is likely to be a very powerful one for grid users, especially the scientific community wishing to collate and analyze data distributed over the grid.
Parallel Processing, 2005. ICPP 2005 Workshops. International Conference Workshops on; 07/2005
-
A. Ali, A. Anjum,
T. Azim,
J. Bunn,
A. Mehmood,
R. McClatchey,
H. Newman,
W. ur Rehman,
C. Steenberg,
M. Thomas,
F. van Lingen,
I. Willers,
M.A. Zafar
ABSTRACT: Selecting optimal resources for submitting jobs on a computational grid or accessing data from a data grid is one of the most important tasks of any grid middleware. Most modern grid software today satisfies this responsibility and gives a best-effort performance to solve this problem. Almost all decisions regarding scheduling and data access are made by the software automatically, giving users little or no control over the entire process. To solve this problem, a more interactive set of services and middleware is desired that provides users more information about grid weather, and gives them more control over the decision making process. This paper presents a set of services that have been developed to provide more interactive resource management capabilities within the grid analysis environment (GAE) being developed collaboratively by Caltech, NUST and several other institutes. These include a steering service, a job monitoring service and an estimator service that have been designed and written using a common grid-enabled Web services framework named Clarens. The paper also presents a performance analysis of the developed services to show that they have indeed resulted in a more interactive and powerful system for user-centric grid-enabled physics analysis.
Parallel Processing, 2005. ICPP 2005 Workshops. International Conference Workshops on; 07/2005
-
J. Andreeva, A. Anjum,
T. Barrass,
D. Bonacorsi,
J. Bunn,
M. Corvo,
N. Darmenov,
N. De Filippis,
F. Donno,
G. Donvito, [......],
H. Newman,
A. Pierro,
L. Silvestris,
C. Steenberg,
H. Stockinger,
L. Taylor,
M. Thomas,
L. Tuura,
T. Wildish,
F. Van Lingen
ABSTRACT: In order to prepare the Physics Technical Design Report, due by end of 2005, the CMS experiment needs to simulate, reconstruct and analyse about 100 million events, corresponding to more than 200 TB of data. The data will be distributed to several Computing Centres. In order to provide access to the whole data sample to all the world-wide dispersed physicists, CMS is developing a layer of software that uses the Grid tools provided by the LCG project to gain access to data and resources and that aims to provide a user friendly interface to the physicists submitting the analysis jobs. To achieve these aims CMS will use Grid tools from both the LCG-2 release and those being developed in the framework of the ARDA project. This work describes the current status and the future developments of the CMS analysis system.
Nuclear Science Symposium Conference Record, 2004 IEEE; 11/2004
-
ABSTRACT: In this paper we describe JClarens, a Java-based implementation of the Clarens remote data server. JClarens provides Web services for an interactive analysis environment to dynamically access and analyze the tremendous amount of data scattered across various locations. Additionally this research aims to develop a service oriented grid enabled portal (GEP) that provides an interface and access to several grid services to give a homogeneous and optimized view of the distributed and heterogeneous environment. Other than showing the platform independent behavior provided by Java, the use of XML-RPC based Web services enabled JClarens to be a language neutral server and demonstrated interoperability with its Python variant. Extreme care has been taken in the usage and manipulation of various Java libraries to cater to the needs of high performance computing. The overall exercise has yielded a prototype with strong emphasis on security and virtual organization management (VOM). This shall provide a common platform to support development of a larger, more flexible framework, with future aims to integrate it with a loosely coupled, decentralized, and autonomous framework for a grid enabled analysis environment (GAE).
Web Services, 2004. Proceedings. IEEE International Conference on; 08/2004
-
ABSTRACT: The requirement for information on portable, handheld devices demands the realization of increasingly complex applications for increasingly small and ubiquitous devices. This trend promotes the migration of technologies that were originally developed for desktop computers to handheld devices. With the onset of grid computing, users of handheld devices should be able to accomplish much more complex tasks, by accessing the processing and storage resources of the grid. This paper describes the development, features, and performance aspects of a grid enabled analysis environment designed for handheld devices. We also describe some differences in the technologies required to run these applications on desktop machines and handheld devices. In addition, we propose a prototype agent-based distributed architecture for carrying out high-speed analysis of physics data on handheld devices.
Networking and Communication Conference, 2004. INCC 2004. International; 07/2004
-
S. Chatrchyan,
G Hmayakyan,
V Khachatryan,
AM Sirunyan,
W Adam,
T Bauer,
T Bergauer,
H Bergauer,
M Dragicevic,
J. Ero, [......],
M. Baarmand,
L Baksay,
S Guragain,
M. Hohlmann,
H Mermerkaya,
R Ralich,
I Vodopiyanov,
MR Adams,
IM Anghel,
L Apanasevich
-
A Fanfani,
J Andreeva, A. Anjum,
T. Barrass,
D Bonacorsi,
J. Bunn,
M. Corvo,
N Darmenov,
N De Filippis,
F. Donno, [......],
S Metson,
H Newman,
L Silvestris,
C. Steenberg,
H. Stockinger,
L Taylor,
M Thomas,
L. Tuura,
F. van Lingen,
T Wildish
ABSTRACT: In order to prepare the Physics Technical Design Report, due by end of 2005, the CMS experiment needs to simulate, reconstruct and analyse about 100 million events, corresponding to more than 200 TB of data. The data will be distributed to several Computing Centres. In order to provide access to the whole data sample to all the world-wide dispersed physicists, CMS is developing a layer of software that uses the grid tools provided by the LCG project to gain access to data and resources and that aims to provide physicists with a user friendly interface for submitting analysis jobs. The GRID tools used are both those already available in the LCG-2 release and those being developed in the framework of the ARDA project. This work describes the current status and the future developments of the CMS analysis system.