Publications (6)0 Total impact

  • [Show abstract] [Hide abstract]
    ABSTRACT: While the LHC data movement systems have demonstrated the ability to move data at the necessary throughput, we have identified two weaknesses: the latency for physicists to access data and the complexity of the tools involved. To address these, both ATLAS and CMS have begun to federate regional storage systems using Xrootd. Xrootd, referring to a protocol and implementation, allows us to provide data access to all disk-resident data from a single virtual endpoint. This “redirector” discovers the actual location of the data and redirects the client to the appropriate site. The approach is particularly advantageous since typically the redirection requires much less than 500 milliseconds and the Xrootd client is conveniently built into LHC physicists’ analysis tools. Currently, there are three regional storage federations - a US ATLAS region, a European CMS region, and a US CMS region. The US ATLAS and US CMS regions include their respective Tier 1, Tier 2 and some Tier 3 facilities; a large percentage of experimental data is available via the federation. Additionally, US ATLAS has begun studying low-latency regional federations of close-by sites. From the base idea of federating storage behind an endpoint, the implementations and use cases diverge. The CMS software framework is capable of efficiently processing data over high-latency links, so using the remote site directly is comparable to accessing local data. The ATLAS processing model allows a broad spectrum of user applications with varying degrees of performance with regard to latency; a particular focus has been optimizing n-tuple analysis. Both VOs use GSI security. ATLAS has developed a mapping of VOMS roles to specific file system authorizations, while CMS has developed callouts to the site's mapping service. Each federation presents a global namespace to users. For ATLAS, the global-to-local mapping is based on a heuristic-based lookup from the site's local file catalog, while CMS does the mapping based on translations given in a configuration file. We will also cover the latest usage statistics and interesting use cases that have developed over the previous 18 months.
    Journal of Physics Conference Series 12/2012; 396(4):2009-. DOI:10.1088/1742-6596/396/4/042009
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The Grid2003 Project has deployed a multivirtual organization, application-driven grid laboratory ("Grid3") that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, work loads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. We describe the principles that have guided the development of this unique infrastructure and the practical experiences that have resulted from its creation and use. We discuss application requirements for grid services deployment and configuration, monitoring infrastructure, application performance, metrics, and operational experiences. We also summarize lessons learned.
    High performance Distributed Computing, 2004. Proceedings. 13th IEEE International Symposium on; 07/2004
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: We describe the design and operational experience of the ATLAS production system as implemented for execution on Grid3 resources. The execution environment consisted of a number of grid-based tools: Pacman for installation of VDT-based Grid3 services and ATLAS software releases, the Capone execution service built from the Chimera/Pegasus virtual data system for directed acyclic graph (DAG) generation, DAGMan/Condor-G for job submission and management, and the Windmill production supervisor which provides the Jabber messaging system for Capone interactions with the ATLAS production database at CERN. Datasets produced on Grid3 were registered into a distributed replica location service (Globus RLS) that was integrated with the Don Quijote proxy service for interoperability with other Grids used by ATLAS. We discuss performance, scalability, and fault handling during the first phase of ATLAS Data Challenge 2.
  • [Show abstract] [Hide abstract]
    ABSTRACT: The ATLAS Collaboration at CERN is preparing for the data taking and analysis at the LHC that will start in 2007. Therefore, a series of Data Challenges was started in 2002 whose goals are the validation of the Computing Model, of the complete software suite, of the data model, and to ensure the correctness of the technical choices to be made for the final offline computing environment. A major feature of the first Data Challenge (DC1) was the preparation and the deployment of the software required for the production of large event samples as a worldwide distributed activity. It should be noted that it was not an option to "run the complete production at CERN" even if we had wanted to; the resources were not available at CERN to carry out the production on a reasonable time-scale. The great challenge of organising and carrying out this large-scale production at a significant number of sites around the world had therefore to be faced. However, the benefits of this are manifold: apart from realising the required computing resources, this exercise created worldwide momentum for ATLAS computing as a whole. This report describes in detail the main steps carried out in DC1 and what has been learned form them as a step towards a computing Grid for the LHC experiments.
  • [Show abstract] [Hide abstract]
    ABSTRACT: The ATLAS Collaboration at CERN is preparing for the data taking and analysis at the LHC that will start in 2007. Therefore, a series of Data Challenges was started in 2002 whose goals are the validation of the Computing Model, of the complete software suite, of the data model, and to ensure the correctness of the technical choices to be made for the final offline computing environment. A major feature of the first Data Challenge (DC1) was the preparation and the deployment of the software required for the production of large event samples as a worldwide distributed activity. It should be noted that it was not an option to "run the complete production at CERN" even if we had wanted to; the resources were not available at CERN to carry out the production on a reasonable time-scale. The great challenge of organising and carrying out this large-scale production at a significant number of sites around the world had therefore to be faced. However, the benefits of this are manifold: apart from realising the required computing resources, this exercise created worldwide momentum for ATLAS computing as a whole. This report describes in detail the main steps carried out in DC1 and what has been learned form them as a step towards a computing Grid for the LHC experiments.
  • Source
    [Show abstract] [Hide abstract]
    ABSTRACT: The Grid2003 Project has deployed a multi-virtual organization, application-driven grid laboratory ("Grid3") that has sustained for several months the production-level services required by physics experiments of the Large Hadron Collider at CERN (ATLAS and CMS), the Sloan Digital Sky Survey project, the gravitational wave search experiment LIGO, the BTeV experiment at Fermilab, as well as applications in molecular structure analysis and genome analysis, and computer science research projects in such areas as job and data scheduling. The deployed infrastructure has been operating since November 2003 with 27 sites, a peak of 2800 processors, work loads from 10 different applications exceeding 1300 simultaneous jobs, and data transfers among sites of greater than 2 TB/day. We describe the principles that have guided the development of this unique infrastructure and the practical experiences that have resulted from its creation and use. We discuss application requirements for grid services deployment and configuration, monitoring infrastructure, application performance, metrics, and operational experiences. We also summarize lessons learned.