Enric Tejedor's research while affiliated with CERN and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to create a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (15)
In recent years, RDataFrame, ROOT’s high-level interface for data analysis and processing, has seen widespread adoption on the part of HEP physicists. Much of this success is due to RDataFrame’s ergonomic programming model that enables the implementation of common analysis tasks more easily than previous APIs, without compromising on application pe...
ROOT is high energy physics' software for storing and mining data in a statistically sound way, to publish results with scientific graphics. It has been evolving for 25 years and now provides the storage format for more than one exabyte of data; virtually all high energy physics experiments use ROOT. With another significant increase in the amount of dat...
This document discusses the state, roadmap, and risks of the foundational components of ROOT with respect to the experiments at the HL-LHC (Run 4 and beyond). As foundational components, the document considers in particular the ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile container file format and the TTree binary ev...
Python is nowadays one of the most widely-used languages for data science. Its rich ecosystem of libraries together with its simplicity and readability are behind its popularity. HEP is also embracing that trend, often using Python as an interface language to access C++ libraries for the sake of performance. PyROOT, the Python bindings of the ROOT...
The High-Energy Physics community faces new data processing challenges caused by the expected growth of data resulting from the upgrade of the LHC accelerator. These challenges drive the demand for exploring new approaches for data analysis. In this paper, we present a new declarative programming model extending the popular ROOT data analysis framework...
SWAN (Service for Web-based ANalysis) is a CERN service that allows users to perform interactive data analysis in the cloud, in a “software as a service” model. It is built upon the widely-used Jupyter notebooks, allowing users to write - and run - their data analysis using only a web browser. By connecting to SWAN, users have immediate access to s...
This talk shares our recent experiences in providing a data analytics platform based on Apache Spark for High Energy Physics, the CERN accelerator logging system and infrastructure monitoring. The Hadoop Service has started to expand its user base to researchers who want to perform analysis with big data technologies. Among many frameworks, Ap...
In the coming years, HEP data processing will need to exploit parallelism on present and future hardware resources to sustain the bandwidth requirements. As one of the cornerstones of the HEP software ecosystem, ROOT embraced an ambitious parallelisation plan which delivered compelling results. In this contribution the strategy is characterised as...
The Physics programmes of LHC Run III and HL-LHC challenge the HEP community. The volume of data to be handled is unprecedented at every step of the data processing chain: analysis is no exception. Physicists must be provided with first-class analysis tools which are easy to use, exploit bleeding edge hardware technologies and allow to seamlessly e...
The bright future of particle physics at the Energy and Intensity frontiers poses exciting challenges to the scientific software community. The traditional strategies for processing and analysing data are evolving in order to (i) offer higher-level programming model approaches and (ii) exploit parallelism to cope with the ever increasing complexity...
When processing large amounts of data, the rate at which reading and writing can take place is a critical factor. High energy physics data processing relying on ROOT is no exception. The recent parallelisation of LHC experiments' software frameworks and the analysis of the ever increasing amount of collision data collected by experiments further em...
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments.
The incarnations of the aforementioned parallelism are multi-threading, multi...
SWAN (Service for Web based ANalysis) is a platform to perform interactive data analysis in the cloud. SWAN allows users to write and run their data analyses with only a web browser, leveraging the widely-adopted Jupyter notebook interface. The user code, executions and data live entirely in the cloud. SWAN makes it easier to produce and share r...
Data Mining as a Service (DMaaS) is a software and computing infrastructure that allows interactive mining of scientific data in the cloud. It allows users to run advanced data analyses by leveraging the widely adopted Jupyter notebook interface. Furthermore, the system makes it easier to share results and scientific code, access scientific softwar...
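Several of the abstracts above describe a declarative programming model, in which the user states what to compute (new columns, selections, results) and the framework decides how and when to loop over the events. The following is a toy sketch of that idea in plain Python. The `LazyFrame` class and its methods are hypothetical names invented for this illustration; they are not ROOT's or RDataFrame's API, and the sketch ignores performance entirely.

```python
class LazyFrame:
    """Toy declarative pipeline: transformations are recorded, not executed."""

    def __init__(self, rows, steps=()):
        self._rows = rows            # any re-iterable collection of dict-like events
        self._steps = tuple(steps)   # recorded ("define" | "filter", name, function)

    def define(self, name, fn):
        # Record a new column; nothing is computed yet.
        return LazyFrame(self._rows, self._steps + (("define", name, fn),))

    def filter(self, predicate):
        # Record a selection; nothing is computed yet.
        return LazyFrame(self._rows, self._steps + (("filter", None, predicate),))

    def _run(self):
        # The event loop: executed only when a result is requested.
        for row in self._rows:
            row = dict(row)  # copy so the input events are left untouched
            keep = True
            for kind, name, fn in self._steps:
                if kind == "define":
                    row[name] = fn(row)
                elif not fn(row):
                    keep = False
                    break
            if keep:
                yield row

    def count(self):
        return sum(1 for _ in self._run())

    def sum(self, column):
        return sum(row[column] for row in self._run())


events = [
    {"px": 3.0, "py": 4.0},
    {"px": 0.3, "py": 0.4},
    {"px": 6.0, "py": 8.0},
]

selected = (
    LazyFrame(events)
    .define("pt", lambda r: (r["px"] ** 2 + r["py"] ** 2) ** 0.5)
    .filter(lambda r: r["pt"] > 1.0)
)

n_selected = selected.count()   # the event loop runs here
pt_sum = selected.sum("pt")     # and again here
```

In this toy each requested result triggers its own pass over the events. A production framework would typically book several results and fill them all in a single loop, which is also where parallelism can be introduced without changing the user's code.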
Citations
... Python is increasingly relevant for the HEP community, and particularly so in the area of analysis. RDataFrame, like all of ROOT, offers dynamically generated Python bindings via PyROOT ([10]), while its internal event loop remains in C++: this offers greater opportunities for performance optimizations and allows multi-thread parallelism that would be more complicated in Python due to its Global Interpreter Lock (the GIL). At the same time, the Python ecosystem has developed tools to generate efficient machine code from Python code to speed up numerical applications; the most interesting such tool for our purposes is Numba ([11]), which is able to compile Python functions (that restrict themselves to a specific subset of the language) to optimized machine code. ...
Reference: RDataFrame enhancements for HEP analyses
... Awkward RDataFrame [13] uses the C++ header-only libraries to simplify the process of just-in-time (JIT) compilation in ROOT [14]. The ak.from_rdataframe [15] function converts the selected ROOT RDataFrame [16] columns into native Awkward Arrays. The templated header-only implementation constructs the Form from the primitive data types [17]. ...
Reference: The Awkward World of Python and C++
... SeQuiLa has been successfully deployed to both popular managed Hadoop services like Google Dataproc (utilized also in Krissaane et al., 2020) and managed Kubernetes services like Google Kubernetes Engine (GKE), Azure Kubernetes Service, or Amazon Elastic Kubernetes Service. Figure 3 presents an exemplary setup on GKE using the spark-on-k8s-operator and SeQuiLa application defined as a Kubernetes Custom Resource Definition. This architecture was suggested in Castro et al. (2019) as a preferred Apache Spark deployment scenario for scaling data analytics workloads and enabling efficient, on-demand utilization of resources in the cloud infrastructure. More detailed information on setup and corresponding Terraform modules can be found in the dedicated GitHub repository (https://github.com/biodatageeks/sequila-cloudrecipes). ...
... A later study overcame this issue, but did not achieve higher scaling when using available CERN storage facilities [29]. An example of good scalability was provided by researchers of the TOTEM experiment at CERN, with a first approach at distributing a ROOT application over Spark resources in a cloud [30]. The presence of Spark in the HEP community has become relevant enough that CERN has invested in specific infrastructure to support Spark analysis workflows [31]. ...
... With this setup, TOTEM scientists validated the obtained physics results and demonstrated that it is possible to reduce the time required for the final steps of their analysis by a factor of 280 when compared to single-core execution (i.e., from 8 hours to less than 2 minutes). We invite the interested reader to refer to [14] for further details. ...
... The RNTuple classes make use of templates, such that for simple types (e.g., vectors of floats) that are known at compile time, the compiler can inline a fast path from the highest to the lowest layer without additional value copies or virtual calls. The event iteration layer provides the user-facing interfaces to read and write events, either through RDataFrame [4] or as hand-written event loops. The user interface is presented in more detail in Section 3. ...
Reference: Evolution of the ROOT Tree I/O
... When processing large volumes, sometimes the time spent on reading or writing data comes to the fore. New possibilities for parallelizing these processes are actively used in the optimization of HEP software [5][6][7]. ...
... Such classes are currently available for TMVA [11] (through the TMVA::Experimental::RReader class included in ROOT), lwtnn [12], Tensorflow [9] (through the C API), PyTorch [13] (using TorchScript), and ONNX-Runtime [14]. Most of these support multiple input nodes, with potentially different types, and are thread-safe, such that implicit multithreading [15] can be used. The performance may in the future be improved by evaluating on a batch instead of a single example at a time. ...
... The containerization ecosystem has matured considerably and now offers many orchestrators, such as Docker Swarm, Kubernetes (k8s), Marathon, Amazon container engine, Google Container Engine (GKE), and Azure container service; see Kata [7], Piparo et al. [8], Sanchez [9], Hoque et al. [10] and Augustyn and Warchal [11]. The test bed experiments in the paper made use of Kata-containers. ...
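The excerpt from "RDataFrame enhancements for HEP analyses" above notes that Numba can compile Python functions to optimized machine code, provided they stay within a supported subset of the language. A minimal sketch of what such a function might look like follows. The function name and the physics quantity are chosen for illustration only, and the example assumes Numba is installed; if it is not, the fallback below simply runs the same function as ordinary Python.

```python
import math

try:
    from numba import njit  # assumption: Numba is available in the environment
except ImportError:
    def njit(func):
        """Fallback for illustration: leave the function as plain Python."""
        return func


@njit
def transverse_momentum(px, py):
    # Only scalar arithmetic and math.sqrt are used, which keeps the
    # function within the subset of Python that Numba can compile.
    return math.sqrt(px * px + py * py)


pt = transverse_momentum(3.0, 4.0)
```

With Numba present, compilation happens on the first call for the given argument types; later calls with the same types reuse the compiled code.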