Axel Naumann’s research while affiliated with CERN and other places


Publications (33)


ROOT’s RNTuple I/O Subsystem: The Path to Production
  • Article

May 2024 · 18 Reads · The European Physical Journal Conferences

Philippe Canal · [...] · Vincenzo Eduardo Padulano

The RNTuple I/O subsystem is ROOT’s future event data file format and access API. It is driven by the expected data volume increase at upcoming HEP experiments, e.g. at the HL-LHC, and recent opportunities in the storage hardware and software landscape such as NVMe drives and distributed object stores. RNTuple is a redesign of the TTree binary format and API and has been shown to deliver substantially faster data throughput and better data compression, compared both to TTree and to industry standard formats. In order to let HENP computing workflows benefit from RNTuple’s superior performance, however, the I/O stack needs to connect efficiently to the rest of the ecosystem, from grid storage to (distributed) analysis frameworks to (multithreaded) experiment frameworks for reconstruction and ntuple derivation. With the RNTuple binary format soon arriving at its first production release, we present RNTuple’s feature set, integration efforts, and its performance impact on the time-to-solution. We show the latest performance figures of RDataFrame analysis code of realistic complexity, comparing RNTuple and TTree as data sources. We discuss RNTuple’s approach to functionality critical to HENP I/O (such as multithreaded writes, fast data merging, schema evolution) and we provide an outlook on the road to its use in production.


I/O performance studies of analysis workloads on production and dedicated resources at CERN

May 2024 · 23 Reads · The European Physical Journal Conferences

The recent evolutions of the analysis frameworks and physics data formats of the LHC experiments provide the opportunity of using central analysis facilities with a strong focus on interactivity and short turnaround times, to complement the more common distributed analysis on the Grid. In order to plan for such facilities, it is essential to know in detail the performance of the combination of a given analysis framework, of a specific analysis and of the installed computing and storage resources. This contribution describes performance studies performed at CERN, using the EOS disk-based storage, either directly or through an XCache instance, from both batch resources and high-performance compute nodes which could be used to build an analysis facility. A variety of benchmarks, both synthetic and based on real-world physics analyses and their corresponding input datasets, are utilized. In particular, the RNTuple format from the ROOT project is put to the test and compared to the latest version of the TTree format, and the impact of caches is assessed. In addition, we assessed the difference in performance between the use of storage system specific protocols, like XRootd, and FUSE. The results of this study are intended to be a valuable input in the design of analysis facilities, at CERN and elsewhere.


Boosting RDataFrame performance with transparent bulk event processing

May 2024 · 10 Reads · The European Physical Journal Conferences

RDataFrame is ROOT’s high-level interface for Python and C++ data analysis. Since it first became available, RDataFrame adoption has grown steadily and it is now poised to be a major component of analysis software pipelines for LHC Run 3 and beyond. Thanks to its design inspired by declarative programming principles, RDataFrame enables the development of high-performance, highly parallel analyses without requiring expert knowledge of multi-threading and I/O: user logic is expressed in terms of self-contained, small computation kernels tied together by a high-level API. This design completely decouples analysis logic from its actual execution, and opens several interesting avenues for workflow optimization. In particular, in this work we explore the benefits of moving internal data processing from an event-by-event to a bulk-by-bulk loop. This refactoring dramatically reduces the framework’s runtime overheads; in collaboration with the I/O layer it improves data access patterns; it exposes information that optimizing compilers might use to auto-vectorize the invocation of user-defined computations; finally, while existing user-facing interfaces remain unaffected, it becomes possible to additionally offer interfaces that explicitly expose bulks of events, useful e.g. for the injection of GPU kernels into the analysis workflow. In order to inform similar future R&D, design challenges will be presented, as well as an investigation of the relevant time-memory trade-off backed by novel performance benchmarks.


Figure 2. Runtimes for RDataFrame event loops for three different scenarios: A. 100M events, 1 histogram produced, 2 variations (nominal, up, down histograms filled). B. 10M events, 1 nominal histogram, 100 variations (101 histograms filled). C. 100k events, 20 nominal histograms, 100 variations (2k histograms filled).

RDataFrame enhancements for HEP analyses

February 2023 · 52 Reads · 2 Citations · Journal of Physics Conference Series

In recent years, RDataFrame, ROOT’s high-level interface for data analysis and processing, has seen widespread adoption on the part of HEP physicists. Much of this success is due to RDataFrame’s ergonomic programming model that enables the implementation of common analysis tasks more easily than previous APIs, without compromising on application performance. Nonetheless, RDataFrame’s interfaces have been further improved by the recent addition of several major HEP-oriented features: in this contribution we will introduce for instance a dedicated syntax to define systematic variations, per-data-sample call-backs useful to define quantities that vary on a per-sample basis, simplifications of collection operations and the injection of just-in-time-compiled Python functions in the optimized C++ event loop.


The Analysis of High-Frequency Finance Data using ROOT

May 2022 · 57 Reads · Journal of Physics Conference Series

High-frequency financial market data is conceptually distinct from high energy physics (HEP) data. Market data is a time series generated by market participants, while HEP data is a set of independent events generated by collisions between particles. However, there are similarities within the data structure and required tools for data analysis, and both fields share a similar set of problems facing the increasing size of data generated. This paper describes some of the core concepts of financial markets, discusses the data similarities and differences with HEP, and provides an implementation to use ROOT, an open-source data analysis framework in HEP, with financial market data. This implementation makes it possible to take advantage of the rich set of features available in ROOT and extends research in finance.


HL-LHC Analysis With ROOT

May 2022 · 196 Reads

ROOT is high energy physics' software for storing and mining data in a statistically sound way, to publish results with scientific graphics. It has been evolving for 25 years, now providing the storage format for more than one exabyte of data; virtually all high energy physics experiments use ROOT. With another significant increase in the amount of data to be handled scheduled to arrive in 2027, ROOT is preparing for a massive upgrade of its core ingredients. As part of a review of crucial software for high energy physics, the ROOT team has documented its R&D plans for the coming years.




ROOT for the HL-LHC: data format

April 2022 · 36 Reads

This document discusses the state, roadmap, and risks of the foundational components of ROOT with respect to the experiments at the HL-LHC (Run 4 and beyond). As foundational components, the document considers in particular the ROOT input/output (I/O) subsystem. The current HEP I/O is based on the TFile container file format and the TTree binary event data format. The work going into the new RNTuple event data format aims at superseding TTree, to make RNTuple the production ROOT event data I/O that meets the requirements of Run 4 and beyond.


Figure 4 shows the behaviour of the LOB and trades around the spoofing of the December 2011 contract between 13:03:45 and 13:04:05. The top panel shows the last traded price (blue line) and the occurrence of individual trades (grey lines). The second and third panel visualize the LOB and cumulative trade volume, respectively. The bottom panel shows the number of messages reported by the exchange in the relevant time window and, hence, the amount of time that passes between messages. The second panel in Figure 4 shows that, when the genuine order was added, individual LOB levels contained volumes of between 500 and 2500 contracts. When the spoof order of 3000 contracts was placed, volume on the first ask level increased significantly, as indicated by the bright yellow colour. This increase in volume remained in the LOB during the execution of the genuine order and ended when the spoof order was cancelled. The addition of the spoof order, the execution of the genuine order and the cancellation of the spoof order all occurred within the same second, as indicated by the space between the green vertical lines. The top panel in Figure 4 shows that when the genuine bid order was placed at 129.578 points, the last traded price was also 129.578 points. This illustrates once more that the goal of this spoof may not have been to move the price, but to attract more liquidity, so as to increase the chance of fully executing the genuine bid order of 50 contracts. This will be further explored in Sections 4.2.4 and 4.6. The last traded price remained constant at 129.578 points during all spoofing actions. The cumulative trade volume panel in Figure 4 shows that no trades took place in the time window until the genuine order and spoof order were placed. After the spoof order was placed, a staircase pattern emerged. Our data shows that this was caused by the genuine bid order not being executed at once but being split into smaller executed trades. After the genuine order was fully ...

Unravelling the JPMorgan Spoofing Case Using Particle Physics Visualization Methods

January 2022 · 202 Reads · 1 Citation · European Financial Management

On September 29, 2020, JPMorgan was ordered to pay a settlement of $920.2 million for spoofing the metals and Treasury futures markets from 2008 to 2016. We examine these cases using a visualization method developed in particle physics (CERN) and the messages that the exchange receives about market activity rather than time‐based snapshots. This approach allows us to examine multiple indicators related to market manipulation and complements existing research methods, thereby enhancing the identification and understanding of, as well as the motivation for, market manipulation. In the JPMorgan cases, we offer an alternative motivation for spoofing than moving the price.


Citations (14)


... The code provides extensive support for various ROOT data structures and classes, including TTree, TTreeFormula, Aliases, TFormula and static Root/AliRoot functions. Work is also being done to simplify compatibility with RDataFrame [7,8] and awkward (PyHep) arrays [9]. One can use this framework with various data sources, including PyRoot (AliRoot/O2) data structures. ...

Reference:

RootInteractive tool for multidimensional statistical analysis, machine learning and analytical model validation
RDataFrame enhancements for HEP analyses

Journal of Physics Conference Series

... Intraday trading, which relies heavily on real-time data, is influenced by real-time price movements [38]. Swift decision-making based on real-time information is crucial for profiting in intraday trading [39]. Real-time data also plays a pivotal role in algorithmic trading strategies, where trading decisions are made based on real-time data and mathematical algorithms. ...

Using ROOT to analyse High-Frequency Finance Data
  • Citing Conference Paper
  • November 2021

... In the coming years, the data rate is expected to increase further, for example during operation of the High-Luminosity LHC (HL-LHC). In response, the HEP community is developing RNTuple, an evolution of the currently used TTree columnar format [6]. It is designed to make efficient use of modern hardware, but currently lacks support for highly scalable parallel writing. ...

Evolution of the ROOT Tree I/O

The European Physical Journal Conferences

... The resulting technology has the capability to enable substantial economic and operational gains (including speedup) for High Energy and Nuclear Physics data storage/analysis. In our initial studies, a factor of nearly x4 (3.9) compression was achieved with RHIC/STAR data where ROOT compression managed only x1.4 [6]. ...

Extreme Compression for Large Scale Data Store

The European Physical Journal Conferences

... In paper [6], the function templates are instantiated at runtime by providing non-constant expressions to the non-type template parameters and strings from which the type is deduced by conversion to type template parameters. Another paper [7] allows multiple definitions for the same language construct, producing different Abstract Syntax Trees (AST). While LLVM Clang is used to separate the unique AST from the redefined ones, the redefined AST is JIT compiled [7]. ...

Relaxing the one definition rule in interpreted C++

... Thirty Python package developers, maintainers, and power users, all of whom are authors of this report, gathered together in an informal setting to discuss relevant and timely trends in Python, largely targeting end-user analysis. The following topics were central points of discussion: how PyHEP does (or does not) relate to physicists' needs; the Analysis Grand Challenges that run analyses at scale at analysis facilities; leveraging key packages from the ecosystem; the development of statistical packages, models, interfaces, and serialization; workflow management systems; histogramming; and key distributed processing tools like RDataFrame [3], Coffea [4], and Dask [5]. Finally, we brainstormed the organization of future PyHEP.dev ...

RDataFrame: Easy Parallel ROOT Analysis at 100 Threads

The European Physical Journal Conferences

... When code tries to access (e.g. for serialization or interpretation) a templated class, CINT is looking this type up in its collection of dictionaries. We have extended CINT to react to a lacking dictionary for a templated class by generating it: If the member of a class Klass<MyArg> is accessed, and if CINT knows the header files defining Klass and MyArg (e.g. because of #include statements), CINT will tell ROOT to create a dictionary for it using ROOT's automatic library builder ACLiC [5]. ...

Reference:

C++ and Data
The Role of Interpreters in High Performance Computing

... For example: typedef reflexpr() meta_global_scope; typedef reflexpr(int) meta_int; typedef reflexpr(std) meta_std; typedef reflexpr(std::size_t) meta_std_size_t; typedef reflexpr(std::thread) meta_std_thread; typedef reflexpr(std::pair) meta_std_pair; ...

Static reflection (revision 8)
  • Citing Technical Report
  • June 2017

... In P0385R1, Matus Chochlik, Axel Naumann and David Sankel use a reflexpr operator to associate a unique implementation-defined class with each reflected type (fundamental, compound, user-defined), namespace, and specifier (public, virtual, etc.) [133]. A set of queries, in the form of type traits, is used to access the name, members, and other properties of the reflected class. ...

Static reflection: Rationale, design and evolution