Figure 1 - uploaded by Bertrand Bellenot
Fraction of storage size RNTuple vs. TTree for identical data content (CMS NanoAOD); lower is better.
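The metric in this caption is a plain ratio: the RNTuple file size divided by the TTree file size for the same data, so values below 1 mean RNTuple is smaller. A minimal sketch of that arithmetic follows. The function name `storage_fraction` and the byte counts are hypothetical, chosen only for illustration; they are not measurements from the figure.

```python
def storage_fraction(rntuple_bytes: int, ttree_bytes: int) -> float:
    """Return the RNTuple size as a fraction of the TTree size (lower is better)."""
    if ttree_bytes <= 0:
        raise ValueError("ttree_bytes must be positive")
    if rntuple_bytes < 0:
        raise ValueError("rntuple_bytes must not be negative")
    return rntuple_bytes / ttree_bytes


# Hypothetical sizes in bytes, used only to show the calculation.
example_fraction = storage_fraction(rntuple_bytes=750, ttree_bytes=1000)
print(f"{example_fraction:.2f}")  # prints 0.75
```

A value of 1.0 would mean both formats need the same space for the data set.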
Source publication
ROOT is high energy physics' software for storing and mining data in a statistically sound way, to publish results with scientific graphics. It has been evolving for 25 years, now providing the storage format for more than one exabyte of data; virtually all high energy physics experiments use ROOT. With another significant increase in the amount of dat...
Contexts in source publication
Context 1
... on potential benefits of a new data layout for HEP, called RNTuple [5], showed improvements to transfer rate, storage size efficiency (see Fig. 1), robustness, and flexibility that are sufficiently significant to warrant the introduction of a new, evolved I/O subsystem for HL-LHC, see the Foundation part of the ROOT input. ROOT caters both to frameworks and analysis physicists. It was thus paramount to make RNTuple work exceptionally well also for analyses. RNTuple is expected ...
Context 2
... RDataFrame, ROOT has mostly addressed the issue of "how to write an analysis accelerated by multithreading". This needs to be extended to a multi-node environment, currently developed as Distributed RDataFrame, see Fig. 10. This will reduce the need for the community to develop adapters for running analyses on clusters such as Spark, for instance by reading ROOT files in Java. It will make university clusters accessible to interactive analysis, significantly reducing the turnaround time for ...
Context 3
... new histogram library will address usability and performance issues, while providing the feature set expected by HEP analyses, see Fig. 11. Many analyses can benefit significantly from direct RNTuple to GPU data transfer, for instance for machine learning, see Fig. 12. This requires work on data layout and GPU-compatible compression algorithms. Inference from machine learning models must be simple to use from RDataFrame; results from RDataFrame must be easily and ...
Context 4
... new histogram library will address usability and performance issues, while providing the feature set expected by HEP analyses, see Fig. 11. Many analyses can benefit significantly from direct RNTuple to GPU data transfer, for instance for machine learning, see Fig. 12. This requires work on data layout and GPU-compatible compression algorithms. Inference from machine learning models must be simple to use from RDataFrame; results from RDataFrame must be easily and efficiently usable as ML training input. RooFit (see Fig. 13) needs to continue its renovation for increased efficiency, for instance by ...
Context 5
... from direct RNTuple to GPU data transfer, for instance for machine learning, see Fig. 12. This requires work on data layout and GPU-compatible compression algorithms. Inference from machine learning models must be simple to use from RDataFrame; results from RDataFrame must be easily and efficiently usable as ML training input. RooFit (see Fig. 13) needs to continue its renovation for increased efficiency, for instance by processing arrays of input data also on GPUs. A significant hurdle of RooFit is the pointer-based interface with implicit ownership rules; a redesign based on value semantics and thus similar for Python and C++ is needed for future evolution of ...
Context 6
... visualization. Further investment (see Fig. 14) will ensure a smooth transition from the legacy graphics and GUI interfaces to the new architecture-independent, web-based graphics and GUI implementations [27]. For this to succeed, the new libraries must provide the minimal feature set needed by analyses before the legacy libraries cease to function on commodity analysis systems, ...