Shreyas Cholia's research while affiliated with Lawrence Berkeley National Laboratory and other places
What is this page?
This page lists the scientific contributions of an author, who either does not have a ResearchGate profile, or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to provide a record of this author's body of work. We create such pages to advance our goal of creating and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (22)
A fast, robust pipeline for strain mapping of crystalline materials is important for many technological applications. Scanning electron nanodiffraction allows us to calculate strain maps with high accuracy and spatial resolution, but this technique is limited when the electron beam undergoes multiple scattering. Deep-learning methods have the pote...
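To make the strain-mapping step concrete, here is a minimal sketch of how a strain tensor can be recovered once lattice basis vectors have been measured at each probe position. The function and variable names are illustrative assumptions, and this is not the paper's pipeline (which uses deep learning to locate diffraction disks under multiple scattering).

```python
import numpy as np

def strain_from_lattice_vectors(g_ref, g_meas):
    """Small-strain tensor from reference and measured 2x2 lattice bases
    (basis vectors as columns). Illustrative sketch only."""
    # Deformation gradient mapping the reference basis onto the measured one.
    F = g_meas @ np.linalg.inv(g_ref)
    # Symmetrize and subtract the identity: rotation drops out to first order.
    # Note: if reciprocal-space vectors are used directly, the sign of the
    # strain flips to first order.
    return 0.5 * (F + F.T) - np.eye(2)

# Hypothetical example: 1% tensile strain along x.
g_ref = np.eye(2)
g_meas = np.array([[1.01, 0.0], [0.0, 1.0]])
print(strain_from_lattice_vectors(g_ref, g_meas))  # eps_xx ~ 0.01
```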
Research can be more transparent and collaborative by using Findable, Accessible, Interoperable, and Reusable (FAIR) principles to publish Earth and environmental science data. Reporting formats—instructions, templates, and tools for consistently formatting data within a discipline—can help make data more accessible and reusable. However, the immen...
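To illustrate what "instructions, templates, and tools" can amount to in practice, here is a hedged sketch of a checker for a hypothetical tabular reporting format. The required column names and rules are invented for illustration; they are not the ESS-DIVE specification or any published standard.

```python
import csv
from datetime import datetime

# Hypothetical minimal reporting format: a fixed set of required columns and
# ISO 8601 timestamps. Field names are invented for illustration.
REQUIRED_COLUMNS = ["site_id", "timestamp_utc", "variable", "value", "unit"]

def check_reporting_format(path):
    """Return a list of problems found in a CSV file against the format."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
        if missing:
            return [f"missing required columns: {missing}"]
        for lineno, row in enumerate(reader, start=2):
            try:
                datetime.fromisoformat(row["timestamp_utc"])
            except ValueError:
                problems.append(f"line {lineno}: timestamp_utc is not ISO 8601")
    return problems
```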
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (...
Scientific communities are increasingly publishing data to evaluate, accredit, and build on published research. However, guidelines for curating data for publication are sparse for model-related research, limiting the usability of archived simulation data. In particular, there are no established guidelines for archiving data related to terrestrial...
Implementation of a fast, robust, and fully automated pipeline for crystal structure determination and underlying strain mapping for crystalline materials is important for many technological applications. Scanning electron nanodiffraction offers a procedure for identifying and collecting strain maps with good accuracy and high spatial resolution....
Data standardization combined with descriptive metadata facilitates data reuse, which is the ultimate goal of the Findable, Accessible, Interoperable, and Reusable (FAIR) principles. Community data or metadata standards are increasingly created through an approach that emphasizes collaboration between various stakeholders. Such an approach requires...
Rich user interfaces like Jupyter have the potential to make interacting with a supercomputer easier and more productive, consequently attracting new kinds of users and helping to expand the application of supercomputing to new science domains. For the scientist user, the ideal rich user interface delivers a familiar, responsive, introspective, mod...
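One widely used pattern behind such interfaces is to run the notebook server inside a batch allocation and tunnel or proxy a browser to it. The sketch below shows that pattern in its simplest form; the scheduler directives, hostnames, and the absence of token/TLS handling are simplifying assumptions, not the deployment described in the paper.

```python
import subprocess
import textwrap

# Run Jupyter inside an HPC batch job, then reach it over an SSH tunnel.
# Queue names, hostnames, and paths below are placeholders.
batch_script = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --qos=interactive
    #SBATCH --time=01:00:00
    #SBATCH --nodes=1
    # Bind to the compute node's hostname; token/TLS handling omitted here.
    jupyter lab --no-browser --ip=$(hostname) --port=8888
""")

with open("jupyter_job.sh", "w") as f:
    f.write(batch_script)

# Submit the job; once it runs, tunnel from a laptop with e.g.:
#   ssh -L 8888:<compute-node>:8888 user@login.example.hpc
subprocess.run(["sbatch", "jupyter_job.sh"], check=True)
```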
Researchers in the Department of Energy’s ESS program use a variety of models to advance robust, scale-aware predictions of terrestrial and subsurface ecosystems. ESS projects typically conduct field observations and experiments coupled with modeling exercises using a model-experimental (ModEx) approach that enables iterative co-development of expe...
Large-scale experimental science workflows require support for a unified, interactive, real-time platform that can manage a distributed set of resources connected to High Performance Computing (HPC) systems. What is needed is a tool that provides the ease of use and interactivity of a web science gateway, while providing the scientist the ability t...
There is a growing research interest in understanding extreme weather in the context of anthropogenic climate change, posing a requirement for new, tailored climate data products. Here we introduce the Climate of the 20th Century Plus Detection and Attribution project (C20C+ D&A), an international collaboration generating a product specifically int...
A new digital archive enables community use of terrestrial and subsurface ecosystem data sets.
Deep learning researchers are increasingly using Jupyter notebooks to implement interactive, reproducible workflows with embedded visualization, steering, and documentation. Such solutions are typically deployed on small-scale (e.g., single-server) computing systems. However, as the sizes and complexities of datasets and associated neural network mod...
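A minimal illustration of driving parallel workers from a notebook, in the spirit of the scaled-up deployments this abstract describes, can be written with IPyParallel. The cluster setup and the trivial train_shard function here are stand-ins for a real distributed training job, not the paper's implementation.

```python
import ipyparallel as ipp

# Assumes a running IPyParallel cluster, e.g. started with: ipcluster start -n 4
# Shows only the fan-out/fan-in pattern, not actual model training.
rc = ipp.Client()
view = rc[:]  # a DirectView over all engines

def train_shard(shard_id):
    # Placeholder for per-worker training on one data shard.
    import os
    return (shard_id, os.getpid())

results = view.map_sync(train_shard, range(len(rc)))
print(results)  # one (shard_id, pid) pair per engine
```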
This paper presents two contributions for research into better understanding the role of anthropogenic warming in extreme weather. The first contribution is the generation of a large number of multi-decadal simulations using a medium-resolution atmospheric climate model, CAM5.1-1degree, under two scenarios of historical climate following the protoc...
This work discusses how the MPContribs framework in the Materials Project (MP) allows user-contributed data to be shown and analyzed alongside the core MP database. The MP is a searchable database of electronic structure properties of over 65,000 bulk solid materials, which is accessible through a web-based science gateway. We describe the motivati...
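For orientation, core MP data can be queried programmatically; the sketch below uses pymatgen's legacy MPRester client with a placeholder API key. MPContribs-hosted user contributions are served through their own client (mpcontribs-client), which is not shown here.

```python
from pymatgen.ext.matproj import MPRester

# "YOUR_API_KEY" is a placeholder for a key from materialsproject.org.
with MPRester("YOUR_API_KEY") as mpr:
    structure = mpr.get_structure_by_material_id("mp-149")  # silicon
    print(structure.composition.reduced_formula)
```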
PDACS (Portal for Data Analysis Services for Cosmological Simulations) is a Web-based analysis portal that provides access to large simulations and large-scale parallel analysis tools to the research community. It provides opportunities to access, transfer, manipulate, search, and record simulation data, as well as to contribute applications and ca...
Mass spectrometry imaging (MSI) enables researchers to probe endogenous molecules directly within the architecture of the biological matrix. Unfortunately, efficient access, management, and analysis of the data generated by MSI approaches remain major challenges to this rapidly developing field. Despite the availability of numerous dedicat...
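MSI datasets are typically large spectral cubes, which is why HDF5-backed storage with partial reads is a common design choice for the access problem this abstract raises. The sketch below shows that access pattern with h5py; the file name, internal dataset paths, and m/z window are hypothetical.

```python
import h5py
import numpy as np

# Slice an MSI cube stored in HDF5 without loading it all into memory.
# File name and dataset paths below are hypothetical.
with h5py.File("msi_experiment.h5", "r") as f:
    cube = f["/entry_0/data_0"]   # shape: (x, y, m/z)
    mz = f["/entry_0/mz"][:]      # m/z axis
    # Ion image: integrate a narrow m/z window with a partial read.
    lo, hi = np.searchsorted(mz, [840.0, 842.0])
    ion_image = cube[:, :, lo:hi].sum(axis=2)

print(ion_image.shape)
```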
Citations
... Minor et al. [76] introduce optical aberrations such as nonuniform brightness, low-frequency noise, and Gaussian blurring into their simulations of topological defects in liquid-crystal systems. Munshi et al. [77] augmented their convergent beam electron diffraction data with the addition of elliptic distortions, random translations of the pattern, incoherent plasmonic background noise, and Poisson shot noise. ...
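As a concrete illustration of the augmentation recipe quoted above, the sketch below applies a random translation, a slight elliptic distortion, and Poisson shot noise to a simulated pattern. All parameter values and names are illustrative rather than taken from either paper.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def augment_pattern(pattern, max_shift=4.0, max_ellipticity=0.02, dose=1e5):
    """Random translation, slight elliptic distortion, and Poisson shot
    noise; parameter values are illustrative."""
    # Elliptic distortion: anisotropic scaling about the pattern center,
    # combined with a random sub-pattern translation.
    sx, sy = 1.0 + rng.uniform(-max_ellipticity, max_ellipticity, size=2)
    center = (np.array(pattern.shape) - 1) / 2
    matrix = np.diag([sx, sy])
    offset = center - matrix @ center + rng.uniform(-max_shift, max_shift, 2)
    out = ndimage.affine_transform(pattern, matrix, offset=offset, order=1)
    # Poisson shot noise at a finite electron dose.
    out = np.clip(out, 0, None)
    return rng.poisson(out / out.sum() * dose).astype(float)

# Example on a hypothetical 128x128 simulated pattern.
sim = np.zeros((128, 128)); sim[60:68, 60:68] = 1.0
aug = augment_pattern(sim)
```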
... Consider the smallest amount of information needed for data to be reusable, the 'Minimum information standard' (see Wikipedia for examples of these minimum standards). An example is the work by ESS-DIVE in establishing a community-centric metadata reporting format [2]. ...
... Such applications have motivated the development of specialized interfaces for remote job submission [63,64] and for managing workloads across systems [65,66,67,68]. The LBNL superfacility project has studied requirements for linking instruments with HPC [69] and proposed an OAuth-based asynchronous API [70] that is similar to our action provider interface. DataFed [71] federates various scientific data stores. ...
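The asynchronous OAuth pattern referenced here can be sketched in a few lines: obtain a bearer token via the client-credentials grant, submit work, and poll a task resource. The base URL, endpoint paths, and JSON field names below are placeholders, not the actual Superfacility API.

```python
import time
import requests

# Placeholder base URL; all paths and field names are illustrative.
BASE = "https://api.hpc.example.gov"

# Client-credentials grant: exchange client id/secret for a bearer token.
tok = requests.post(
    f"{BASE}/oauth/token",
    data={"grant_type": "client_credentials"},
    auth=("MY_CLIENT_ID", "MY_CLIENT_SECRET"),
).json()["access_token"]

headers = {"Authorization": f"Bearer {tok}"}

# Asynchronous by design: the POST returns a task id immediately.
task = requests.post(
    f"{BASE}/compute/jobs", headers=headers,
    json={"script": "#!/bin/bash\nsrun hostname"},
).json()

while True:
    status = requests.get(f"{BASE}/tasks/{task['task_id']}", headers=headers).json()
    if status["state"] in ("completed", "failed"):
        break
    time.sleep(5)
```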
... For example, there is no storage limit at the OSF repository and a 50 GB limit per dataset at Zenodo [22]. In rare cases in which datasets exceed these limitations, dataset managers can bundle smaller datasets for easier upload and downstream data reuse [41]. For example, the Coupled Model Intercomparison Project (CMIP6) data and associated model runs contain petabytes of data, but are divided into smaller 'file sets' for more efficient storage and download [42]. ...
... An IRT usually consists of greeting contributors, explaining the project guidelines, and collecting relevant information [6]. Although IRTs were introduced on GitHub in 2016 [7] and developers positively rated the usefulness of IRTs for issue reporting [6], [8], [9], they are rarely utilized, according to our analysis. ...
... As a result, extensive research [1,2] has proposed semi-automation approaches to analyze these datasets. However, these deterministic methods rely on classical computer vision techniques (e.g., Hough transform, Fourier filtering, segmentation, and cross-correlation as a similarity measure), which typically require manual hyperparameter tuning and incur a computational cost for each experiment. Deep neural networks (DNNs) have shown superior performance compared to classical computer vision techniques on most benchmark tasks [10], which has led to the emergence of fully automated approaches [4,5] and tools [6,7] for various TEM tasks. In the context of orientation microscopy, ML-based approaches still fall behind traditional techniques such as template matching [1] or the Kikuchi technique [8] when it comes to generalization to orientations and phases unseen during training. This is due mainly to the limited amount of experimental data available for training the models, a realistic and practical constraint, especially for narrow-domain applications where real data is not widely available. ...
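For contrast with the DNN approaches discussed above, the classical cross-correlation baseline is compact; the sketch below locates a known template in a synthetic pattern using scikit-image's normalized cross-correlation. The arrays and the embedded motif are synthetic stand-ins for real diffraction data.

```python
import numpy as np
from skimage.feature import match_template

# Synthetic stand-ins: a noisy pattern with a known motif embedded in it.
rng = np.random.default_rng(1)
pattern = rng.normal(0, 0.1, (256, 256))
template = np.ones((16, 16))
pattern[100:116, 140:156] += 1.0  # embed the motif

# Normalized cross-correlation; pad_input keeps output the same shape.
score = match_template(pattern, template, pad_input=True)
peak = np.unravel_index(np.argmax(score), score.shape)
print(peak)  # ~ (107, 147): center of the embedded template
```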
... [129,138] Remote execution has been integrated with Jupyter notebooks [139-141]. The ability to compute anywhere enables users to leverage specialized computing resources designed for low-cost, distributed, and edge computing [142]. AI systems deployed at experimental facilities support rapid data filtering at the edge. ...
... There is an increasing interest in scientific workflows that incorporate remotely controlled, automated experiments over collections of physical instruments and computing systems. Often, these resources are located at geographically dispersed sites, and they need to be federated over wide-area networks to form ecosystems that seamlessly support these workflows [1], [2]. Recent workflow developments enable the use of Artificial Intelligence (AI) codes both as a part of scientific computations and data analyses, and for orchestrating automated experiments at potentially remote physical instruments. ...
... Jupyter has been used in ML research since its inception as a front-end interactive environment for remote CI. Most high-performance computing (HPC) platforms target Jupyter as their current or next-generation front-end user interface (Thomas and Cholia, 2021). The ML and geospatial data libraries introduced in the previous section can be easily used in Jupyter. ...
... We worked with science teams, including the NCEM, ALS, and LCLS data-processing pipelines, to capture common Jupyter workflow patterns that can be deployed in a repeatable manner across multiple projects. We describe a case study from NCEM here as an example [5]. ...
Reference: The LBNL Superfacility Project Report