NVIDIA
  • Santa Clara, CA, United States
Recent publications
Developing a generalized segmentation model capable of simultaneously delineating multiple organs and diseases is highly desirable. Federated learning (FL) is a key technology enabling the collaborative development of a model without exchanging training data. However, limited access to fully annotated training data poses a major challenge to training generalizable models. We propose “ConDistFL”, a framework that solves this problem by combining FL with knowledge distillation. With an adequately designed conditional probability representation, local models can extract knowledge of unlabeled organs and tumors from the global model even when trained on partially annotated data. We validate our framework on four distinct partially annotated abdominal CT datasets from the MSD and KiTS19 challenges. The experimental results show that the proposed framework significantly outperforms the FedAvg and FedOpt baselines. Moreover, the performance on an external test dataset demonstrates superior generalizability compared to models trained on each dataset separately. Our ablation study suggests that ConDistFL performs well without frequent aggregation, reducing the communication cost of FL. Our implementation will be available at https://github.com/NVIDIA/NVFlare/tree/main/research/condist-fl.
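A minimal sketch of how such a conditional representation and distillation term could be implemented in PyTorch; the function names, tensor shapes, and the KL-based loss are illustrative assumptions, not the authors' released implementation:

```python
import torch
import torch.nn.functional as F

def conditional_probs(logits: torch.Tensor, labeled: list) -> torch.Tensor:
    """Renormalize softmax outputs over the classes this client does NOT label.

    logits:  (N, C, H, W) raw network outputs
    labeled: class-channel indices annotated locally (hypothetical helper)
    """
    probs = F.softmax(logits, dim=1)
    unlabeled = [c for c in range(probs.shape[1]) if c not in labeled]
    cond = probs[:, unlabeled]
    return cond / cond.sum(dim=1, keepdim=True).clamp_min(1e-8)

def condist_loss(student_logits, teacher_logits, labeled):
    """Distill the global (teacher) model's knowledge of unlabeled classes
    into the local (student) model via KL divergence on the conditionals."""
    p_teacher = conditional_probs(teacher_logits, labeled)
    p_student = conditional_probs(student_logits, labeled)
    return F.kl_div(p_student.clamp_min(1e-8).log(), p_teacher,
                    reduction="batchmean")
```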
A data-driven emulator for the baroclinic double-gyre ocean simulation is presented in this study. Traditional numerical simulations using partial differential equations (PDEs) often require substantial computational resources, hindering real-time applications and inhibiting model scalability. We address these challenges with neural operators in an idealized double-gyre ocean simulation: a deep learning approach that learns the underlying dynamics of the ocean system, complements classical methods, and effectively replaces the need for explicit PDE solvers at inference time. By leveraging neural operators, we efficiently integrate the governing equations, providing a data-driven and computationally efficient framework for simulating the double-gyre ocean circulation. Our approach demonstrates promising results in terms of accuracy and computational efficiency, showcasing the potential for advancing ocean modeling through the fusion of neural operators and traditional oceanographic methodologies. In comparison to a dynamical numerical model, we obtain 600x speedups, allowing us to create 2000-day ensembles in tens of seconds instead of hours.
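The abstract does not name the operator architecture; a Fourier neural operator (FNO) is one common choice for this kind of PDE surrogate. A minimal sketch of its core spectral-convolution block, with illustrative shapes:

```python
import torch

class SpectralConv2d(torch.nn.Module):
    """Fourier-space convolution, the core block of a Fourier neural operator."""

    def __init__(self, channels: int, modes: int):
        super().__init__()
        self.modes = modes
        scale = 1.0 / channels
        self.weights = torch.nn.Parameter(
            scale * torch.randn(channels, channels, modes, modes,
                                dtype=torch.cfloat))

    def forward(self, x):                        # x: (batch, channels, H, W)
        x_ft = torch.fft.rfft2(x)                # transform to Fourier space
        out_ft = torch.zeros_like(x_ft)
        m = self.modes                           # keep only the low modes
        out_ft[:, :, :m, :m] = torch.einsum(
            "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weights)
        return torch.fft.irfft2(out_ft, s=x.shape[-2:])
```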
We describe an exciting new application domain for deep reinforcement learning (RL): droplet routing on digital microfluidic biochips (DMFBs). A DMFB consists of a two-dimensional electrode array, and it manipulates droplets of liquid to automatically execute biochemical protocols for clinical chemistry. However, a major problem with DMFBs is that electrodes can degrade over time. The transportation of droplets over these degraded electrodes can fail, thereby adversely impacting the integrity of the bioassay outcome. We demonstrate that formulating droplet transportation as an RL problem enables the training of deep neural network policies that can adapt to the underlying health conditions of electrodes and ensure reliable fluidic operations. We describe an RL-based droplet-routing solution that can be used for various sizes of DMFBs. We highlight the reliable execution of an epigenetic bioassay with the RL droplet router on a fabricated DMFB. We show that the use of the RL approach on a simple micro-computer (Raspberry Pi 4) leads to acceptable performance for time-critical bioassays. We present a simulation environment based on the OpenAI Gym interface for RL-guided droplet-routing problems on DMFBs. We also present results from our study of electrode degradation using fabricated DMFBs; the study supports the degradation model used in the simulator.
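Since the simulator exposes the OpenAI Gym interface, a hypothetical skeleton of such an environment might look like the following; the grid size, reward shaping, and degradation constant are all illustrative, not taken from the paper:

```python
import gym
import numpy as np
from gym import spaces

class DropletRoutingEnv(gym.Env):
    """Toy DMFB droplet-routing environment with per-electrode health."""

    def __init__(self, grid=(8, 8)):
        super().__init__()
        self.grid = grid
        self.action_space = spaces.Discrete(4)  # up, down, left, right
        self.observation_space = spaces.Box(
            0.0, 1.0, shape=(*grid, 2), dtype=np.float32)

    def reset(self):
        self.pos = np.array([0, 0])
        self.goal = np.array(self.grid) - 1
        self.health = np.ones(self.grid, dtype=np.float32)
        return self._obs()

    def step(self, action):
        moves = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}
        self.pos = np.clip(self.pos + moves[action], 0,
                           np.array(self.grid) - 1)
        self.health[tuple(self.pos)] *= 0.995   # actuation degrades electrodes
        done = bool((self.pos == self.goal).all())
        reward = 1.0 if done else -0.01         # small per-step penalty
        return self._obs(), reward, done, {}

    def _obs(self):
        droplet = np.zeros(self.grid, dtype=np.float32)
        droplet[tuple(self.pos)] = 1.0
        return np.stack([droplet, self.health], axis=-1)
```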
Deformable linear objects (DLOs), such as rods, cables, and ropes, play important roles in daily life. However, manipulation of DLOs is challenging, as large geometrically nonlinear deformations may occur during the manipulation process. This problem is made even more difficult because the different deformation modes (e.g., stretching, bending, and twisting) may result in elastic instabilities during manipulation. In this paper, we formulate a physics-guided data-driven method to solve a challenging manipulation task: accurately deploying a DLO (an elastic rod) onto a rigid substrate along various prescribed patterns. Our framework combines machine learning, scaling analysis, and physical simulations to develop a physics-based neural controller for deployment. We explore the complex interplay between the gravitational and elastic energies of the manipulated DLO and obtain a control method for DLO deployment that is robust against friction and material properties. Out of the numerous geometrical and material properties of the rod and substrate, our physical analysis shows that only three non-dimensional parameters are needed to describe the deployment process. The essence of the control law for the manipulation task can therefore be constructed with a low-dimensional model, drastically increasing the computation speed. The effectiveness of our optimal control scheme is shown through a comprehensive robotic case study comparing against a heuristic control method for deploying rods in a wide variety of patterns. In addition, we showcase the practicality of our control scheme by having a robot accomplish challenging high-level tasks such as mimicking human handwriting, cable placement, and tying knots.
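The three non-dimensional groups are not spelled out in the abstract; one scale that commonly appears when gravity competes with bending in elastic rods is the gravito-bending length, sketched below purely as an assumed illustration:

```python
import numpy as np

def gravito_bending_length(E, r, rho, g=9.81):
    """Length scale at which gravitational and bending energies balance for a
    heavy elastic rod of Young's modulus E, radius r, and density rho.
    EI / (rho * A * g) has units of length cubed (illustrative only; not
    necessarily one of the paper's three parameters)."""
    I = np.pi * r**4 / 4.0   # second moment of area, circular cross-section
    A = np.pi * r**2         # cross-sectional area
    return (E * I / (rho * A * g)) ** (1.0 / 3.0)
```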
When robots perform long action sequences, users will want to easily and reliably find out what the robots have done. We therefore demonstrate the task of learning to summarize and answer questions about a robot agent’s past actions using natural language alone. A single system with a large language model at its core is trained to both summarize and answer questions about action sequences, given ego-centric video frames of a virtual robot and a question prompt. To enable training of question answering, we develop a method to automatically generate English-language questions and answers about objects, actions, and the temporal order in which actions occurred during episodes of robot action in the virtual environment. Training one model to both summarize and answer questions enables zero-shot transfer of object representations learned through question answering, improving action summarization.
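A toy sketch of the kind of templated question/answer generation described, using hypothetical (action, object) episode tuples rather than the paper's actual templates:

```python
def generate_qa(episode):
    """episode: ordered (action, object) pairs, e.g. [("pick up", "cup"), ...]"""
    qa = []
    for act, obj in episode:
        qa.append((f"What did the robot {act}?", obj))            # object question
        qa.append((f"What did the robot do to the {obj}?", act))  # action question
    for (a1, o1), (a2, o2) in zip(episode, episode[1:]):          # temporal order
        qa.append((f"Did the robot {a1} the {o1} before it {a2} the {o2}?",
                   "yes"))
    return qa

print(generate_qa([("pick up", "cup"), ("open", "drawer")]))
```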
There is enormous enthusiasm, as well as concern, about applying large language models (LLMs) to healthcare. Yet current assessments are based on general-purpose LLMs such as ChatGPT, which were not developed for medical use. This study develops a generative clinical LLM, GatorTronGPT, using 277 billion words of text including (1) 82 billion words of clinical text from 126 clinical departments and approximately 2 million patients at the University of Florida Health and (2) 195 billion words of diverse general English text. We train GatorTronGPT using a GPT-3 architecture with up to 20 billion parameters and evaluate its utility for biomedical natural language processing (NLP) and healthcare text generation. GatorTronGPT improves biomedical NLP. We apply GatorTronGPT to generate 20 billion words of synthetic text. NLP models trained using synthetic text generated by GatorTronGPT outperform models trained using real-world clinical text. A physicians’ Turing test using a 1 (worst) to 9 (best) scale shows no significant differences in linguistic readability (p = 0.22; 6.57 for GatorTronGPT compared with 6.93 for human) or clinical relevance (p = 0.91; 7.0 for GatorTronGPT compared with 6.97 for human), and physicians cannot differentiate them (p < 0.001). This study provides insights into the opportunities and challenges of LLMs for medical research and healthcare.
Dental caries on the crown’s surface is caused by the interaction of bacteria and carbohydrates, which gradually alters the tooth’s structure. In addition, calculus is the root cause of periodontal disease. Optical coherence tomography (OCT) has been considered a promising tool for identifying dental caries; however, diagnosing dental caries in its early stage remains challenging. In this study, we propose an ultrahigh-resolution OCT (UHR-OCT) system with axial and transverse resolutions of 2.6 and 1.8 μm, respectively, for differentiating early-stage dental caries and calculus. The same teeth were also scanned by a conventional spectral-domain OCT (SD-OCT) system with an axial resolution of 7 μm. The results indicate that early-stage carious structures such as small cavities can be observed using UHR-OCT, whereas the lower-resolution SD-OCT system had difficulty identifying them. Moreover, the estimated surface roughness and scattering coefficient of enamel are proposed for quantitatively differentiating the stages of caries. Furthermore, the thickness of the calculus can be estimated from the UHR-OCT results. These results demonstrate that UHR-OCT can detect caries and calculus in their early stages, showing that the proposed method for the quantitative evaluation of caries and calculus is promising.
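One standard way to estimate a scattering coefficient from an OCT A-scan is to fit a single-exponential decay under a single-scattering model; the sketch below shows that generic estimator, which is an assumption and not necessarily the paper's exact method:

```python
import numpy as np

def scattering_coefficient(depth_mm, intensity):
    """Fit I(z) = I0 * exp(-2 * mu_s * z) to an A-scan segment inside enamel
    and return mu_s in mm^-1 (factor 2 accounts for the light's round trip)."""
    log_i = np.log(np.clip(intensity, 1e-12, None))
    slope, _ = np.polyfit(depth_mm, log_i, 1)   # linear fit in log space
    return -slope / 2.0
```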
The integration of Augmented Reality (AR) into daily surgical practice is held back by the difficulty of correctly registering pre-operative data. This includes intelligent 3D model superposition while simultaneously handling real and virtual occlusions caused by the AR overlay. Occlusions can negatively impact surgical safety and as such deteriorate rather than improve surgical care. Robotic surgery is particularly suited to tackle these integration challenges in a stepwise approach, as the robotic console allows different inputs to be displayed to the surgeon in parallel. Nevertheless, real-time de-occlusion requires extensive computational resources, which further complicates clinical integration. This work tackles the problem of instrument occlusion and presents, to the best of our knowledge, the first in-human, on-edge deployment of a real-time binary segmentation pipeline during three robot-assisted surgeries: partial nephrectomy, migrated endovascular stent removal, and liver metastasectomy. To this end, a state-of-the-art real-time segmentation and 3D model pipeline was implemented and presented to the surgeon during live surgery. The pipeline allows real-time binary segmentation of 37 non-organic surgical items, which are then never occluded by the AR overlay. The application features real-time manual 3D model manipulation for correct soft-tissue alignment. The proposed pipeline can contribute to surgical safety, ergonomics, and the acceptance of AR in minimally invasive surgery.
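At its core, instrument de-occlusion is masked compositing: wherever the binary segmentation flags an instrument, the live camera pixel wins over the AR overlay. A minimal sketch with illustrative array shapes:

```python
import numpy as np

def deocclude(frame, overlay, instrument_mask):
    """frame, overlay: (H, W, 3) images; instrument_mask: (H, W) binary mask.
    Instruments always show the live frame, so they are never hidden by AR."""
    mask = instrument_mask.astype(bool)[..., None]   # broadcast over channels
    return np.where(mask, frame, overlay)
```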
The growth of AI is rapidly transforming the structure of economic production. However, very little is known about how within-AI specialization relates to broad-based economic diversification. This paper provides a data-driven framework that connects AI-based specialization with goods and services export specialization, to help design future comparative advantage based on the inherent capabilities of nations. Using detailed data on private investment in AI and export specialization for more than 80 countries, we propose a systematic framework to identify the connections from AI to goods and service sector specialization. The results are instructive for nations that aim to harness AI specialization to guide sources of future competitive advantage. The operational framework can help the public and private sectors uncover connections with nearby areas of specialization.
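Export specialization is conventionally measured with the Balassa revealed-comparative-advantage (RCA) index; whether the paper uses exactly this metric is an assumption, but it illustrates the country-by-product specialization matrix that the framework relates to AI investment:

```python
import pandas as pd

def rca(exports: pd.DataFrame) -> pd.DataFrame:
    """Balassa index: (x_cp / X_c) / (x_wp / X_w), for a countries-by-products
    export matrix. Values above 1 mark a revealed specialization."""
    share_country = exports.div(exports.sum(axis=1), axis=0)  # within-country shares
    share_world = exports.sum(axis=0) / exports.values.sum()  # world product shares
    return share_country / share_world
```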
Using machine learning (ML) for the online correction of coarse-resolution atmospheric models has proven effective in reducing biases in near-surface temperature and precipitation rate. However, ML corrections often introduce new biases in the upper atmosphere and cause inconsistent model performance across different random seeds. Furthermore, they produce profiles that lie outside the distribution of samples used in training, which can interfere with the baseline physics of the atmospheric model and reduce model reliability. This study introduces the use of a novelty detector to mask ML corrections when the atmospheric state is deemed out-of-sample. The novelty detector is trained on profiles of temperature and specific humidity in a semi-supervised fashion, using samples from the coarsened reference fine-resolution simulation. The novelty detector responds to particularly biased simulations relative to the reference simulation by categorizing more columns as out-of-sample. Without novelty detection, corrective ML occasionally causes undesirably large climate biases. When coupled to a running year-long coarse-grid simulation, novelty detection deems about 21% of columns to be novelties. This identification reduces the spread in the root-mean-square error (RMSE) of time-mean spatial patterns of surface temperature and precipitation rate across a random-seed ensemble. In particular, the random seed with the worst RMSE is improved by up to 60% (depending on the variable) while the best seed maintains its low RMSE. By reducing the variance in quality of ML-corrected climate models, novelty detection offers reliability without compromising prediction quality in atmospheric models.
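A sketch of how such masking could be wired up; a one-class SVM is one plausible semi-supervised novelty detector, but the paper's exact model and feature layout are not specified in the abstract:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic stand-ins for stacked temperature/specific-humidity columns from
# the coarsened reference simulation (hypothetical: 79 levels x 2 fields).
rng = np.random.default_rng(0)
reference_profiles = rng.normal(size=(500, 158))

detector = OneClassSVM(nu=0.05, gamma="scale").fit(reference_profiles)

def masked_correction(profiles, ml_correction):
    """Zero the ML correction for columns the detector flags as out-of-sample."""
    in_sample = detector.predict(profiles) == 1   # predict() returns +1 / -1
    return np.where(in_sample[:, None], ml_correction, 0.0)
```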
Coherent imaging techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent imaging methods like ptychography are poised to revolutionize nanoscale materials characterization. However, these advancements are accompanied by a significant increase in data and compute needs, which precludes real-time imaging, feedback, and decision-making capabilities with conventional approaches. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion of X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the oversampling constraints, allowing low-dose imaging using orders of magnitude less data than required by traditional methods.
Link adaptation (LA) is one of the most important aspects of wireless communications, whereby the modulation and coding scheme (MCS) used by the transmitter is adapted to the channel conditions in order to meet a certain target error rate. In a single-user SISO (SU-SISO) system with out-of-cell interference, LA is performed by computing the post-equalization signal-to-interference-noise ratio (SINR) at the receiver. The same technique can be employed in multi-user MIMO (MU-MIMO) receivers that use linear detectors. Another important use of post-equalization SINR is physical layer (PHY) abstraction, where several PHY blocks like the channel encoder, the detector, and the channel decoder are replaced by an abstraction model in order to speed up system-level simulations. However, for MU-MIMO systems with non-linear receivers, there is no known equivalent of post-equalization SINR, which makes both LA and PHY abstraction extremely challenging. This important issue is addressed in this two-part paper. In this part, a metric called the bit-metric decoding rate (BMDR) of a detector is presented as the proposed equivalent of post-equalization SINR. Since BMDR does not have a closed-form expression that would enable its instantaneous calculation, a machine-learning approach to predicting it is presented, along with extensive simulation results.
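By analogy with SINR-threshold link adaptation, a predicted BMDR could drive MCS selection as sketched below; the table values, the margin, and the rate comparison are illustrative assumptions rather than the paper's procedure:

```python
# Illustrative MCS table: (index, bits_per_symbol, code_rate).
MCS_TABLE = [(0, 2, 0.33), (1, 2, 0.50), (2, 4, 0.50), (3, 4, 0.75), (4, 6, 0.75)]

def select_mcs(predicted_bmdr: float, margin: float = 0.05):
    """Pick the highest-throughput MCS whose code rate stays below the
    predicted BMDR minus a back-off margin."""
    feasible = [m for m in MCS_TABLE if m[2] + margin <= predicted_bmdr]
    return max(feasible, key=lambda m: m[1] * m[2], default=MCS_TABLE[0])
```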
Monolithic 3D (M3D) integration is a promising technology for achieving high performance and low power consumption. However, the limitations of current M3D fabrication flows lead to performance degradation of devices in the top tier and unreliable interconnects between tiers. Fault localization at the tier level is therefore necessary to enhance yield learning; for example, tier-level localization can enable targeted diagnosis and process-optimization efforts. In this paper, we develop a graph neural network-based diagnosis framework to efficiently localize faults to a device tier. The proposed framework can be used to provide rapid feedback to the foundry and help enhance the quality of diagnosis reports generated by commercial tools. Results for four M3D benchmarks, with and without response compaction, show that the proposed solution achieves up to 32.86% improvement in diagnostic resolution with less than 1% loss of accuracy compared to results from commercial tools. The framework has also been shown to transfer to various design configurations without performance degradation.
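The abstract does not specify the GNN layers; a minimal mean-aggregation graph convolution of the kind often applied to circuit graphs might look like this sketch:

```python
import torch

class GraphConv(torch.nn.Module):
    """Mean-aggregation graph convolution over a normalized adjacency matrix
    (a generic stand-in for the paper's unspecified GNN architecture)."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = torch.nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # x: (nodes, in_dim) cell/net features; adj: (nodes, nodes) normalized
        neighbor_mean = adj @ x                  # aggregate neighbor features
        return torch.relu(self.lin(torch.cat([x, neighbor_mean], dim=-1)))
```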
Digital twin networks (DTNs) are real-time replicas of physical networks. They are emerging as a powerful technology for design, diagnosis, simulation, what-if analysis, and artificial intelligence (AI)/machine learning (ML) driven real-time optimization and control of sixth generation (6G) wireless networks. Despite the great potential of what digital twins can offer for 6G, realizing the desired capabilities of DTNs requires tackling many design aspects, including data, models, and interfaces. In this article, we provide an overview of DTNs by presenting prominent use cases and their service requirements, describing a reference architecture, and discussing fundamental design aspects. We also present a real-world example to illustrate how DTNs can be built and operated on a real-time reference development platform, Omniverse.
Metagenome-assembled genomes (MAGs) offer valuable insights into the exploration of microbial dark matter using metagenomic sequencing data. However, there is a growing concern that contamination in MAGs may significantly impact downstream analysis results. Existing MAG decontamination methods rely heavily on marker genes and do not fully leverage genomic sequences. To address these limitations, we introduce a novel decontamination approach named Deepurify, which uses a multi-modal deep language model trained with contrastive learning to capture taxonomic similarities of genomic sequences. Deepurify uses inferred taxonomic lineages to guide the allocation of contigs into a MAG-separated tree and employs a tree-traversal strategy to maximize the total number of medium- and high-quality MAGs. Extensive experiments were conducted on two simulated datasets, CAMI I, and human gut metagenomic sequencing data. The results demonstrate that Deepurify significantly outperforms other decontamination methods.
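Contrastive learning over sequence embeddings is often implemented with an InfoNCE objective; a minimal sketch, where the pairing of two views per contig and the temperature are assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07):
    """z1, z2: (batch, dim) embeddings of two views of the same contigs.
    Matching rows are positives; all other rows serve as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                 # cosine similarity matrix
    targets = torch.arange(z1.shape[0])      # i-th row pairs with i-th column
    return F.cross_entropy(logits, targets)
```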
782 members
Stefan Jeschke
  • Department of Research
John A. Gunnels
  • Mathematical Libraries/Quantum Computing
Karel Petrak
  • Department of Research
Information
Address
2788 San Tomas Expressway, 95051, Santa Clara, CA, United States
Website
http://nvidia.com/
Phone
+1 (408) 486-2000