Recent publications
This article establishes a data-driven modeling framework for lean hydrogen (H2)-air reaction rates for the Large Eddy Simulation (LES) of turbulent reactive flows. This is particularly challenging since H2 molecules diffuse much faster than heat, leading to large variations in burning rates, thermodiffusive instabilities at the subfilter scale, and complex turbulence-chemistry interactions. Our data-driven approach leverages a Convolutional Neural Network (CNN), trained to approximate filtered burning rates from emulated LES data. First, five different lean premixed turbulent H2-air flame Direct Numerical Simulations (DNSs) are computed, each with a unique global equivalence ratio. Second, DNS snapshots are filtered and downsampled to emulate LES data. Third, a CNN is trained to approximate the filtered burning rates as a function of LES scalar quantities: progress variable, local equivalence ratio, and flame thickening due to filtering. Finally, the performance of the CNN model is assessed on test solutions never seen during training. The model retrieves burning rates with very high accuracy. It is also tested on two filter and downsampling parameter sets, and on two global equivalence ratios lying between those used during training. For these interpolation cases, the model approximates burning rates with low error even though the cases were not included in the training dataset. This a priori study shows that the proposed data-driven machine learning framework is able to address the challenge of modeling lean premixed H2-air burning rates. It paves the way for a new modeling paradigm for the simulation of carbon-free hydrogen combustion systems.
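As a rough illustration of the third step, the sketch below (PyTorch; the layer sizes, channel count, and field resolution are assumptions, not the paper's architecture) maps the three LES input fields to a filtered burning-rate field.

```python
# Minimal sketch, assuming a small 3D CNN: filtered LES scalar fields in,
# filtered burning rate out. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class BurningRateCNN(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # Input channels: progress variable, local equivalence ratio,
        # and flame thickening due to filtering.
        self.net = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, 1, kernel_size=3, padding=1),  # burning rate
        )

    def forward(self, fields: torch.Tensor) -> torch.Tensor:
        # fields: (batch, 3, nx, ny, nz) emulated-LES scalar volumes.
        return self.net(fields)

model = BurningRateCNN()
les_batch = torch.rand(4, 3, 16, 16, 16)  # dummy emulated-LES snapshots
rates = model(les_batch)                  # shape (4, 1, 16, 16, 16)
```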
This animation explains the methodology used in the Research for Artificial Intelligence-Based Surrogate Endpoint (RAISE) project, presented in two manuscripts recently published in Future Oncology. This exploratory work assessed whether deep learning techniques could improve early prediction of progression-free survival (PFS) in patients with neuroendocrine tumors (NETs), using features extracted from computed tomography (CT) scans of patients in the CLARINET phase 3 trial. As part of the RAISE project, response heterogeneity (the coexistence of responding and non-responding lesions in the same patient) was also investigated to assess whether it could serve as a biomarker enabling earlier prediction of PFS in patients with NETs. Previous definitions of response heterogeneity are not suitable for assessing slow-growing tumors such as NETs, so the definition of response heterogeneity was adapted in this study and is described in this animation.
The ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED) was brought back to the United States for the 2024 edition, after a successful 2023 edition in Vienna, Austria. The ISLPED’24 symposium was held on 5–7 August 2024, as an in-person event in the luxurious Hyatt Regency Newport Beach Hotel in the beautiful city of Newport Beach in Southern California. Two large conference rooms and several outdoor areas on the premises of the hotel were reserved for the symposium, providing ample space for presentations and networking among attendees.
The rapid development of an emerging computing device, the graphics processing unit (GPU), has significantly enhanced our ability to conduct fully kinetic particle-in-cell (PIC) simulations in space physics. In this paper, we propose an approach that leverages multiple GPUs to facilitate large-scale PIC simulations. This method can effectively reduce data transmission frequency and latency during the computing process. Data communication between GPU devices is optimized through a combined Message Passing Interface (MPI) and NVIDIA Collective Communications Library (NCCL) running pattern. Our implementation surpasses the expected linear acceleration, achieving superior computing performance and operational efficiency. Instances of large-scale PIC simulations are presented based on physical models of magnetic reconnection, plasma turbulence, and quasi-perpendicular shock. The importance of large-scale simulations is demonstrated in terms of grid resolution, macroparticles used per cell, and the mass ratio between ions and electrons. The multi-GPU fully kinetic PIC implementation thus demonstrates its capability to handle large-scale simulations efficiently, a crucial requirement for the study of space plasma physics.
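A minimal sketch of such a combined running pattern, assuming mpi4py and CuPy's NCCL bindings (array sizes and the reduced quantity are illustrative): MPI bootstraps the processes and distributes the NCCL unique id, after which collectives run device-to-device over NCCL.

```python
# Sketch: MPI for bootstrap/coordination, NCCL for GPU-to-GPU collectives.
import cupy as cp
from cupy.cuda import nccl
from mpi4py import MPI

mpi_comm = MPI.COMM_WORLD
rank, size = mpi_comm.Get_rank(), mpi_comm.Get_size()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()

# Broadcast the NCCL unique id over MPI, then build the GPU communicator.
uid = nccl.get_unique_id() if rank == 0 else None
uid = mpi_comm.bcast(uid, root=0)
gpu_comm = nccl.NcclCommunicator(size, uid, rank)

# Example collective: all-reduce a local field array directly between GPUs.
local_field = cp.random.rand(256 ** 2, dtype=cp.float32)
global_field = cp.empty_like(local_field)
stream = cp.cuda.Stream.null
gpu_comm.allReduce(local_field.data.ptr, global_field.data.ptr,
                   local_field.size, nccl.NCCL_FLOAT32, nccl.NCCL_SUM,
                   stream.ptr)
stream.synchronize()
```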
As 5G deployments continue throughout the world, concerns regarding its energy consumption have gained significant traction. This article focuses on radio access networks (RANs) which account for a major portion of the network energy use. First, we introduce the state-of-the-art 3GPP and O-RAN standardization work on enhancing RAN energy efficiency. Then we highlight three unique ways for enabling energy optimization in telecommunication networks, including full stack acceleration, network functions consolidation, and shared infrastructure between communication and artificial intelligence. These network design strategies not only allow for considerable overall reduction in the energy footprint, but also deliver several added benefits including improved throughput, reduced cost of ownership, and increased revenue opportunities for telcos.
This special issue focuses on the application of eXtended Reality (XR) technologies—comprising Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR)—and Artificial Intelligence (AI) in the fields of medicine and rehabilitation. AR provides support in minimally invasive surgery, where it visualises internal anatomical structures on the patient’s body and provides real-time feedback to improve accuracy, keep the surgeon’s attention and reduce the risk of errors. Furthermore, XR technologies can be used to develop applications for pre-operative planning or for training surgeons through serious games. AI finds applications both in medical image processing, for the recognition of anatomical structures and the reconstruction of 3D models, and in the analysis of biological data for patient monitoring and disease diagnosis. In rehabilitation, XR and AI can enable personalised therapy plans, increase patient engagement through immersive environments and provide real-time feedback to improve recovery outcomes. The papers in this special issue deal with rehabilitation through serious games, AI-enhanced XR applications for healthcare, digital twins and the analysis of bio/neuro-adaptive signals.
In modern processor design, power efficiency has become the primary constraint, prompting manufacturers to develop processors that balance energy consumption with the growing demand for speed. This shift has initiated an era of heterogeneous multi-core computing, characterized by machines utilizing various processors such as GPUs, MICs, and FPGAs. These processors significantly enhance performance due to their computational capabilities and memory bandwidth, essential for optimizing query processing performance. However, executing database queries efficiently across diverse processors presents challenges due to architectural differences, leading to varied performance outcomes for different operator implementations. This chapter explores methodologies for executing database queries on any processor with maximum efficiency without manual adjustments. We propose compiling database queries into optimized code that can adapt continuously to achieve optimal performance across a wide array of processors. Key areas of focus include the use of GPUs in database systems, addressing challenges such as workload distribution and data transfer bottlenecks, and introducing a classification scheme for strategies developed to tackle these issues. Additionally, we examine NVLink 2.0 technology’s potential to improve data transfer efficiency between GPUs and CPUs, enhancing GPU-accelerated query processing. Furthermore, we present a novel adaptive query compilation-based stream processing engine (SPE) that surpasses traditional interpretation-based SPEs by incorporating runtime optimizations and task-based parallelization. This approach allows for dynamic adjustments to data characteristics, significantly improving query execution efficiency and throughput. Through these explorations, we aim to provide insights into current systems and highlight areas for future research, ultimately contributing to the advancement of heterogeneous query processing systems.
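As a toy illustration of continuous adaptation (not the chapter's engine; the operator variants and probing policy below are hypothetical), the sketch probes competing implementations of one operator at runtime and routes the remaining batches to whichever proved fastest on the current hardware and data.

```python
# Sketch: runtime variant selection for a filter operator.
import time

def filter_loop(batch, threshold):
    return [x for x in batch if x > threshold]

def filter_builtin(batch, threshold):
    return list(filter(lambda x: x > threshold, batch))

VARIANTS = [filter_loop, filter_builtin]

def adaptive_filter(batches, threshold, probe_rounds=2):
    timings = {v: 0.0 for v in VARIANTS}
    results = []
    for i, batch in enumerate(batches):
        if i < probe_rounds * len(VARIANTS):
            variant = VARIANTS[i % len(VARIANTS)]   # probe each variant
            start = time.perf_counter()
            results.append(variant(batch, threshold))
            timings[variant] += time.perf_counter() - start
        else:
            best = min(timings, key=timings.get)    # exploit the winner
            results.append(best(batch, threshold))
    return results

batches = [list(range(10_000)) for _ in range(10)]
out = adaptive_filter(batches, threshold=5_000)
```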
We present a semi-analytical model that accurately explains the working principle behind the recently reported electrically injected In0.2Ga0.8As/GaAs monolithic nano-ridge lasers and, more importantly, show how the model can be used to study the effect of device parameters on the spectral behavior, the slope efficiency, and the threshold gain. We show that mode beating between the fundamental mode and a higher-order mode is fundamental to the operation of these lasers. Analytical expressions for codirectional mode coupling are used in developing the round-trip laser model. Results from the analytical expressions are verified through comparison with simulations, and the model is further supported by measurement results.
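For reference, the standard codirectional coupled-mode equations, of the kind such a round-trip model builds on (textbook form, not necessarily the paper's exact notation):

```latex
% A, B: amplitudes of the fundamental and higher-order modes;
% \kappa: coupling coefficient; \Delta\beta: propagation-constant mismatch.
\frac{dA}{dz} = -i\kappa B\, e^{+i\Delta\beta z},
\qquad
\frac{dB}{dz} = -i\kappa^{*} A\, e^{-i\Delta\beta z}
```

At phase matching (Δβ = 0), complete power exchange between the two modes occurs over a beat length of π/(2|κ|), which sets the length scale of the mode beating noted above.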
Engaging in the deliberate generation of abnormal outputs from Large Language Models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks, defining LLM red-teaming based on extensive and diverse evidence. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We focused on the research questions of defining LLM red teaming, uncovering the motivations and goals for performing the activity, and characterizing the strategies people use when attacking LLMs. Based on the data, LLM red teaming is defined as a limit-seeking, non-malicious, manual activity, which depends highly on team effort and an alchemist mindset. It is highly intrinsically motivated by curiosity and fun, and to some degree by concerns about the various harms of deploying LLMs. We identify a taxonomy of 12 strategies and 35 different techniques for attacking LLMs. These findings are presented as a comprehensive grounded theory of how and why people attack large language models: LLM red teaming.
Data processing units (DPUs) with embedded graphics processing units (GPUs) have the potential to revolutionize optical network functionalities at the edge. These advanced units can significantly enhance the performance and capabilities of optical networks by integrating powerful processing capabilities directly at the network edge, where data is generated and consumed. We explore the use cases for DPUs in optical data monitoring with local artificial intelligence (AI) processing and embedded security. This paradigm shift aims to enable more efficient data handling, reduced latency, and improved overall network performance by leveraging local AI processing capabilities embedded within DPUs. In this paper, we show how DPUs can analyze vast amounts of optical data in real-time, implementing advanced data analysis algorithms and security protocols directly on the DPUs to provide robust monitoring and protection for the optical networks. Results indicate that DPUs with embedded GPUs can significantly improve the detection and response times to network anomalies, performance issues, and security threats.
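As one concrete (hypothetical) instance of such local processing, a lightweight streaming detector like the sketch below could flag deviations in optical power telemetry on the DPU itself; the window size, threshold, and synthetic data are assumptions, not the paper's algorithm.

```python
# Sketch: rolling z-score anomaly detection over a telemetry stream.
from collections import deque
import math
import random

class RollingAnomalyDetector:
    def __init__(self, window: int = 256, threshold: float = 4.0):
        self.samples = deque(maxlen=window)
        self.threshold = threshold

    def update(self, value: float) -> bool:
        """Return True if `value` deviates anomalously from the window."""
        flagged = False
        if len(self.samples) >= 16:  # need a minimal baseline first
            mean = sum(self.samples) / len(self.samples)
            var = sum((s - mean) ** 2 for s in self.samples) / len(self.samples)
            std = math.sqrt(var) or 1e-12
            flagged = abs(value - mean) / std > self.threshold
        self.samples.append(value)
        return flagged

detector = RollingAnomalyDetector()
stream = [1.0 + random.gauss(0, 0.01) for _ in range(1000)]
stream[500] = 2.0  # injected power spike
alarms = [t for t, p in enumerate(stream) if detector.update(p)]
```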
Routability has always been a significant challenge in Very Large Scale Integration (VLSI) design. To overcome the potential mismatch between the global routing results and the detailed routing requirements, track assignment is introduced to achieve an efficient routability estimation. Moreover, with the increasing scale of circuits, the intricate interconnections among the components on the chip lead to increased timing delay in signal transmission, thereby significantly impacting the performance and reliability of the circuit. Thus, to further improve the routability of the circuit, it is also critical to realize an accurate estimation of the timing delay within the track assignment stage. Existing heuristic track assignment algorithms, however, are prone to getting stuck in local optima, and thus fail to provide accurate routability estimations. In this paper, we propose an enhanced scalable parallel track assignment algorithm called SPTA 2.0 for VLSI design, employing a two-stage partition strategy and considering timing delay. First, the proposed algorithm achieves efficient assignment of all wires by considering the routing information from both the global and local nets. Second, the overlap cost, the blockage cost, and the wirelength cost are minimized to significantly improve the routability. Third, a critical wire controlling strategy is proposed to optimize signal timing delays inside nets. Finally, a two-stage partition strategy and panel- and subpanel-level parallelism are designed to further reduce the runtime, improving the scalability of the proposed methodology. Experimental results on multiple benchmarks demonstrate that the proposed method provides better routability estimations and leads to superior track assignment solutions compared with existing algorithms.
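To make the cost terms concrete, here is a deliberately simplified sketch of minimizing overlap, blockage, and wirelength costs when picking a track for each wire; the weights, interval representation, and greedy loop are assumptions for illustration, not SPTA 2.0's actual formulation or its parallel partitioning.

```python
# Sketch: composite-cost greedy track assignment over 1D wire intervals.

def overlap_len(a, b):
    """Length of the overlap between intervals a = (lo, hi) and b."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def assignment_cost(wire, track, assigned, blockages, w=(1.0, 10.0, 0.1)):
    overlap = sum(overlap_len(wire, other) for other in assigned[track])
    blocked = sum(overlap_len(wire, blk) for blk in blockages.get(track, []))
    wirelength = wire[1] - wire[0]  # span of this wire segment
    return w[0] * overlap + w[1] * blocked + w[2] * wirelength

def greedy_assign(wires, n_tracks, blockages):
    assigned = {t: [] for t in range(n_tracks)}
    for wire in sorted(wires, key=lambda w: w[0] - w[1]):  # longest first
        best = min(range(n_tracks),
                   key=lambda t: assignment_cost(wire, t, assigned, blockages))
        assigned[best].append(wire)
    return assigned

layout = greedy_assign([(0, 4), (2, 6), (5, 9)], n_tracks=2,
                       blockages={0: [(3, 5)]})
```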
The advent of exascale supercomputers heralds a new era of scientific discovery, yet it introduces significant architectural challenges that must be overcome for MPI applications to fully exploit its potential. Among these challenges is the adoption of heterogeneous architectures, particularly the integration of GPUs to accelerate computation. Additionally, the complexity of multithreaded programming models has also become a critical factor in achieving performance at scale. The efficient utilization of hardware acceleration for communication, provided by modern NICs, is also essential for achieving low latency and high throughput communication in such complex systems. In response to these challenges, the MPICH library, a high-performance and widely used Message Passing Interface (MPI) implementation, has undergone significant enhancements. This paper presents four major contributions that prepare MPICH for the exascale transition. First, we describe a lightweight communication stack that leverages the advanced features of modern NICs to maximize hardware acceleration. Second, our work showcases a highly scalable multithreaded communication model that addresses the complexities of concurrent environments. Third, we introduce GPU-aware communication capabilities that optimize data movement in GPU-integrated systems. Finally, we present a new datatype engine aimed at accelerating the use of MPI derived datatypes on GPUs. These improvements in the MPICH library not only address the immediate needs of exascale computing architectures but also set a foundation for exploiting future innovations in high-performance computing. By embracing these new designs and approaches, MPICH-derived libraries from HPE Cray and Intel were able to achieve real exascale performance on OLCF Frontier and ALCF Aurora respectively.
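The GPU-aware capability can be pictured with a short mpi4py/CuPy sketch, assuming a CUDA-aware MPI build (the buffer size and tag are arbitrary): device buffers are handed to MPI directly, avoiding a staging copy through host memory.

```python
# Sketch: GPU-aware point-to-point communication with CuPy device buffers.
# Run with two ranks, e.g.: mpiexec -n 2 python gpu_aware.py
import cupy as cp
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()

buf = cp.arange(1 << 20, dtype=cp.float32)      # data resides on the GPU
if rank == 0:
    comm.Send([buf, MPI.FLOAT], dest=1, tag=7)  # device pointer goes to MPI
elif rank == 1:
    comm.Recv([buf, MPI.FLOAT], source=0, tag=7)
```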
Quantum neuromorphic computing (QNC) is a sub-field of quantum machine learning (QML) that capitalizes on inherent system dynamics. As a result, QNC can run on contemporary, noisy quantum hardware and is poised to realize challenging algorithms in the near term. One key issue in QNC is the characterization of the requisite dynamics for ensuring expressive quantum neuromorphic computation. We address this issue by adapting previous proposals of quantum perceptrons (QPs), a quantum version of a simple model for neural computation, to the QNC setting. Our QPs compute based on the analog dynamics of interacting qubits with tunable coupling constants. We show that QPs are, with restricted resources, a quantum equivalent to the classical perceptron, a simple mathematical model for a neuron that is the building block of various machine learning architectures. Moreover, we show that QPs are theoretically capable of producing any unitary operation. Thus, QPs are computationally more expressive than their classical counterparts. As a result, QNC architectures built using our QPs are, theoretically, universal. We introduce a technique for mitigating barren plateaus in QPs called entanglement thinning. We demonstrate the effectiveness of QPs by applying them to numerous QML problems, including calculating inner products between quantum states, performing energy measurements, and implementing time reversal. Finally, we discuss potential implementations of QPs and how they can be used to build more complex QNC architectures such as quantum reservoir computers.
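As a generic illustration of such analog dynamics (not the paper's specific QP construction; the Hamiltonian form and parameter values below are assumptions), the following sketch evolves two qubits under an Ising-type Hamiltonian with a tunable coupling constant.

```python
# Sketch: analog evolution of two coupled qubits with a tunable ZZ coupling.
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I = np.eye(2, dtype=complex)

def hamiltonian(J: float, h: float = 0.5) -> np.ndarray:
    # H = J Z1 Z2 + h (X1 + X2): tunable coupling plus transverse fields.
    return J * np.kron(Z, Z) + h * (np.kron(X, I) + np.kron(I, X))

def evolve(state: np.ndarray, J: float, t: float) -> np.ndarray:
    # Analog dynamics: apply the unitary exp(-i H t) to the state.
    return expm(-1j * hamiltonian(J) * t) @ state

psi0 = np.zeros(4, dtype=complex)
psi0[0] = 1.0                          # start in |00>
psi_t = evolve(psi0, J=1.0, t=np.pi / 4)
probs = np.abs(psi_t) ** 2             # measurement statistics
```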
Quantum computers hold the promise of more efficient combinatorial optimization solvers, which could be game-changing for a broad range of applications. However, a bottleneck for materializing such advantages is that, in order to challenge classical algorithms in practice, mainstream approaches require a number of qubits prohibitively large for near-term hardware. Here we introduce a variational solver for MaxCut problems over m = O(n^k) binary variables using only n qubits, with tunable k > 1. The number of parameters and circuit depth display mild linear and sublinear scalings in m, respectively. Moreover, we analytically prove that the specific qubit-efficient encoding brings in a super-polynomial mitigation of barren plateaus as a built-in feature. Altogether, this leads to high quantum-solver performance. For instance, for m = 7000, numerical simulations produce solutions competitive in quality with state-of-the-art classical solvers. In turn, for m = 2000, experiments with n = 17 trapped-ion qubits feature MaxCut approximation ratios estimated to be beyond the hardness threshold of 0.941. Our findings offer an interesting heuristic for quantum-inspired solvers as well as a promising route towards solving commercially relevant problems on near-term quantum devices.
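For context, the standard MaxCut objective and the approximation ratio referenced above are:

```latex
% G = (V, E) with edge weights w_{ij}; spin assignment x_i \in \{-1, +1\}.
C(\mathbf{x}) = \sum_{(i,j) \in E} \frac{w_{ij}}{2}\,\bigl(1 - x_i x_j\bigr),
\qquad
r = \frac{C(\mathbf{x})}{C_{\max}}
```

The hardness threshold quoted above corresponds to the known bound 16/17 ≈ 0.941: approximating MaxCut beyond this ratio is NP-hard.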