Wei Han
  • University of Illinois Urbana-Champaign

About

135
Publications
27,667
Reads
15,964
Citations
Publications (135)
Article
Full-text available
Background Atherogenic index of plasma (AIP) has been recommended as a marker of plasma atherogenicity. The impact of AIP on plaque characteristics is not fully understood. Purpose The study investigates the relationship between AIP and coronary plaque features in patients with acute coronary syndrome (ACS). Methods From January 2016 to June 2017...
Preprint
We introduce AudioPaLM, a large language model for speech understanding and generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil et al., 2023] and AudioLM [Borsos et al., 2022], into a unified multimodal architecture that can process and generate text and speech with applications including speech recognition and spe...
Preprint
Full-text available
Large Language Models (LLMs) have been applied in the speech domain, often incurring a performance drop due to misalignment between speech and language representations. To bridge this gap, we propose a joint speech and language model (SLM) using a Speech2Text adapter, which maps speech into text token embedding space without speech information loss....
Preprint
Full-text available
Speech representation learning approaches for non-semantic tasks such as language recognition have either explored supervised embedding extraction methods using a classifier model or self-supervised representation learning approaches using raw data. In this paper, we propose a novel framework of combining self-supervised representation learning wit...
Preprint
Full-text available
This paper introduces a new speech dataset called "LibriTTS-R" designed for text-to-speech (TTS) use. It is derived by applying speech restoration to the LibriTTS corpus, which consists of 585 hours of speech data at 24 kHz sampling rate from 2,456 speakers and the corresponding texts. The constituent samples of LibriTTS-R are identical to those...
Preprint
Full-text available
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages. This is achieved by pre-training the encoder of the model on a large unlabeled multilingual dataset of 12 million (M) hours spanning over 300 languages, and fine-tuning on a smaller labeled dataset. We use mult...
Preprint
Full-text available
Speech restoration (SR) is a task of converting degraded speech signals into high-quality ones. In this study, we propose a robust SR model called Miipher, and apply Miipher to a new SR application: increasing the amount of high-quality training data for speech generation by converting speech samples collected from the Web to studio-quality. To mak...
Article
Full-text available
Abstract Background Heterozygous familial hypercholesterolemia (HeFH) is largely underdiagnosed and undertreated in China where few patients achieved recommended target levels of low density lipoprotein cholesterol (LDL-C). We conducted the first randomized, placebo-controlled clinical trial in Chinese patients with HeFH to assess the efficacy and...
Preprint
Full-text available
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator model, which generates an intermediate representation conditioned on text, and a cascader model, which generates high-fidelity audio conditioned on the intermediate repr...
Preprint
Full-text available
Foundation models (FMs), which are trained on broad data at scale and are adaptable to a wide range of downstream tasks, have attracted great interest in the research community. Benefiting from the diverse data sources such as different modalities, languages and application domains, foundation models have demonstrated strong generalization and knowled...
Preprint
Full-text available
Most research on task oriented dialog modeling is based on written text input. However, users interact with practical dialog systems often using speech as input. Typically, systems convert speech into text using an Automatic Speech Recognition (ASR) system, introducing errors. Furthermore, these systems do not address the differences in written and...
Preprint
Full-text available
We propose a novel method to accelerate training and inference process of recurrent neural network transducer (RNN-T) based on the guidance from a co-trained connectionist temporal classification (CTC) model. We made a key assumption that if an encoder embedding frame is classified as a blank frame by the CTC model, it is likely that this frame wil...
Article
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large t...
Preprint
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens...
Preprint
Full-text available
Building inclusive speech recognition systems is a crucial step towards developing technologies that speakers of all language varieties can use. Therefore, ASR systems must work for everybody independently of the way they speak. To accomplish this goal, there should be available data sets representing language varieties, and also an understanding o...
Preprint
Self-supervised learning of speech representations has achieved impressive results in improving automatic speech recognition (ASR). In this paper, we show that data selection is important for self-supervised learning. We propose a simple and effective unsupervised data selection method which selects acoustically similar speech to a target domain. I...
Article
To explore whether transcystic-laparoscopic common bile duct exploration with microincision of the cystic duct and its confluence part is as effective and safe as transductal-laparoscopic common bile duct exploration for the failed transcystic-laparoscopic common bile duct exploration in patients with choledocholithiasis. In this retrospective cohort s...
Preprint
In learning action recognition, models are typically pre-trained on object recognition with images, such as ImageNet, and later fine-tuned on target action recognition with videos. This approach has achieved good empirical performance especially with recent transformer-based video architectures. While recently many works aim to design more advanced...
Article
Full-text available
Background: Transcatheter aortic valve implantation (TAVI) has achieved satisfactory outcomes in the selected patients with bicuspid aortic valve (BAV), predominately type 1 BAV (~90%). However, there are few reports about the safety and efficacy of TAVI in type 0 BAV. Therefore, in the current study, we aimed to compare procedural and 30-day outco...
Article
Full-text available
Background The prevalence of coronary artery disease (CAD) continues to increase among young Chinese adults. Current smoking has been recognized as a major risk factor for premature CAD, and hyperhomocysteinaemia (HHcy) has also been suggested to be associated with CAD progression. However, the combined effect of current smoking and HHcy on the sev...
Preprint
Full-text available
Many speech applications require understanding aspects beyond the words being spoken, such as recognizing emotion, detecting whether the speaker is wearing a mask, or distinguishing real from synthetic speech. In this work, we introduce a new state-of-the-art paralinguistic representation derived from large-scale, fully self-supervised training of...
Preprint
We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large t...
Preprint
Full-text available
Motivated by the success of masked language modeling (MLM) in pre-training natural language processing models, we propose w2v-BERT that explores MLM for self-supervised speech representation learning. w2v-BERT is a framework that combines contrastive learning and MLM, where the former trains the model to discretize input continuous speech signals i...
Preprint
Streaming end-to-end automatic speech recognition (ASR) systems are widely used in everyday applications that require transcribing speech to text in real-time. Their minimal latency makes them suitable for such tasks. Unlike their non-streaming counterparts, streaming models are constrained to be causal with no future context and suffer from higher...
Preprint
Although end-to-end automatic speech recognition (e2e ASR) models are widely deployed in many applications, there have been very few studies to understand models' robustness against adversarial perturbations. In this paper, we explore whether a targeted universal perturbation vector exists for e2e ASR models. Our goal is to find perturbations that...
Preprint
Full-text available
Background: The prevalence of coronary artery disease (CAD) continues to increase among young Chinese adults. Current smoking has been recognized as a major risk factor for premature CAD, and hyperhomocysteinaemia (HHcy) has also been suggested to be associated with CAD progression. However, the combined effect of current smoking and HHcy on the se...
Article
Full-text available
Background The prevalence of acute coronary syndrome (ACS) continues to increase among young Chinese adults. Homocysteine (HCY) has been suggested as a promoter of atherosclerosis leading to coronary artery disease (CAD). Yet, it remains uncertain whether HCY is associated with the ACS and the severity of coronary artery stenosis in young adults....
Article
Full-text available
Increasing evidence suggests that long non-coding RNAs (lncRNAs) are involved in neuroblastoma (NB) pathogenesis. The aim of this study was to elucidate the roles and underlying mechanism of non-coding RNA activated by DNA damage (NORAD) in childhood NB. Both public data and clinical specimens were used to determine NORAD expression. Colony formati...
Chapter
Autonomous vehicles operate in a dynamic environment, where the speed with which a vehicle can perceive and react impacts the safety and efficacy of the system. LiDAR provides a prominent sensory modality that informs many existing perceptual systems including object detection, segmentation, motion estimation, and action recognition. The latency fo...
Preprint
Full-text available
Background: The prevalence of acute coronary syndrome (ACS) continues to increase among young Chinese adults. Homocysteine (HCY) has been suggested as a crucial promoter of atherosclerosis leading to coronary artery disease (CAD). Yet, it remains uncertain whether HCY is associated with the ACS and the severity of coronary artery stenosis in very y...
Preprint
Full-text available
End-to-end (E2E) models have been shown to outperform state-of-the-art conventional models for streaming speech recognition [1] across many dimensions, including quality (as measured by word error rate (WER)) and endpointer latency [2]. However, the model still tends to delay the predictions towards the end and thus has much higher partial latency compa...
Preprint
Streaming end-to-end automatic speech recognition (ASR) models are widely used on smart speakers and on-device applications. Since these models are expected to transcribe speech with minimal latency, they are constrained to be causal with no future context, compared to their non-streaming counterparts. Consequently, streaming models usually perform...
Preprint
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible. However, emitting fast without degrading quality, as measured by word error rate (WER), is highly challenging. Existing approaches including Early and Late Penalties and Constrained Alignments penalize emission delay by manipulati...
Preprint
We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-tr...
Preprint
Streaming automatic speech recognition (ASR) aims to emit each hypothesized word as quickly and accurately as possible, while full-context ASR waits for the completion of a full speech utterance before emitting completed hypotheses. In this work, we propose a unified framework, Universal ASR, to train a single end-to-end ASR model with shared weigh...
Article
Full-text available
Background: Opsoclonus-myoclonus syndrome (OMS) is a rare neurological disease. Some children with OMS also have neuroblastoma (NB). We and others have previously documented that serum IgG from children with OMS and NB induces neuronal cytolysis and activates several signaling pathways. However, the mechanisms underlying OMS remain unclear. Here,...
Preprint
Recently, a semi-supervised learning method known as "noisy student training" has been shown to improve image classification performance of deep networks significantly. Noisy student training is an iterative self-training method that leverages augmentation to improve network performance. In this work, we adapt and improve noisy student training for...
Preprint
Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing content-based global interactions, while CNNs exploit local features effectively. In this work, we achieve the best of bot...
Preprint
In recent years, all-neural end-to-end approaches have obtained state-of-the-art results on several challenging automatic speech recognition (ASR) tasks. However, most existing works focus on building ASR models where train and test data are drawn from the same domain. This results in poor generalization characteristics on mismatched-domains: e.g.,...
Preprint
Convolutional neural networks (CNN) have shown promising results for end-to-end speech recognition, albeit still behind other state-of-the-art methods in performance. In this paper, we study how to bridge this gap and go beyond with a novel CNN-RNN-transducer architecture, which we call ContextNet. ContextNet features a fully convolutional encoder...
Preprint
Autonomous vehicles operate in a dynamic environment, where the speed with which a vehicle can perceive and react impacts the safety and efficacy of the system. LiDAR provides a prominent sensory modality that informs many existing perceptual systems including object detection, segmentation, motion estimation, and action recognition. The latency fo...
Article
Full-text available
Background: Neuroblastoma (NB) tumor rupture is a rare oncology emergency with a poor prognosis. We aimed to evaluate patient clinical characteristics and risk factors for ruptured NB. Methods: A retrospective study of 47 patients with confirmed NB rupture between January 2009 and January 2019 at Beijing Children's Hospital was conducted. To ide...
Preprint
Full-text available
Background: Neuroblastoma (NB) tumor rupture is a rare oncology emergency with a poor prognosis. We aimed to evaluate patient clinical characteristics and risk factors for ruptured NB. Methods: A retrospective study of 47 patients with confirmed NB rupture between January 2009 and January 2019 at Beijing Children's Hospital was conducted. To identif...
Preprint
Full-text available
Background: Neuroblastoma (NB) tumor rupture is a rare oncology emergency with a poor prognosis. We aimed to evaluate patient clinical characteristics and risk factors for ruptured NB. Methods: A retrospective study of 47 patients with confirmed NB rupture between January 2009 and January 2019 at Beijing Children's Hospital was conducted. To identif...
Preprint
Full-text available
Background: Neuroblastoma (NB) tumor rupture is a rare oncology emergency with poor prognosis. We aimed to evaluate clinical characteristics and risk factors for ruptured NB. Methods: A retrospective study was conducted on 47 confirmed ruptured NB patients in Beijing Children's Hospital between January 2009 and January 2019. To identify tumor ruptur...
Article
Neuroblastoma (NB) is the most common solid extracranial malignancy in children with a considerable chance of metastatic progression. Prevalent evidence supports the anti-tumor role of γδT cells and these cells have been tested in clinical trials for constraining tumor growth. A small subpopulation of γδT cells releasing IL-17, however, were demon...
Preprint
Full-text available
Background Opsoclonus-myoclonus syndrome (OMS) is a rare neurological disease. Some children with OMS also have neuroblastoma (NB). We and others have previously documented that serum IgG from children with OMS and NB induces neuronal cytolysis via several signaling pathways. However, mechanisms underlying OMS remain unclear. Here we investigated w...
Preprint
Full-text available
Background: Opsoclonus-myoclonus syndrome (OMS) is a rare neurological disease. Some children with OMS also have neuroblastoma (NB). We and others have previously documented that serum IgG from children with OMS and NB induces neuronal cytolysis and activates several signaling pathways. However, the mechanisms underlying OMS remain unclear. Here we...
Preprint
The research community has increasing interest in autonomous driving research, despite the resource intensity of obtaining representative real world data. Existing self-driving datasets are limited in the scale and variation of the environments they capture, even though generalization within and between operating regions is crucial to the overall v...
Preprint
End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown superior performance compared to conventional systems. However, previous studies have focused primarily on short utterances that typically last for just a few seconds or, at most, a few tens of...
Article
Full-text available
Background Pancreatoblastoma is a very rare malignant pancreatic tumor in children. Pancreatoblastoma is the most common pancreatic tumor in children less than 10 years of age, accounting for 25% of the pancreatic neoplasm. There were only a few published literatures about the standardized diagnostic and management protocol for PB in the last decad...
Article
Full-text available
Introduction The best approach for choledocholithiasis remains a matter of debate. Choledocholithiasis is usually treated with endoscopic sphincterotomy (EST), laparoscopic common bile duct exploration (LCBDE) or laparoscopic transcystic common bile duct exploration (LTCBDE). Data pertaining to the clinical outcomes of these approaches in the manag...
Preprint
Full-text available
Background: Neuroblastoma (NB) tumor rupture was a rare oncology emergency with poor prognosis. We aimed to evaluate clinical characteristics and risk factors for ruptured NB. Methods: A retrospective study was conducted on 47 confirmed ruptured NB patients in Beijing Children's Hospital between January 2009 and January 2019. To identify tumor rupt...
Article
Background: To investigate the safety, feasibility, and complications of pancreatectomies for pediatric pancreatic tumors. Methods: The medical records of pancreatectomy patients from January 2007 to January 2018 were retrospectively analyzed for perioperative factors and complications. Patients were divided into pancreatic head (n = 43), body (...
Preprint
LiDAR sensor systems provide high resolution spatial information about the environment for self-driving cars. Therefore, detecting objects from point clouds derived from LiDAR represents a critical problem. Previous work on object detection from LiDAR has emphasized re-purposing convolutional approaches from traditional camera imagery. In this work...
Article
Full-text available
Background: To investigate the safety, feasibility, and complications of using duodenum-preserving pancreas head resection (DPPHR) to treat pediatric benign and low-grade malignant pancreatic head tumors. Methods: Patients with pancreatic head tumors that underwent resection were retrospectively analyzed for perioperative factors and postoperati...
Article
Full-text available
Neuroblastoma (NB) is a sympathetic nervous system cancer in children, accounting for approximately 15% of pediatric oncology deaths. BARD1, a tumor suppressor, is essential for genome stability by interaction with BRCA1. Here, we performed a systematic investigation for the association between SNPs in BARD1 and the risk of NB in the Chinese population. Af...
Article
Neuroblastoma is one of the children's malignant tumors with poor prognosis, as well as high recurrence and metastasis rates after surgical removal and chemotherapy. γδ T-cell based immunotherapy receives increasing attention thanks to their strong cytolytic activity to tumor cells. Our previous data revealed a significant increase in circulating γ...
Article
Full-text available
Hepatoblastoma (HB), a leading primary hepatic malignancy in children, originates from primitive hepatic stem cells. This study aimed to uncover the genetic variants that are responsible for HB oncogenesis. One family, which includes the healthy parents, and two brothers affected by HB, was recruited. Whole-genome sequencing (WGS) of germline DNA f...
Article
Full-text available
Background: Opsoclonus-myoclonus syndrome (OMS) is a rare neurological disorder, usually accompanied by neuroblastoma (NB). There is no targeted treatment and animal model of OMS. We aimed to investigate whether insulin-like growth factor 1 (IGF-1)/phosphoinositide 3-kinase (PI3K) signaling alleviates neuronal cytolysis in pediatric OMS. Methods:...
Article
Full-text available
The aim of this study was to discriminate the children malignant peripheral neuroblastic tumors (PNTs) from those with benign histotype ganglioneuroma (GN) based on clinical and biological characteristics in all PNTs. Four hundred and seventy-six patients were included in this study, containing 345 patients for model development and 131 patients fo...
Preprint
Advances in image super-resolution (SR) have recently benefited significantly from rapid developments in deep neural networks. Inspired by these recent discoveries, we note that many state-of-the-art deep SR architectures can be reformulated as a single-state recurrent neural network (RNN) with finite unfoldings. In this paper, we explore new struc...
Article
Full-text available
We present a novel and compact architecture for deep Convolutional Neural Networks (CNNs) in this paper, termed 3D-FilterMap Convolutional Neural Networks (3D-FM-CNNs). The convolution layer of 3D-FM-CNN learns a compact representation of the filters, named 3D-FilterMap, instead of a set of independent filters in the conventional convolutio...
Article
Full-text available
Notoriously, learning with recurrent neural networks (RNNs) on long sequences is a difficult task. There are three major challenges: 1) extracting complex dependencies, 2) vanishing and exploding gradients, and 3) efficient parallelization. In this paper, we introduce a simple yet effective RNN connection structure, the DILATEDRNN, which simultaneo...
Article
Full-text available
Background Previous studies have shown that γδ TFH cells are capable of modulating antibody production in immunized and infected mouse model. In recent studies, human γδ TFH cells are shown to contribute to the activation of humoral immunity and promote the maturation of B cells. However, little information is available on their involvement in neur...
Article
Objectives: Encouraging progress has been made in application of splenectomy in the treatment of relapsed hemophagocytic lymphohistiocytosis (HLH) of unknown cause. The aim was to determine the roles of lymphocyte subpopulations and inflammatory cytokines in splenectomy. Methods: We retrospectively analyzed changes in lymphocyte subpopulations and...
Article
Spontaneous splenic rupture, also referred to as atraumatic splenic rupture, is a rare but life-threatening emergency condition. Without timely diagnosis and treatment, the mortality rate of splenic rupture approaches 100%. The etiology of atraumatic splenic rupture varies; it is reportedly associated with neoplasms or splenic infection, but is rar...
Article
Background: Choledocholithiasis represents a greater proportion of gallstone in the elderly. Elderly patients have more comorbidity, which could increase the operative risk and postoperative complications. However, no study has focused on the effect and safety of laparoscopic transcystic common bile duct exploration (LTCBDE) in elderly patients. T...
Article
Single image super-resolution is an ill-posed problem which tries to recover a high-resolution image from its low-resolution observation. To regularize the solution of the problem, previous methods have focused on designing good priors for natural images such as sparse representation, or directly learning the priors from a large data set with models...