Zhuowen Tu's research while affiliated with University of California, San Diego and other places
What is this page?
This page lists the scientific contributions of an author who either does not have a ResearchGate profile or has not yet added these contributions to their profile.
It was automatically created by ResearchGate to provide a record of this author's body of work. We create such pages to advance our goal of building and maintaining the most comprehensive scientific repository possible. In doing so, we process publicly available (personal) data relating to the author as a member of the scientific community.
If you're a ResearchGate member, you can follow this page to keep up with this author's work.
If you are this author, and you don't want us to display this page anymore, please let us know.
Publications (15)
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we propose a new SSL pipeline, consisting of first un/self-supervised pre-training, followed by supervised fine-tuning, and finally semi-supervised fine-tuning....
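For illustration only, the three stages named in this abstract (un/self-supervised pre-training, supervised fine-tuning, semi-supervised fine-tuning) can be summarized schematically. The sketch below is not the paper's implementation: it uses a tiny stand-in network instead of a ViT, leaves the pretext stage as a placeholder comment, and assumes a simple confidence-filtered pseudo-labeling step for the semi-supervised stage.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model; a real pipeline would use a ViT backbone.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

unlabeled = torch.randn(64, 3, 32, 32)
labeled_x, labeled_y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))

# Stage 1 (placeholder): un/self-supervised pre-training on unlabeled data,
# e.g. with a masked-image-modeling or contrastive pretext objective.

# Stage 2: supervised fine-tuning on the small labeled set.
loss = F.cross_entropy(model(labeled_x), labeled_y)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 3: semi-supervised fine-tuning, here with confidence-filtered pseudo-labels.
with torch.no_grad():
    probs = F.softmax(model(unlabeled), dim=1)
    conf, pseudo = probs.max(dim=1)
    mask = conf > 0.95                      # keep only confident pseudo-labels
if mask.any():
    loss = F.cross_entropy(model(unlabeled[mask]), pseudo[mask])
    opt.zero_grad(); loss.backward(); opt.step()
```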
Negative flips are errors introduced in a classification system when a legacy model is replaced with a new one. Existing methods to reduce the negative flip rate (NFR) either do so at the expense of overall accuracy using model distillation, or use ensembles, which multiply inference cost prohibitively. We present a method to train a classification...
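As context for this abstract: the negative flip rate is commonly defined as the fraction of test samples that the legacy model classifies correctly but the new model gets wrong. A minimal sketch of that metric (not the paper's code):

```python
import numpy as np

def negative_flip_rate(y_true, old_pred, new_pred):
    """Fraction of samples the old model got right but the new model gets wrong."""
    y_true, old_pred, new_pred = map(np.asarray, (y_true, old_pred, new_pred))
    negative_flips = (old_pred == y_true) & (new_pred != y_true)
    return negative_flips.mean()

# Toy example: one of four samples flips from correct to incorrect -> NFR = 0.25
print(negative_flip_rate([0, 1, 2, 1], [0, 1, 2, 1], [0, 1, 2, 0]))
```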
In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image. To address these tasks, we propose X-DETR, whose architecture has three major components: an object detector, a language encoder, and vision-language alignment. The vision and la...
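To make the third component concrete: instance-wise vision-language alignment of this kind is typically a similarity between per-object features and phrase embeddings in a shared space. A hedged sketch with stand-in encoders (the dimensions, the projection layer, and the random features are illustrative, not X-DETR's architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 256
# Hypothetical stand-ins for the detector's per-object features and a text encoder's output.
object_features = torch.randn(100, d_model)      # e.g. 100 object queries from a DETR-style detector
phrase_features = torch.randn(4, 768)            # e.g. 4 free-form phrases from a language encoder

text_proj = nn.Linear(768, d_model)              # project text into the shared space
obj = F.normalize(object_features, dim=1)
txt = F.normalize(text_proj(phrase_features), dim=1)

alignment = obj @ txt.t()                        # (objects, phrases) alignment scores
best_object_per_phrase = alignment.argmax(dim=0) # which detected object each phrase refers to
```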
We propose an online tracking algorithm that performs the object detection and data association under a common framework, capable of linking objects after a long time span. This is realized by preserving a large spatio-temporal memory to store the identity embeddings of the tracked objects, and by adaptively referencing and aggregating useful infor...
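As a rough illustration of memory-based association (not the paper's architecture): new detections can be matched to stored identity embeddings by similarity, and the memory refreshed with the matched embeddings so tracks remain linkable after long gaps. All class and parameter names below are invented for the sketch.

```python
import torch
import torch.nn.functional as F

class IdentityMemory:
    """Toy memory: one EMA-updated embedding per tracked identity."""
    def __init__(self, momentum=0.9, match_threshold=0.5):
        self.embeddings = []          # list of (D,) tensors, one per track
        self.momentum = momentum
        self.match_threshold = match_threshold

    def associate(self, detections):
        """detections: (N, D) embeddings of current detections; returns track ids."""
        ids = []
        for det in F.normalize(detections, dim=1):
            if self.embeddings:
                sims = torch.stack([det @ e for e in self.embeddings])
                best = int(sims.argmax())
                if sims[best] > self.match_threshold:
                    # Refresh the stored embedding for the matched identity.
                    self.embeddings[best] = F.normalize(
                        self.momentum * self.embeddings[best] + (1 - self.momentum) * det, dim=0)
                    ids.append(best)
                    continue
            self.embeddings.append(det)          # start a new track
            ids.append(len(self.embeddings) - 1)
        return ids

memory = IdentityMemory()
print(memory.associate(torch.randn(3, 128)))     # e.g. [0, 1, 2] on the first frame
```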
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features whereby data points that are mapped to nearby representations by the source (teacher) model are also mapped to neighbors by the target (student) model. The target model aims to mimic the local structure of the source repres...
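One way such a neighborhood-alignment objective can look, offered here only as a hedged sketch: the teacher's nearest neighbors serve as positives in a contrastive loss over student similarities. The choice of k, the temperature, and the function name are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def neighborhood_alignment_loss(feat_student, feat_teacher, k=5, tau=0.1):
    """Encourage the student to keep the teacher's k-nearest-neighbor structure.
    feat_*: (N, D) feature batches from the two models on the same inputs."""
    s = F.normalize(feat_student, dim=1)
    t = F.normalize(feat_teacher, dim=1)

    sim_t = t @ t.t()                         # teacher similarities
    sim_t.fill_diagonal_(-float("inf"))       # exclude self-matches
    nbrs = sim_t.topk(k, dim=1).indices       # (N, k) teacher-defined neighbors

    sim_s = (s @ s.t()) / tau                 # student similarities (logits)
    sim_s = sim_s.masked_fill(torch.eye(len(s), dtype=torch.bool, device=s.device), -float("inf"))
    log_p = F.log_softmax(sim_s, dim=1)       # distribution over the other points

    # Maximize the student's probability mass on the teacher's neighbors.
    return -log_p.gather(1, nbrs).mean()

# Toy usage with random features standing in for the two models' outputs.
loss = neighborhood_alignment_loss(torch.randn(32, 128), torch.randn(32, 128))
```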
In this paper, we present Long Short-term TRansformer (LSTR), a new temporal modeling algorithm for online action detection, by employing a long- and short-term memories mechanism that is able to model prolonged sequence data. It consists of an LSTR encoder that is capable of dynamically exploiting coarse-scale historical information from an extens...
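A common way to realize a long/short-term memory design like the one described is to compress an extended history into a small set of latent tokens with cross-attention and then let recent frames attend to both. The sketch below follows that idea under stated assumptions; the sizes, number of latents, and module choices are illustrative rather than LSTR's configuration.

```python
import torch
import torch.nn as nn

d, n_latents = 256, 16
long_memory  = torch.randn(1, 1024, d)    # features of an extended history (batch, time, dim)
short_memory = torch.randn(1, 32, d)      # most recent frames

latents  = nn.Parameter(torch.randn(1, n_latents, d))
compress = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
decode   = nn.MultiheadAttention(d, num_heads=8, batch_first=True)

# Compress coarse long-range history into a fixed number of tokens.
compressed, _ = compress(latents, long_memory, long_memory)
# Let recent frames attend to the compressed history plus themselves.
context = torch.cat([compressed, short_memory], dim=1)
out, _ = decode(short_memory, context, context)
logits = nn.Linear(d, 10)(out)            # per-frame scores over a hypothetical 10 action classes
```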
We tackle the problem of visual search under resource constraints. Existing systems use the same embedding model to compute representations (embeddings) for the query and gallery images. Such systems inherently face a hard accuracy-efficiency trade-off: the embedding model needs to be large enough to ensure high accuracy, yet small enough to enable...
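The trade-off described here is often resolved by embedding the gallery offline with a large model and the query at run time with a small model trained to produce compatible embeddings. A minimal sketch of that asymmetric setup; the two stand-in networks and dimensions are placeholders, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

emb_dim = 64
# Hypothetical stand-ins: a large gallery encoder and a lightweight query encoder
# assumed to have been trained to share one embedding space.
gallery_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(), nn.Linear(512, emb_dim))
query_encoder   = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(), nn.Linear(64, emb_dim))

with torch.no_grad():
    gallery = F.normalize(gallery_encoder(torch.randn(1000, 3, 32, 32)), dim=1)  # indexed offline
    query   = F.normalize(query_encoder(torch.randn(1, 3, 32, 32)), dim=1)       # cheap at query time
    scores  = query @ gallery.t()                  # cosine similarity against the whole gallery
    top5    = scores.topk(5, dim=1).indices        # retrieved gallery indices
```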
Computer vision applications such as visual relationship detection and human-object interaction can be formulated as a composite (structured) set detection problem in which both the parts (subject, object, and predicate) and the sum (triplet as a whole) are to be detected in a hierarchical fashion. In this paper, we present a new approach, denoted...
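Purely as a schematic of what "parts plus whole" means here (not the paper's model): a candidate triplet can be scored by combining the confidences of its parts with a confidence for the triplet as a unit. All tensors below are random placeholders.

```python
import torch
import torch.nn.functional as F

# Hypothetical per-part logits from a detector head for one image:
num_cls, num_pred = 80, 50
subject_logits   = torch.randn(5, num_cls)    # 5 candidate subjects
object_logits    = torch.randn(5, num_cls)    # 5 candidate objects
predicate_logits = torch.randn(5, num_pred)   # predicate for each (subject, object) pair
triplet_logit    = torch.randn(5)             # confidence that the pair forms a valid triplet

# Hierarchical score: the parts and the whole must both be confident.
part_score = (F.softmax(subject_logits, -1).max(-1).values
              * F.softmax(object_logits, -1).max(-1).values
              * F.softmax(predicate_logits, -1).max(-1).values)
triplet_score = part_score * torch.sigmoid(triplet_logit)
keep = triplet_score.topk(3).indices          # keep the best-scoring composite detections
```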
We present a plug-in replacement for batch normalization (BN) called exponential moving average normalization (EMAN), which improves the performance of existing student-teacher based self- and semi-supervised learning techniques. Unlike the standard BN, where the statistics are computed within each batch, EMAN, used in the teacher, updates its stat...
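The core idea described here is to keep the teacher's normalization statistics as exponential moving averages of the student's, instead of recomputing them per batch. A minimal PyTorch-style sketch of that update; the momentum value, function name, and toy modules are illustrative, not the paper's exact code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def eman_update(student: nn.Module, teacher: nn.Module, momentum: float = 0.999):
    """EMA-update teacher parameters AND buffers (BN running mean/var) from the student,
    so the teacher never recomputes statistics from its own batches."""
    for p_s, p_t in zip(student.parameters(), teacher.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)
    for b_s, b_t in zip(student.buffers(), teacher.buffers()):
        if b_t.dtype.is_floating_point:        # running_mean / running_var
            b_t.mul_(momentum).add_(b_s, alpha=1.0 - momentum)
        else:                                  # e.g. num_batches_tracked
            b_t.copy_(b_s)

# Usage sketch: the teacher starts as a copy of the student and stays in eval mode,
# so its BN layers use the EMA statistics instead of per-batch ones.
student = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
teacher = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
teacher.load_state_dict(student.state_dict())
teacher.eval()
eman_update(student, teacher)
```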
We describe a system to automatically detect clinically significant findings from computerized tomography (CT) head scans, operating at performance levels exceeding those of practicing radiologists. Our system, named DeepRadiologyNet, builds on top of deep convolutional neural networks (CNNs) trained using approximately 3.5 million CT head images ga...
In this work, we present a novel 3D-Convolutional Neural Network (CNN) architecture called I2I-3D that predicts boundary location in volumetric data. Our fine-to-fine, deeply supervised framework addresses three critical issues to 3D boundary detection: (1) efficient, holistic, end-to-end volumetric label training and prediction (2) precise voxel-l...
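As a rough illustration of deeply supervised volumetric boundary prediction (the layer widths, depth, and number of side outputs are invented for the sketch, not the I2I-3D design): intermediate feature maps each get their own voxel-wise prediction head, and every head is trained against the boundary labels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDeeplySupervised3D(nn.Module):
    """Toy 3D CNN with side outputs supervised at multiple depths."""
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv3d(8, 16, 3, padding=1), nn.ReLU())
        self.side1 = nn.Conv3d(8, 1, 1)     # voxel-wise boundary logits from the shallow block
        self.side2 = nn.Conv3d(16, 1, 1)    # and from the deeper block

    def forward(self, x):
        f1 = self.block1(x)
        f2 = self.block2(f1)
        return self.side1(f1), self.side2(f2)

model = TinyDeeplySupervised3D()
volume = torch.randn(1, 1, 16, 32, 32)               # (batch, channel, depth, height, width)
target = torch.randint(0, 2, (1, 1, 16, 32, 32)).float()
s1, s2 = model(volume)
# Deep supervision: every side output is trained against the boundary labels.
loss = sum(F.binary_cross_entropy_with_logits(s, target) for s in (s1, s2))
```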
Computational simulations provide detailed hemodynamics and physiological data that can assist in clinical decision-making. However, accurate cardiovascular simulations require complete 3D models constructed from image data. Though edge localization is a key aspect in pinpointing vessel walls in many segmentation tools, the edge detection algorithm...
Citations
... In order to exploit rich global context, some works regard each image as a whole and adopt a fully connected graph [38], [39], a chained graph [37], or a tree-structured graph [40] to model the context among objects. 2) One-stage SGG: these methods use fully convolutional networks or Transformers to detect objects and relations directly from image features [42], [43]. In this paper, we propose a model-agnostic debiasing method that can be used in any SGG model. ...
... EMA/momentum has been studied extensively for smoothing sequential signals [33,34,35]. It has become a widely used technique in practice across fields ranging from optimization [36,37,38], reinforcement learning [39,40,41], and knowledge distillation [30,42] to recent semi-supervised learning frameworks [22,43,44,45] and self-supervised learning methods [19,10,20,46]. Momentum is interpreted as an average of consecutive Q-functions in reinforcement learning [39], or is used in SSL frameworks to prevent model collapse [20,19,47]. ...
... proposed a framework, namely "Feature Lenses", to encourage image representations to be transformation-invariant. To balance the trade-off between performance and efficiency, Duggal et al. (2021) designed a compatibility-aware neural architecture search scheme to improve the compatibility of models of different sizes. However, since existing compatible algorithms for image retrieval have not investigated hot-refresh model upgrades, the problem of model regression has been overlooked. ...
... A convolutional neural network application demonstrated performance on par with all tested board-certified experts in classifying skin cancer [43]. In radiology, it is expensive and time consuming to train radiologists, so radiographic image recognition is held to be one of the areas of AI that will 'witness the greatest gains'. In one study at Cornell University, Ithaca, New York, USA, a deep convolutional neural network was capable of automatically filtering CT head images and reporting with an error rate well below that of board-certified radiologists. ...
Reference: The Era of Immersive Health Technology
... SimVascular is an actively maintained open-source project, with additional enhancements and new features in preparation. For image segmentation, deep learning-based 2D segmentation has been explored with the goal of speeding up the image-based anatomic modeling process and enabling large-scale application of cardiovascular simulation in clinical studies [78,79]. These tools will be added to the GUI in future releases. ...
... Highly imbalanced data pose great difficulty in training a DL model and make model accuracy misleading, for example in patient data where the disease is relatively rare and occurs in only 10% of the patients screened. The overall model accuracy would be high, as most of the patients do not have the disease, and training will reach a local minimum [88,89]. The problem of class imbalance can be solved by (a) oversampling the data; the amount of oversampling depends on the extent of imbalance in the dataset.
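A minimal sketch of the oversampling idea mentioned in this snippet, using random duplication of the minority class. The synthetic data and 10% prevalence simply mirror the example above; sklearn's resample is one of several ways to do this.

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (rng.random(1000) < 0.10).astype(int)     # ~10% positive class, as in the example above

X_min, X_maj = X[y == 1], X[y == 0]
# Randomly duplicate minority samples until the classes are balanced.
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)])
```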