Jie Lin

  • Researcher at Institute for Infocomm Research

About

94
Publications
16,880
Reads
2,622
Citations
Current institution
Institute for Infocomm Research
Current position
  • Researcher


Publications (94)
Article
Non-maximum suppression (NMS) is an essential post-processing step for object detection. The de-facto standard for NMS, namely GreedyNMS, is not parallelizable and could thus be the performance bottleneck in object detection pipelines. MaxpoolNMS is introduced as a fast and parallelizable alternative to GreedyNMS. However, MaxpoolNMS is only capabl...
Preprint
Vision transformers have emerged as a promising alternative to convolutional neural networks for various image analysis tasks, offering comparable or superior performance. However, one significant drawback of ViTs is their resource-intensive nature, leading to increased memory footprint, computation complexity, and power consumption. To democratize...
Article
Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recent...
Article
This paper presents a novel method for supervised multi-view representation learning, which projects multiple views into a latent common space while preserving the discrimination and intrinsic structure of each view. Specifically, an apriori discriminant similarity graph is first constructed based on labels and pairwise relationships of multi-vie...
Article
Knowledge distillation (KD) is a learning paradigm for boosting resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the local structure preserving (LSP) loss, which matches local structural relationships defined over edges across the student and teacher’s...
Chapter
Allocating different bit widths to different channels and quantizing them independently brings higher quantization precision and accuracy. Most prior works use an equal bit width to quantize all layers or channels, which is sub-optimal. On the other hand, it is very challenging to explore the hyperparameter space of channel bit widths, as the search...
Article
In this paper, we study how to make unsupervised cross-modal hashing (CMH) benefit from contrastive learning (CL) by overcoming two challenges. To be exact, i) to address the performance degradation issue caused by binary optimization for hashing, we propose a novel momentum optimizer that performs hashing operation learnable in CL, thus making on-...
Preprint
Full-text available
As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization are proposed to reduce the model size significantly, of which the...
Preprint
Knowledge distillation is a promising learning paradigm for boosting the performance and reliability of resource-efficient graph neural networks (GNNs) using more expressive yet cumbersome teacher models. Past work on distillation for GNNs proposed the Local Structure Preserving loss (LSP), which matches local structural relationships across the st...
Article
In this article, we study two challenging problems in semisupervised cross-view learning. On the one hand, most existing methods assume that the samples in all views have a pairwise relationship, that is, it is necessary to capture or establish the correspondence of different views at the sample level. Such an assumption is easily violated even in...
Preprint
In this work, we address the challenging task of few-shot segmentation. Previous few-shot segmentation methods mainly employ the information of support images as guidance for query image segmentation. Although some works propose to build cross-reference between support and query images, their extraction of query information still depends on the sup...
Preprint
Non-maximum Suppression (NMS) is an essential postprocessing step in modern convolutional neural networks for object detection. Unlike convolutions which are inherently parallel, the de-facto standard for NMS, namely GreedyNMS, cannot be easily parallelized and thus could be the performance bottleneck in convolutional object detection pipelines. Ma...
Article
As Deep Neural Networks (DNNs) usually are overparameterized and have millions of weight parameters, it is challenging to deploy these large DNN models on resource-constrained hardware platforms, e.g., smartphones. Numerous network compression methods such as pruning and quantization are proposed to reduce the model size significantly, of which the...
Article
With the booming development of the online fashion industry, effective personalized recommender systems have become indispensable for the convenience they bring to customers and the profits they bring to e-commerce platforms. Estimating the user’s preference towards the outfit is at the core of a personalized recommendation system. Existing works...
Preprint
Homomorphic Encryption (HE), allowing computations on encrypted data (ciphertext) without decrypting it first, enables secure but prohibitively slow Neural Network (HENN) inference for privacy-preserving applications in clouds. To reduce HENN inference latency, one approach is to pack multiple messages into a single ciphertext in order to reduce th...
Preprint
Full-text available
Despite the vast literature on Human Activity Recognition (HAR) with wearable inertial sensor data, it is perhaps surprising that there are few studies investigating semisupervised learning for HAR, particularly in a challenging scenario with class imbalance problem. In this work, we present a new benchmark, called A*HAR, towards semisupervised lea...
Article
Cross-modal retrieval aims at retrieving relevant points across different modalities, such as retrieving images via texts. One key challenge of cross-modal retrieval is narrowing the heterogeneous gap across diverse modalities. To overcome this challenge, we propose a novel method termed as Cross-modal discriminant Adversarial Network (CAN). Taking...
Article
Thanks to the low storage cost and high query speed, cross-view hashing (CVH) has been successfully used for similarity search in multimedia retrieval. However, most existing CVH methods use all views to learn a common Hamming space, thus making it difficult to handle the data with increasing views or a large number of views. To overcome these diff...
Article
We present a multi-GPU design, implementation and performance evaluation of the Halevi-Polyakov-Shoup (HPS) variant of the Fan-Vercauteren (FV) levelled Fully Homomorphic Encryption (FHE) scheme. Our design follows a data parallelism approach and uses partitioning methods to distribute the workload in FV primitives evenly across available GPUs. The...
Article
Deep Learning as a Service (DLaaS) stands as a promising solution for cloud-based inference applications. In this setting, the cloud has a pre-learned model whereas the user has samples on which she wants to run the model. The biggest concern with DLaaS is user privacy when the input samples are sensitive data. We provide here an efficient privac...
Preprint
Full-text available
Knowledge Distillation (KD) is a common method for transferring the "knowledge" learned by one machine learning model (the teacher) into another model (the student), where typically, the teacher has a greater capacity (e.g., more parameters or higher bit-widths). To our knowledge, existing methods overlook the fact that although...
Article
Cross-modal retrieval aims to retrieve the relevant samples across different modalities, of which the key problem is how to model the correlations among different modalities while narrowing the large heterogeneous gap. In this paper, we propose a Semi-supervised Multimodal Learning Network method (SMLN) which correlates different modalities by capt...
Preprint
Full-text available
With the increasing global popularity of self-driving cars, there is an immediate need for challenging real-world datasets for benchmarking and training various computer vision tasks such as 3D object detection. Existing datasets either represent simple scenarios or provide only day-time data. In this paper, we introduce a new challenging A*3D data...
Chapter
In the past 5 years, deep learning has achieved remarkable breakthroughs, mainly attributed to the success of convolutional neural networks (CNN) on vision applications like ImageNet classification. In this chapter, we are interested in deep learning-based descriptors for object instance search in images. Specifically, we propose to tackle some pra...
Chapter
There has been a rapid development of custom hardware for accelerating the inference speed of deep neural networks (DNNs), by explicitly incorporating hardware metrics (e.g., area and energy) as additional constraints, in addition to application accuracy. Recent efforts mainly focused on linear functions (matrix multiplication) in convolutional (Co...
Preprint
Full-text available
Convolutional neural networks (CNNs) have enabled significant performance leaps in medical image classification tasks. However, translating neural network models for clinical applications remains challenging due to data privacy issues. Fully Homomorphic Encryption (FHE) has the potential to address this challenge as it enables the use of CNNs on en...
Preprint
Full-text available
This paper addresses a challenging problem: how to reduce energy consumption without incurring a performance drop when deploying deep neural networks (DNNs) at the inference stage. To alleviate the computation and storage burdens, we propose a novel dataflow-based joint quantization approach with the hypothesis that a smaller number of quanti...
Preprint
Embedded deep learning platforms have witnessed two simultaneous improvements. First, the accuracy of convolutional neural networks (CNNs) has been significantly improved through the use of automated neural-architecture search (NAS) algorithms to determine CNN structure. Second, there has been increasing interest in developing application-specific...
Article
The MPEG Compact Descriptors for Visual Search (CDVS) standard targets image matching and retrieval. To achieve high retrieval accuracy over a large-scale image/video dataset, recent research efforts have demonstrated that employing extremely high-dimensional descriptors such as the Fisher Vector (FV) and the Vector of Locally Aggregated Descr...
Preprint
Full-text available
Video understanding has attracted much research attention especially since the recent availability of large-scale video benchmarks. In this paper, we address the problem of multi-label video classification. We first observe that there exists a significant knowledge gap between how machines and humans learn. That is, while current machine learning a...
Conference Paper
Full-text available
Object detection in images is a crucial task in computer vision, with important applications ranging from security surveillance to autonomous vehicles. Existing state-of-the-art algorithms, including deep neural networks, only focus on utilizing features within an image itself, largely neglecting the vast amount of background knowledge about the re...
Article
Full-text available
In this work, we focus on the problem of image instance retrieval with deep descriptors extracted from pruned Convolutional Neural Networks (CNN). The objective is to heavily prune convolutional edges while maintaining retrieval performance. To this end, we introduce both data-independent and data-dependent heuristics to prune convolutional edges,...
Poster
Full-text available
The YouTube-8M video classification challenge requires teams to classify 0.7 million videos into one or more of 4,716 classes. In this Kaggle competition, we placed in the top 3% out of 650 participants using released video and audio features. Beyond that, we extend the original competition by including text information in the classification, makin...
Article
With emerging demand for large scale video analysis, MPEG initiated the Compact Descriptor for Video Analysis (CDVA) standardization in 2014. Unlike handcrafted descriptors adopted by the ongoing CDVA standard, in this work, we study the problem of deep learned global descriptors for video matching, localization and retrieval. First, inspired by a...
Conference Paper
This work focuses on representing very high-dimensional global image descriptors using very compact 64-1024 bit binary hashes for instance retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to making DeepHash work at extremely low bitrates are three important considerations -- regularization, depth and fine-tuning -- each...
Conference Paper
The goal of this work is the computation of very compact binary hashes for image instance retrieval. Our approach has two novel contributions. The first one is Nested Invariance Pooling (NIP), a method inspired by i-theory, a mathematical theory for computing group invariant transformations with feed-forward neural networks. NIP is able to produc...
Article
We present a deep learning framework for computer-aided lung cancer diagnosis. Our multi-stage framework detects nodules in 3D lung CAT scans, determines if each nodule is malignant, and finally assigns a cancer probability based on these results. We discuss the challenges and advantages of our framework. In the Kaggle Data Science Bowl 2017, our f...
Article
This paper provides an overview of the ongoing Compact Descriptors for Video Analysis (CDVA) standard from the ISO/IEC Moving Pictures Experts Group (MPEG). MPEG-CDVA aims to define a standardized bitstream syntax to enable interoperability in the context of video analysis applications. During the development of MPEG-CDVA, a series of techniq...
Preprint
This paper provides an overview of the ongoing Compact Descriptors for Video Analysis (CDVA) standard from the ISO/IEC Moving Pictures Experts Group (MPEG). MPEG-CDVA aims to define a standardized bitstream syntax to enable interoperability in the context of video analysis applications. During the development of MPEG-CDVA, a series of techniq...
Article
Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating global image descriptors for the instance retrieval problem. One major drawback of CNN-based global descriptors is that uncompressed deep n...
Article
Full-text available
3D feature descriptors are heavily employed in various 3D perception applications to find keypoint correspondences between two point clouds. The availability of mobile devices equipped with depth sensors compels the developed applications to be both memory and computationally efficient. Towards this, in this paper, we present 3DHoPD, a new low dimen...
Article
Full-text available
Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating global image descriptors for the instance retrieval problem. One major drawback of CNN-based global descriptors is that uncompr...
Article
In applications that require an input point cloud to be matched with a set of database point clouds present on a remote server, it is preferable to compress and transfer 3D feature descriptors online, rather than compressing and transferring the whole input point cloud. This is because the former would require much less bandwidth and does not req...
Article
Driver's fatigue is one of the major causes of traffic accidents, particularly for drivers of large vehicles (such as buses and heavy trucks) due to prolonged driving periods and boredom in working conditions. In this paper, we propose a vision-based fatigue detection system for bus driver monitoring, which is easy and flexible for deployment in bu...
Article
Full-text available
The goal of this work is the computation of very compact binary hashes for image instance retrieval. Our approach has two novel contributions. The first one is Nested Invariance Pooling (NIP), a method inspired by i-theory, a mathematical theory for computing group invariant transformations with feed-forward neural networks. NIP is able to produc...
Article
Full-text available
With the increasing availability of wearable devices, research on egocentric activity recognition has received much attention recently. In this paper, we build a Multimodal Egocentric Activity dataset which includes egocentric videos and sensor data of 20 fine-grained and diverse activity categories. We present a novel strategy to extract temporal...
Article
Full-text available
Most image instance retrieval pipelines are based on comparison of vectors known as global image descriptors between a query image and the database images. Due to their success in large scale image classification, representations extracted from Convolutional Neural Networks (CNN) are quickly gaining ground on Fisher Vectors (FVs) as state-of-the-ar...
Conference Paper
To achieve low-latency query transmission over a wireless link, recent mobile image retrieval systems extract compact visual descriptors on the mobile device and then send them to the server at low bit rates. The drawback is that such on-device feature extraction demands heavy computational cost and large memory...
Article
Compact Descriptors for Visual Search (CDVS) is a recently completed standard from the ISO/IEC Moving Pictures Experts Group (MPEG). The primary goal of this standard is to provide a standardized bitstream syntax to enable interoperability in the context of image retrieval applications. Over the course of the standardization process, remarkable imp...
Article
Full-text available
A typical image retrieval pipeline starts with the comparison of global descriptors from a large database to find a short list of candidate matches. A good image descriptor is key to the retrieval pipeline and should reconcile two contradictory requirements: providing recall rates as high as possible and being as compact as possible for fast matchi...
Article
With deep learning becoming the dominant approach in computer vision, the use of representations extracted from Convolutional Neural Nets (CNNs) is quickly gaining ground on Fisher Vectors (FVs) as favoured state-of-the-art global image descriptors for image instance retrieval. While the good performance of CNNs for image classification is unambig...
Article
The first step in an image retrieval pipeline consists of comparing global descriptors from a large database to find a short list of candidate matching images. The more compact the global descriptor, the faster the descriptors can be compared for matching. State-of-the-art global descriptors based on Fisher Vectors are represented with tens of thou...
Article
Towards low bit rate mobile visual search, recent works have proposed to aggregate the local features and compress the aggregated descriptor (such as Fisher vector, the vector of locally aggregated descriptors) for low latency query delivery as well as moderate search complexity. Even though Hamming distance can be computed very fast, the computati...
Article
Full-text available
Compact keyframe-based video summaries are a popular way of generating viewership on video sharing platforms. Yet, creating relevant and compelling summaries for arbitrarily long videos with a small number of keyframes is a challenging task. We propose a comprehensive keyframe-based summarization framework combining deep convolutional neural networ...
Article
Full-text available
This work focuses on representing very high-dimensional global image descriptors using very compact 64-1024 bit binary hashes for instance retrieval. We propose DeepHash: a hashing scheme based on deep networks. Key to making DeepHash work at extremely low bitrates are three important considerations -- regularization, depth and fine-tuning -- each...
Article
To ensure application interoperability in visual object search technologies, the MPEG Working Group has made great efforts in standardizing visual search technologies. Moreover, extraction and transmission of compact descriptors are valuable for next-generation, mobile, visual search applications. This article reviews the significant progress of MP...
Article
Extraction and transmission of compact descriptors are of great importance for next-generation mobile visual search applications. Existing visual descriptor techniques mainly compress visual features into compact codes of fixed bit rate, which is not adaptive to the bandwidth fluctuation in wireless environment. In this letter, we propose a Rate-ad...
Conference Paper
In this paper, we address the problem of pair-wise image matching which determines whether two images depict the same objects or scenes. SIFT-like local descriptor-based matching is the most widely adopted method for this purpose and has achieved the state-of-the-art performance. However, local descriptor-based methods usually fail when an image pa...
Conference Paper
In this paper, we present the state-of-the-art compact descriptors for mobile visual search. In particular, we introduce our MPEG contributions in global descriptor aggregation and local descriptor compression, which have been adopted by the ongoing MPEG standardization of compact descriptor for visual search (CDVS). Standardization progress will b...
Conference Paper
Fisher vectors (FV) have shown great advantages in large scale visual search. However, traditional FV suffers from noisy local descriptors, which may deteriorate the FV discriminative power. In this paper, we propose a robust Fisher vectors (RFV). To fulfill fast search and light storage over a large scale image dataset, we employ a simple binariza...
Conference Paper
There are a number of component technologies that are useful for visual search, including format of visual descriptors, descriptor extraction process, as well as indexing, and matching algorithms. As a minimum, the format of descriptors as well as parts of their extraction process should be defined to ensure interoperability. In this paper, we stud...
Conference Paper
To improve query throughput, distributed image retrieval has been widely used to address large-scale visual search. In textual retrieval, the state-of-the-art approaches attempt to partition a textual database into multiple collections offline and allocate each collection to a server node. For each incoming query, just a few relevant collection...
Conference Paper
User-generated tags associated with images from social media (e.g., Flickr) provide valuable textual resources for image classification. However, the noisy and huge tag vocabulary heavily degrades the effectiveness and efficiency of state-of-the-art image classification methods that exploit auxiliary web data. To alleviate the problem, we introdu...
Conference Paper
Full-text available
User-given tags associated with social images from photosharing websites (e.g., Flickr) are valuable auxiliary resources for the image tagging task. However, social images often suffer from noisy and incomplete tags, heavily degrading the effectiveness of previous image tagging approaches. To alleviate the problem, we introduce a Sparse Tag Pattern...
Conference Paper
Full-text available
Compressing a query image's signature via vocabulary coding is an effective approach to low bit rate mobile visual search. State-of-the-art methods concentrate on offline learning a codebook from an initial large vocabulary. Over a large heterogeneous reference database, learning a single codebook may not suffice for maximally removing redundant co...
