Weilin Huang

  • PhD
  • PostDoc Position at University of Oxford

About

34
Publications
18,995
Reads
4,030
Citations
Current institution
University of Oxford
Current position
  • PostDoc Position
Additional affiliations
September 2012 - May 2013
Adobe Research
Position
  • Research Intern
September 2008 - December 2012
University of Manchester
Position
  • PhD Student

Publications

Conference Paper
Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Training of two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulties. In this work, we present a conceptually simple yet efficient framework that simultaneously processe...
Article
Full-text available
Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Training of two tasks in a unified framework is non-trivial due to significant differences in optimisation difficulties. In this work, we present a conceptually simple yet efficient framework that simultaneously proces...
Conference Paper
In this paper, we present a novel Orientation-Aware Text Proposals Network (OA-TPN) for detecting text in the wild. The OA-TPN is able to accurately localize arbitrary-oriented text lines in a natural image. Instead of detecting the whole text line at one time, the OA-TPN detects sequences of small-scale orientation-aware text proposals. To handle...
Article
Full-text available
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accu...
Article
Binary descriptors have received extensive research interests due to their low memory storage and computational efficiency. However, the discriminative ability of the binary descriptors is often limited in comparison with general floating point ones. In this paper, we present a learning framework to effectively integrate multiple binary descriptors...
Article
This paper presents a compact and efficient yet powerful binary framework based on image gradients for robust facial representation. It is termed as Binary Gradient Patterns (BGP). To discover underlying local structures in the gradient domain, image gradients are computed from multiple directions and encoded into a set of binary strings. Certain t...
Article
Image representation and classification are two fundamental tasks toward vision understanding. Shape and texture provide two key features for visual representation and have been widely exploited in a number of successful local descriptors, e.g., scale invariant feature transform (SIFT), local binary pattern descriptor, and histogram of oriented gr...
Patent
Full-text available
Methods and apparatus for recognizing text in an image are disclosed. According to an embodiment, the method comprises encoding the image into a first sequence with a convolutional neural network (CNN), wherein the first sequence is an output from a last second convolutional layer of the CNN; decoding the first sequence with a recurrent neural netw...
Patent
Full-text available
Methods and apparatus for recognizing text in an image are disclosed. According to an embodiment, the method comprises encoding the image into a first sequence with a convolutional neural network (CNN), wherein the first sequence is an output from a last second convolutional layer of the CNN; decoding the first sequence with a recurrent neural netw...
Conference Paper
Full-text available
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts location and text/non-text score of each fixed-width proposa...
Article
Thanks to the available large-scale scene datasets such as Places and Places2, Convolutional Neural Networks (CNNs) have made remarkable progress on the problem of scene recognition. However, scene categories are often defined according to their functions and there exist large intra-class variations in a single scene category. Meanwhile, as the number o...
Preprint
Convolutional Neural Networks (CNNs) have made remarkable progress on scene recognition, partially due to these recent large-scale scene datasets, such as the Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variat...
Preprint
We propose a novel Connectionist Text Proposal Network (CTPN) that accurately localizes text lines in natural image. The CTPN detects a text line in a sequence of fine-scale text proposals directly in convolutional feature maps. We develop a vertical anchor mechanism that jointly predicts location and text/non-text score of each fixed-width proposa...
Article
Full-text available
We introduce a new top-down pipeline for scene text detection. We propose a novel Cascaded Convolutional Text Network (CCTN) that joins two customized convolutional networks for coarse-to-fine text localization. The CCTN quickly detects text regions roughly from a low-resolution image, and then accurately localizes text lines from each enlarged regio...
Article
Full-text available
Convolutional neural networks (CNN) have recently achieved remarkable successes in various image classification and understanding tasks. The deep features obtained at the top fully-connected layer of the CNN (FC-features) exhibit rich global semantic information and are extremely effective in image classification. On the other hand, the convolution...
Article
Full-text available
Recent deep learning models have demonstrated strong capabilities for classifying text and non-text components in natural images. They extract a high-level feature computed globally from a whole image component (patch), where the cluttered background information may dominate the true text features in the deep representation, leading to less discrim...
Article
Full-text available
VGGNets have turned out to be effective for object recognition in still images. However, it is unable to yield good performance by directly adapting the VGGNet models trained on the ImageNet dataset for scene recognition. This report describes our implementation of training the VGGNets on the large-scale Places205 dataset. Specifically, we train th...
Article
Full-text available
Image representation and classification are two fundamental tasks towards multimedia content retrieval and understanding. The idea that shape and texture information (e.g. edge or orientation) are the key features for visual representation is ingrained and dominated in current multimedia and computer vision communities. A number of low-level featur...
Article
Full-text available
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances of deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long sh...
Article
This paper presents a computationally efficient yet powerful binary framework for robust facial representation based on image gradients. It is termed as structural binary gradient patterns (SBGP). To discover underlying local structures in the gradient domain, we compute image gradients from multiple directions and simplify them into a set of binar...
Conference Paper
Full-text available
Maximally Stable Extremal Regions (MSERs) have achieved great success in scene text detection. However, this low-level pixel operation inherently limits its capability for handling complex text information efficiently (e.g. connections between text or background components), leading to the difficulty in distinguishing texts from background compone...
Conference Paper
Recent studies showed that f-divergence based features have achieved great successes in speech recognition, synthesis and dialect classification. This paper proposes a novel local contrastive descriptor for image classification based on the f-divergence, referred to as LCD. It extracts local image feature by computing the contrastive characteristic be...
Conference Paper
Full-text available
In this paper, we present a new approach for text localization in natural images, by discriminating text and non-text regions at three levels: pixel, component and text line levels. Firstly, a powerful low-level filter called the Stroke Feature Transform (SFT) is proposed, which extends the widely-used Stroke Width Transform (SWT) by incorporating...
Conference Paper
Principal component analysis (PCA) has long been a dominating linear technique for dimensionality reduction. Many nonlinear methods and neural networks have been proposed to extend PCA for complex nonlinear data. They include kernel PCA, local linear embedding, isomap, self-organising map (SOM), and visualization induced SOM (ViSOM), a variant of S...
Conference Paper
Full-text available
Local binary pattern (LBP) has recently been proposed for texture analysis and local feature description and has also been applied to face recognition with promising results. However, besides the descriptors, a suitable similarity measure that can efficiently learn to distinguish facial features is also important. In this paper, a novel framework f...
Article
Dimensionality reduction has long been associated with retinotopic mapping for understanding cortical maps. Multisensory information is processed, fused and mapped to an essentially 2-D cortex in an information preserving manner. Data processing and projection techniques inspired by this biological mechanism are playing an increasingly important ro...
Conference Paper
The curse of dimensionality has prompted intensive research in effective methods of mapping high dimensional data. Dimensionality reduction and subspace learning have been studied extensively and widely applied to feature extraction and pattern representation in image and vision applications. Although PCA has long been regarded as a simple, efficie...
Conference Paper
The self-organizing map (SOM) is a classical neural network method for dimensionality reduction and data visualization. Visualization induced SOM (ViSOM) and growing ViSOM (gViSOM) are two recently proposed variants for a more faithful, metric-based and direct data representation. They learn local quantitative distances of data by regularizing the...
Article
Principal component analysis (PCA) has long been a simple, efficient technique for dimensionality reduction. However, many nonlinear methods such as local linear embedding and curvilinear component analysis have been proposed for increasingly complex nonlinear data recently. In this paper, we investigate and compare linear PCA and various nonlinear...
Conference Paper
Full-text available
Content authentication has become an important issue for surveillance video. This paper presents a watermarking system based on Discrete Cosine Transform (DCT) for Motion-JPEG video authentication. To protect the integrity of the video object, a content-based watermark is embedded into the frames of the video. Robust watermark and semi-fragile wate...
