Steven C. H. Hoi

Singapore Management University, Tumasik, Singapore

Publications (20) · 23.81 Total impact

  • Steven C. H. Hoi · Xiongwei Wu · Hantang Liu · Yue Wu · Huiqiong Wang · Hui Xue · Qiang Wu
    ABSTRACT: Logo detection from images has many applications, particularly for brand recognition and intellectual property protection. Most existing studies of logo recognition and detection are based on small-scale datasets, which are not comprehensive enough for exploring emerging deep learning techniques. In this paper, we introduce "LOGO-Net", a large-scale logo image database for logo detection and brand recognition from real-world product images. To facilitate research, LOGO-Net has two datasets: (i) "logos-18", which consists of 18 logo classes, 10 brands, and 16,043 logo objects, and (ii) "logos-160", which consists of 160 logo classes, 100 brands, and 130,608 logo objects. We describe the ideas and challenges of constructing such a large-scale database. Another key contribution of this work is to apply emerging deep learning techniques to logo detection and brand recognition, conducting extensive experiments with several state-of-the-art deep region-based convolutional network techniques for object detection. LOGO-Net will be released at http://logo-net.org/.
    No preview · Article · Nov 2015
  • Dayong Wang · Pengcheng Wu · Peilin Zhao · Yue Wu · Chunyan Miao · Steven C.H. Hoi
    ABSTRACT: The amount of data in our society has been exploding in the era of big data. In this paper, we address several open challenges of big data stream classification: high volume, high velocity, high dimensionality, and high sparsity. Many existing studies in the data mining literature solve data stream classification tasks in a batch learning setting, which suffers from poor efficiency and scalability when dealing with big data. To overcome these limitations, this paper investigates an online learning framework for big data stream classification tasks. Unlike some existing online data stream classification techniques that are often based on first-order online learning, we propose a framework of Sparse Online Classification (SOC) for data stream classification, which includes some state-of-the-art first-order sparse online learning algorithms as special cases and allows us to derive a new effective second-order online learning algorithm for data stream classification. We conduct an extensive set of experiments, in which encouraging results validate the efficacy of the proposed algorithms in comparison to a family of state-of-the-art techniques on a variety of data stream classification tasks.
    Full-text · Article · Jan 2015
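    The SOC framework above treats first-order sparse online learners as special cases. As a rough illustration of that family (not necessarily the paper's exact algorithm), the following sketch runs online logistic regression with a soft-thresholding (truncated gradient) step so the weight vector stays sparse; the learning rate and regularization parameter are illustrative.

    ```python
    import numpy as np

    def sparse_online_classifier(stream, dim, eta=0.1, lam=0.01):
        """Sketch of a first-order sparse online classifier: online logistic
        regression with an L1 soft-thresholding (truncated gradient) step.
        `stream` yields (x, y) pairs with y in {-1, +1}."""
        w = np.zeros(dim)
        for x, y in stream:
            margin = y * np.dot(w, x)
            grad = -y * x / (1.0 + np.exp(margin))       # logistic loss gradient
            w -= eta * grad                              # gradient step
            w = np.sign(w) * np.maximum(np.abs(w) - eta * lam, 0.0)  # sparsify
        return w
    ```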
  • Ji Wan · Dayong Wang · Steven Chu Hong Hoi · Pengcheng Wu · Jianke Zhu · Yongdong Zhang · Jintao Li
    ABSTRACT: Learning effective feature representations and similarity measures is crucial to the retrieval performance of a content-based image retrieval (CBIR) system. Despite extensive research efforts over decades, it remains one of the most challenging open problems and considerably hinders the success of real-world CBIR systems. The key challenge has been attributed to the well-known "semantic gap" between the low-level image pixels captured by machines and the high-level semantic concepts perceived by humans. Among various techniques, machine learning has been actively investigated as a possible direction for bridging the semantic gap in the long term. Inspired by recent successes of deep learning techniques for computer vision and other applications, in this paper we attempt to address an open problem: whether deep learning is a hope for bridging the semantic gap in CBIR, and how much improvement can be achieved in CBIR tasks by exploring state-of-the-art deep learning techniques for learning feature representations and similarity measures. Specifically, we investigate a framework of deep learning for CBIR tasks through an extensive set of empirical studies, examining a state-of-the-art deep learning method (convolutional neural networks) under varied settings. From our empirical studies, we find some encouraging results and summarize some important insights for future research.
    No preview · Conference Paper · Nov 2014
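    As a minimal sketch of the retrieval side of such a deep-learning-based CBIR pipeline, the snippet below ranks a database by cosine similarity over pre-extracted CNN features; the feature matrices (e.g., penultimate-layer activations of a convolutional network) are assumed inputs, and the extraction step is not shown.

    ```python
    import numpy as np

    def rank_by_cosine(query_feat, db_feats, top_k=10):
        """Rank database images by cosine similarity of pre-extracted CNN
        features. `query_feat`: (d,) feature of the query image;
        `db_feats`: (n, d) features of the database images."""
        q = query_feat / np.linalg.norm(query_feat)
        db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
        sims = db @ q                       # cosine similarity to every image
        return np.argsort(-sims)[:top_k]    # indices of the top-k matches
    ```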
  • Doyen Sahoo · Steven C.H. Hoi · Bin Li
    ABSTRACT: Kernel-based regression represents an important family of learning techniques for solving challenging regression tasks with non-linear patterns. Despite being studied extensively, most existing work suffers from two major drawbacks: (i) it is often designed for solving regression tasks in a batch learning setting, making it not only computationally inefficient but also poorly scalable in real-world applications where data arrives sequentially; and (ii) it usually assumes that a fixed kernel function is given prior to the learning task, which can result in poor performance if the chosen kernel is inappropriate. To overcome these drawbacks, this paper presents a novel scheme of Online Multiple Kernel Regression (OMKR), which sequentially learns the kernel-based regressor in an online and scalable fashion, and dynamically explores a pool of multiple diverse kernels to avoid relying on a single fixed but potentially poor kernel, thus remedying the drawback of manual/heuristic kernel selection. The OMKR problem is more challenging than regular kernel-based regression tasks, since we have to determine on the fly both the optimal kernel-based regressor for each individual kernel and the best combination of the multiple kernel regressors. In this paper, we propose a family of OMKR algorithms for regression and discuss their application to time series prediction tasks. We also analyze the theoretical bounds of the proposed OMKR method and conduct extensive experiments to evaluate its empirical performance on both real-world regression and time series prediction tasks.
    No preview · Article · Aug 2014
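    The two-level structure described above, per-kernel online regressors plus a learned combination, can be sketched as follows; the Hedge-style weighting, squared loss, and unbudgeted support set are illustrative choices, not necessarily those of the paper.

    ```python
    import numpy as np

    class OnlineKernelRegressor:
        """Online functional gradient descent with a single RBF kernel."""
        def __init__(self, gamma, eta=0.1):
            self.gamma, self.eta = gamma, eta
            self.sv, self.alpha = [], []

        def predict(self, x):
            if not self.sv:
                return 0.0
            K = np.exp(-self.gamma * np.sum((np.array(self.sv) - x) ** 2, axis=1))
            return float(np.dot(self.alpha, K))

        def update(self, x, y):
            err = self.predict(x) - y              # gradient of squared loss wrt f(x)
            self.sv.append(x)
            self.alpha.append(-2.0 * self.eta * err)

    def omkr_sketch(stream, gammas=(0.1, 1.0, 10.0), beta=0.9):
        """Sketch of online multiple kernel regression: each kernel keeps its
        own online regressor, and their predictions are combined with
        Hedge-style weights that decay with each kernel's squared loss."""
        learners = [OnlineKernelRegressor(g) for g in gammas]
        w = np.ones(len(learners))
        for x, y in stream:
            preds = np.array([l.predict(x) for l in learners])
            yhat = float(np.dot(w, preds) / w.sum())           # combined prediction
            for i, l in enumerate(learners):
                w[i] *= beta ** min((preds[i] - y) ** 2, 1.0)  # combination weight
                l.update(x, y)
            yield yhat
    ```

    Feeding a stream of (x, y) pairs, e.g. `list(omkr_sketch(zip(X, y)))`, returns the sequence of online predictions.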
  • Guangxia Li · Steven C. H. Hoi · Kuiyu Chang · Wenting Liu · Ramesh Jain
    ABSTRACT: We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming to classify every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is needed. This requirement leaves conventional batch learning algorithms out of consideration. Second, classical classification methods, be they batch or online, often encounter a dilemma when applied to a group of tasks: on one hand, a single classification model trained on the entire collection of data from all tasks may fail to capture the characteristics of individual tasks; on the other hand, a model trained independently on individual tasks may suffer from insufficient training data. To overcome these challenges, in this paper we propose a collaborative online multitask learning method, which learns a global model over the entire data of all tasks. At the same time, individual models for multiple related tasks are jointly inferred by leveraging the global model through a collaborative online learning approach. We illustrate the efficacy of the proposed technique on a synthetic dataset. We also evaluate it on three real-life problems: spam email filtering, bioinformatics data classification, and micro-blog sentiment detection. Experimental results show that our method is effective and scalable for the online classification of multiple related tasks.
    No preview · Article · Aug 2014 · IEEE Transactions on Knowledge and Data Engineering
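    As a hedged illustration of the general idea of a shared global model learned jointly with per-task models (not the authors' exact update rules), consider this perceptron-style sketch:

    ```python
    import numpy as np

    def collaborative_online_multitask(stream, dim, n_tasks, eta_g=0.05, eta_t=0.1):
        """Illustrative sketch of collaborative online multitask classification:
        every task predicts with a shared global model plus its own model, and
        both are updated on mistakes. `stream` yields (task_id, x, y) with
        y in {-1, +1}."""
        w_global = np.zeros(dim)
        w_task = np.zeros((n_tasks, dim))
        mistakes = 0
        for k, x, y in stream:
            score = np.dot(w_global + w_task[k], x)
            if y * score <= 0:                    # mistake-driven update
                mistakes += 1
                w_global += eta_g * y * x         # shared knowledge across tasks
                w_task[k] += eta_t * y * x        # task-specific adaptation
        return w_global, w_task, mistakes
    ```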
  • Steven C. H. Hoi · Jialei Wang · Peilin Zhao
    ABSTRACT: LIBOL is an open-source library for large-scale online learning, which consists of a large family of efficient and scalable state-of-the-art online learning algorithms for online classification tasks. We offer easy-to-use command-line tools and examples for users and developers, and provide comprehensive documentation for both beginners and advanced users. LIBOL is not only a machine learning toolbox, but also a comprehensive experimental platform for conducting online learning research.
    No preview · Article · Feb 2014 · Journal of Machine Learning Research
  • Dayong Wang · S.C.H. Hoi · Ying He · Jianke Zhu
    ABSTRACT: This paper investigates a framework of search-based face annotation (SBFA) by mining weakly labeled facial images that are freely available on the World Wide Web (WWW). One challenging problem for the search-based face annotation scheme is how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels, which are often noisy and incomplete. To tackle this problem, we propose an effective unsupervised label refinement (ULR) approach for refining the labels of web facial images using machine learning techniques. We formulate the learning problem as a convex optimization problem and develop effective optimization algorithms to solve the large-scale learning task efficiently. To further speed up the proposed scheme, we also propose a clustering-based approximation algorithm, which improves the scalability considerably. We have conducted an extensive set of empirical studies on a large-scale web facial image testbed, in which encouraging results show that the proposed ULR algorithms can significantly boost the performance of the promising SBFA scheme.
    Full-text · Article · Jan 2014 · IEEE Transactions on Knowledge and Data Engineering
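    The ULR step refines noisy weak labels so that visually similar images receive consistent labels. As a rough stand-in for that idea (classic graph-based label smoothing rather than the paper's exact convex formulation), one can propagate labels over a similarity graph in closed form:

    ```python
    import numpy as np

    def smooth_labels(similarity, Y, alpha=0.8):
        """Rough illustrative stand-in for graph-based label refinement: smooth
        a noisy image-by-label matrix Y over a symmetric similarity graph
        (closed-form solution of classic label propagation) so that visually
        similar images receive consistent label scores."""
        d = similarity.sum(axis=1)
        D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        S = D_inv_sqrt @ similarity @ D_inv_sqrt          # normalized affinity
        n = S.shape[0]
        F = np.linalg.solve(np.eye(n) - alpha * S, (1.0 - alpha) * Y)
        return F                                          # refined label scores
    ```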
  • Jialei Wang · Steven C. H. Hoi · Peilin Zhao · Jinfeng Zhuang · Zhi-Yong Liu
    ABSTRACT: In this work, we present a new framework for large-scale online kernel classification, making kernel methods efficient and scalable for large-scale online learning tasks. Unlike regular budget kernel online learning schemes, which usually use different strategies to bound the number of support vectors, our framework explores a functional approximation approach, approximating a kernel function or matrix in order to make the subsequent online learning task efficient and scalable. Specifically, we present two different online kernel machine learning algorithms: (i) the Fourier Online Gradient Descent (FOGD) algorithm, which applies random Fourier features to approximate kernel functions; and (ii) the Nyström Online Gradient Descent (NOGD) algorithm, which applies the Nyström method to approximate large kernel matrices. We offer theoretical analysis of the proposed algorithms, and conduct experiments on large-scale online classification tasks with datasets of over 1 million instances. Our encouraging results validate the effectiveness and efficiency of the proposed algorithms, making them potentially more practical than the family of existing budget kernel online learning approaches.
    No preview · Conference Paper · Aug 2013
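    FOGD, as described above, combines random Fourier features with online gradient descent. A minimal sketch of that combination for an RBF kernel follows; the hinge loss, feature dimension D, and step size are illustrative choices.

    ```python
    import numpy as np

    def fogd_sketch(stream, dim, D=200, sigma=1.0, eta=0.01, seed=0):
        """Sketch of Fourier Online Gradient Descent: approximate an RBF kernel
        k(x, y) = exp(-||x - y||^2 / (2 sigma^2)) with D random Fourier
        features, then run linear online gradient descent (hinge loss) in that
        feature space. `stream` yields (x, y) pairs with y in {-1, +1}."""
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=1.0 / sigma, size=(D, dim))   # random frequencies
        b = rng.uniform(0.0, 2.0 * np.pi, size=D)          # random phases
        w = np.zeros(D)
        for x, y in stream:
            z = np.sqrt(2.0 / D) * np.cos(W @ x + b)       # feature map z(x)
            if y * np.dot(w, z) < 1.0:                     # hinge-loss subgradient
                w += eta * y * z
        return W, b, w
    ```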
  • Dingjiang Huang · Junlong Zhou · Bin Li · Steven C. H. Hoi · Shuigeng Zhou
    ABSTRACT: Online portfolio selection has been attracting increasing interest from the artificial intelligence community in recent decades. Mean reversion, as one of the most frequent patterns in financial markets, plays an important role in some state-of-the-art strategies. Though successful on certain datasets, existing mean reversion strategies do not fully consider noise and outliers in the data, leading to estimation error and thus non-optimal portfolios, which results in poor performance in practice. To overcome this limitation, we propose to exploit the reversion phenomenon via a robust L1-median estimator, and design a novel online portfolio selection strategy named "Robust Median Reversion" (RMR), which constructs optimal portfolios based on the improved reversion estimate. Empirical results on various real markets show that RMR overcomes the drawbacks of existing mean reversion algorithms and achieves significantly better results. Finally, RMR runs in linear time and is thus suitable for large-scale trading applications.
    No preview · Conference Paper · Aug 2013
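    The core of RMR is a robust L1-median (geometric median) estimate of recent prices. A minimal sketch of that estimator via Weiszfeld iterations is shown below; the subsequent portfolio update is only indicated in comments, and the exact procedure in the paper may differ.

    ```python
    import numpy as np

    def l1_median(prices, n_iter=50, tol=1e-6):
        """Weiszfeld iterations for the L1-median (geometric median) of a
        window of price vectors, the robust estimator at the core of mean
        reversion strategies such as RMR. `prices`: (w, m) array with one
        row per day and one column per asset."""
        mu = prices.mean(axis=0)                     # start from the plain mean
        for _ in range(n_iter):
            d = np.linalg.norm(prices - mu, axis=1)
            d = np.maximum(d, 1e-12)                 # avoid division by zero
            weights = 1.0 / d
            new_mu = (weights[:, None] * prices).sum(axis=0) / weights.sum()
            if np.linalg.norm(new_mu - mu) < tol:
                break
            mu = new_mu
        return mu

    # Predicted price relatives for the next period (today's price -> median):
    #   x_hat = l1_median(price_window) / price_window[-1]
    # RMR then shifts the portfolio toward assets with large x_hat (step omitted).
    ```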
  • Dayong Wang · Steven C.H. Hoi · Pengcheng Wu · Jianke Zhu · Ying He · Chunyan Miao
    ABSTRACT: Automated face annotation aims to automatically detect human faces in a photo and further name the faces with the corresponding human names. In this paper, we tackle this open problem by investigating a search-based face annotation (SBFA) paradigm for mining the large amounts of web facial images freely available on the WWW. Given a query facial image for annotation, the idea of SBFA is to first search for the top-n similar facial images in a web facial image database and then exploit these top-ranked similar facial images and their weak labels for naming the query facial image. To fully exploit this information, this paper proposes a novel framework of Learning to Name Faces (L2NF), a unified multimodal learning approach for search-based face annotation, which consists of the following major components: (i) we enhance the weak labels of top-ranked similar images by exploiting the "label smoothness" assumption; (ii) we construct multimodal representations of a facial image by extracting different types of features; (iii) we optimize the distance measure for each type of feature using distance metric learning techniques; and finally (iv) we learn the optimal combination of multiple modalities for annotation through a learning-to-rank scheme. We conduct a set of extensive empirical studies on two real-world facial image databases, in which encouraging results show that the proposed algorithms significantly boost the naming accuracy of the search-based face annotation task.
    Full-text · Conference Paper · Jul 2013
  • Dayong Wang · Steven Chu Hong Hoi · Ying He
    ABSTRACT: Automatic face annotation plays an important role in many real-world multimedia information and knowledge management systems. Recently there has been a surge of research interest in mining weakly labeled facial images on the Internet to tackle this long-standing research challenge in computer vision and image understanding. In this paper, we present a novel unified learning framework for face annotation by mining weakly labeled web facial images through an interdisciplinary effort combining sparse feature representation, content-based image retrieval, transductive learning, and inductive learning techniques. In particular, we first introduce a new search-based face annotation paradigm using transductive learning, then propose an effective inductive learning scheme for training classification-based annotators from weakly labeled facial images, and finally unify both transductive and inductive learning approaches to maximize the learning efficacy. We conduct extensive experiments on a real-world web facial image database, in which encouraging results show that the proposed unified learning scheme outperforms state-of-the-art approaches.
    Full-text · Conference Paper · Oct 2012
  • Jiazhi Xia · Dao Thi Phuong Quynh · Ying He · Xiaoming Chen · Steven C. H. Hoi
    ABSTRACT: In this paper, we present a novel geometry video (GV) framework to model and compress 3-D facial expressions. GV bridges the gap between 3-D motion data and 2-D video, and provides a natural way to apply well-studied video processing techniques to motion data processing. Our framework includes a set of algorithms to construct GVs, such as hole filling, geodesic-based face segmentation, expression-invariant parameterization (EIP), and GV compression. Our EIP algorithm guarantees the exact correspondence of the salient features (eyes, mouth, and nose) across frames, which leads to GVs with better spatial and temporal coherence than those of conventional parameterization methods. Taking advantage of this property, we also propose a new H.264/AVC-based progressive directional prediction scheme, which provides a further 10%-16% bitrate reduction over the original H.264/AVC applied to GV compression while maintaining good video quality. Our experimental results on real-world datasets demonstrate that GV is very effective for modeling high-resolution 3-D expression data, thus providing an attractive approach to expression information processing for the gaming and movie industries.
    Full-text · Article · Feb 2012 · IEEE Transactions on Circuits and Systems for Video Technology
  • Boon-Seng Chew · Lap-Pui Chau · Ying He · Dayong Wang · Steven C. H. Hoi
    ABSTRACT: The use of 3D models for progressive transmission and broadcasting applications is an interesting challenge due to the nature and complexity of such content. In this paper, a new image format for the representation of progressive 3D models is proposed. Powerful spectral analysis is combined with the state-of-the-art geometry image (GI) to encode static 3D models into spectral geometry images (SGI) for robust 3D shape representation. Based on the 3D model's surface characteristics, SGI separates the geometry image into low- and high-frequency layers to achieve effective level-of-detail (LOD) modeling. In SGI, the connectivity data of the model is implicitly encoded in the image, removing the need for additional channel bits allocated to its protection during transmission. We demonstrate that, by coupling SGI with an efficient channel allocation scheme, an image-based method for 3D representation suitable for adoption in conventional broadcasting standards can be obtained. The proposed framework is effective in ensuring the smooth degradation of progressive 3D models across varying channel bandwidths and packet loss conditions.
    No preview · Article · Oct 2011 · IEEE Transactions on Broadcasting
  • Lei Wu · Steven C.H. Hoi
    ABSTRACT: The authors present an online semantics-preserving metric learning (OSPML) technique for improving the bag-of-words (BoW) model and addressing the semantic-gap issue. This article investigates the challenge of reducing the semantic gap when building BoW models for image representation; proposes a novel OSPML algorithm that enhances BoW by minimizing the semantic loss and is efficient and scalable for large-scale applications; applies the proposed technique to large-scale image annotation and object recognition; and compares it to the state of the art.
    No preview · Article · Feb 2011 · IEEE Multimedia
  • Steven C H Hoi · Rong Jin · Michael R. Lyu
    ABSTRACT: Most machine learning tasks in data classification and information retrieval require manually labeled data examples in the training stage. The goal of active learning is to select the most informative examples for manual labeling in these learning tasks. Most of the previous studies in active learning have focused on selecting a single unlabeled example in each iteration. This could be inefficient, since the classification model has to be retrained for every acquired labeled example. It is also inappropriate for the setup of information retrieval tasks where the user's relevance feedback is often provided for the top K retrieved items. In this paper, we present a framework for batch mode active learning, which selects a number of informative examples for manual labeling in each iteration. The key feature of batch mode active learning is to reduce the redundancy among the selected examples such that each example provides unique information for model updating. To this end, we employ the Fisher information matrix as the measurement of model uncertainty, and choose the set of unlabeled examples that can efficiently reduce the Fisher information of the classification model. We apply our batch mode active learning framework to both text categorization and image retrieval. Promising results show that our algorithms are significantly more effective than the active learning approaches that select unlabeled examples based only on their informativeness for the classification model.
    No preview · Article · Sep 2009 · IEEE Transactions on Knowledge and Data Engineering
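    To illustrate the principle of Fisher-information-guided batch selection for a logistic model (an illustration only, not the paper's exact optimization), the sketch below greedily picks the unlabeled examples that most increase the log-determinant of the accumulated information matrix, which favors examples that are informative and mutually non-redundant.

    ```python
    import numpy as np

    def greedy_fisher_batch(X_unlabeled, probs, batch_size, ridge=1e-3):
        """Greedy batch selection guided by Fisher information for a logistic
        model: each candidate x contributes p(1-p) * x x^T, and we greedily
        pick the examples that most increase log det of the accumulated
        information matrix. `probs` holds the current model's predicted
        positive-class probabilities for the unlabeled pool."""
        n, d = X_unlabeled.shape
        A = ridge * np.eye(d)                 # regularized information matrix
        chosen = []
        for _ in range(batch_size):
            base = np.linalg.slogdet(A)[1]
            best_i, best_gain = None, -np.inf
            for i in range(n):
                if i in chosen:
                    continue
                x = X_unlabeled[i]
                c = probs[i] * (1.0 - probs[i])
                gain = np.linalg.slogdet(A + c * np.outer(x, x))[1] - base
                if gain > best_gain:
                    best_i, best_gain = i, gain
            chosen.append(best_i)
            x = X_unlabeled[best_i]
            A += probs[best_i] * (1.0 - probs[best_i]) * np.outer(x, x)
        return chosen
    ```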
  • Jianke Zhu · S.C.H. Hoi · M.R. Lyu
    ABSTRACT: Most state-of-the-art nonrigid shape recovery methods use explicit deformable mesh models to regularize surface deformation and constrain the search space. These triangulated mesh models, which rely heavily on a quadratic regularization term, have difficulty accurately capturing large deformations such as severe bending. In this paper, we propose a novel Gaussian process regression approach to the nonrigid shape recovery problem, which does not require a predefined triangulated mesh model. By taking advantage of our novel Gaussian process regression formulation together with a robust coarse-to-fine optimization scheme, the proposed method is fully automatic and able to handle large deformations and outliers. We conducted a set of extensive experiments for performance evaluation in various environments. Encouraging experimental results show that our proposed approach is both effective and robust for nonrigid shape recovery with large deformations.
    No preview · Conference Paper · Jun 2009
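    For readers unfamiliar with the underlying regression machinery, here is a minimal Gaussian process regression example using scikit-learn; it shows the smooth, uncertainty-aware fit that the shape recovery method builds on, but does not reproduce the paper's coarse-to-fine surface recovery pipeline.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # Noisy 1-D observations of an unknown smooth function.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(40, 1))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

    # Fit a GP with an RBF kernel plus a noise term; the posterior mean gives a
    # smooth regularized estimate and the posterior std a confidence band.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.01),
                                  normalize_y=True)
    gp.fit(X, y)
    X_test = np.linspace(-3, 3, 100).reshape(-1, 1)
    mean, std = gp.predict(X_test, return_std=True)
    ```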
  • Jianke Zhu · Steven C.H. Hoi · Michael R. Lyu
    ABSTRACT: Face annotation in images and videos enjoys many potential applications in multimedia information retrieval. Face annotation usually requires a large amount of hand-labeled training data to build effective classifiers. This is particularly challenging when annotating faces in large-scale collections of media data, where the required labeling effort would be very expensive. As a result, traditional supervised face annotation methods often suffer from insufficient training data. To address this challenge, in this paper we propose a novel Transductive Kernel Fisher Discriminant (TKFD) scheme for face annotation, which outperforms traditional supervised annotation methods with little training data. The main idea of our approach is to solve Fisher's discriminant using deformed kernels that incorporate the information of both labeled and unlabeled data. To evaluate the effectiveness of our method, we have conducted extensive experiments on three types of multimedia testbeds: the FRGC benchmark face dataset, the Yahoo! web image collection, and the TRECVID video data collection. The experimental results show that our TKFD algorithm is more effective than traditional supervised approaches, especially when very little training data is available.
    Full-text · Article · Feb 2008 · IEEE Transactions on Multimedia
  • Yuk Man Wong · S.C.H. Hoi · M.R. Lyu
    ABSTRACT: One key challenge in content-based image retrieval (CBIR) is to develop a fast solution for indexing high-dimensional image contents, which is crucial to building large-scale CBIR systems. In this paper, we propose a scalable content-based image retrieval scheme using locality-sensitive hashing (LSH), and conduct extensive evaluations on a large image testbed of half a million images. To the best of our knowledge, few comprehensive studies have evaluated large-scale CBIR on half a million images. Our empirical results show that the proposed solution scales to hundreds of thousands of images, which is promising for building Web-scale CBIR systems.
    Full-text · Conference Paper · Aug 2007
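    A minimal sketch of an LSH index for high-dimensional image features is given below, using random-hyperplane (sign) hashes with exact re-ranking of bucket candidates; the hash family and parameters are illustrative and may differ from those used in the paper.

    ```python
    import numpy as np
    from collections import defaultdict

    class RandomHyperplaneLSH:
        """Sketch of an LSH index for high-dimensional image features using
        random-hyperplane (sign) hashes; candidates that share a bucket in any
        table are re-ranked by exact Euclidean distance."""
        def __init__(self, dim, n_tables=8, n_bits=16, seed=0):
            rng = np.random.default_rng(seed)
            self.planes = rng.normal(size=(n_tables, n_bits, dim))
            self.tables = [defaultdict(list) for _ in range(n_tables)]
            self.data = []

        def _keys(self, x):
            bits = (self.planes @ x) > 0          # (n_tables, n_bits) sign bits
            return [tuple(row) for row in bits]

        def add(self, x):
            idx = len(self.data)
            self.data.append(x)
            for table, key in zip(self.tables, self._keys(x)):
                table[key].append(idx)

        def query(self, x, top_k=10):
            cand = set()
            for table, key in zip(self.tables, self._keys(x)):
                cand.update(table[key])
            if not cand:
                return []
            cand = list(cand)
            d = np.linalg.norm(np.array([self.data[i] for i in cand]) - x, axis=1)
            return [cand[i] for i in np.argsort(d)[:top_k]]
    ```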
  • S.C.H. Hoi · M.R. Lyu
    ABSTRACT: One critical task in content-based video retrieval is to rank search results effectively with combinations of multimodal resources. This paper proposes a novel multimodal and multilevel ranking framework for content-based video retrieval. The main idea of our approach is to represent videos by graphs and learn harmonic ranking functions by smoothly fusing multimodal resources over these graphs. We further tackle the efficiency issue with a multilevel learning scheme, which makes the semi-supervised ranking method practical for large-scale applications. Our empirical evaluations on the TRECVID 2005 dataset show that the proposed multimodal and multilevel ranking framework is effective and promising for content-based video retrieval.
    Full-text · Conference Paper · May 2007
  • Steven C H Hoi · Michael R. Lyu · Rong Jin
    ABSTRACT: Relevance feedback has emerged as a powerful tool to boost retrieval performance in content-based image retrieval (CBIR). In the past, most research efforts in this field have focused on designing effective algorithms for traditional relevance feedback. Given that a CBIR system can collect and store users' relevance feedback information in a history log, an image retrieval system should be able to take advantage of this log of feedback data to enhance its retrieval performance. In this paper, we propose a unified framework for log-based relevance feedback that integrates the log of feedback data into traditional relevance feedback schemes to effectively learn the correlation between low-level image features and high-level concepts. Given the error-prone nature of log data, we present a novel learning technique, named soft label support vector machine, to tackle the noisy data problem. Extensive experiments are designed and conducted to evaluate the proposed algorithms on the COREL image dataset. The promising experimental results empirically validate the effectiveness of our log-based relevance feedback scheme.
    Full-text · Article · May 2006 · IEEE Transactions on Knowledge and Data Engineering
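    As a rough, hedged approximation of the idea of trusting noisy log-derived labels less than the current session's feedback (not the paper's soft label SVM itself), one can train a standard SVM with per-sample weights, e.g. with scikit-learn; all data below is a synthetic placeholder.

    ```python
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    # Placeholder features: a handful of explicit judgments from the current
    # session plus a larger batch of noisy labels mined from the feedback log.
    X_session, y_session = rng.normal(size=(20, 16)), rng.choice([-1, 1], 20)
    X_log, y_log = rng.normal(size=(200, 16)), rng.choice([-1, 1], 200)

    X = np.vstack([X_session, X_log])
    y = np.concatenate([y_session, y_log])
    # Trust the error-prone log data less than the session's own feedback.
    weights = np.concatenate([np.full(len(y_session), 1.0),
                              np.full(len(y_log), 0.3)])

    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X, y, sample_weight=weights)
    scores = clf.decision_function(rng.normal(size=(500, 16)))  # rank candidates
    ```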