Qiang Yang

Tsinghua University, Beijing, China

Publications (420) · 248.61 Total Impact Points

  • ABSTRACT: Time-sync video tagging aims to automatically generate tags for each video shot. It can improve the user's experience in previewing a video's timeline structure compared to traditional schemes that tag an entire video clip. In this paper, we propose a new application which extracts time-sync video tags by automatically exploiting crowdsourced comments from video websites such as Nico Nico Douga, where videos are commented on by online crowd users in a time-sync manner. The challenge of the proposed application is that users with bias interact with one another frequently and bring noise into the data, while the comments are too sparse to compensate for the noise. Previous techniques are unable to handle this task well as they consider video semantics independently, which may overfit the sparse comments in each shot and thus fail to provide accurate modeling. To resolve these issues, we propose a novel temporal and personalized topic model that jointly considers temporal dependencies between video semantics, users' interaction in commenting, and users' preferences as prior knowledge. Our proposed model shares knowledge across video shots via users to enrich the short comments, and peels off user interaction and user bias to solve the noisy-comment problem. Log-likelihood analyses and user studies on large datasets show that the proposed model outperforms several state-of-the-art baselines in video tagging quality. Case studies also demonstrate our model's capability of extracting tags from the crowdsourced short and noisy comments.
  • ABSTRACT: Transfer learning is established as an effective technology to leverage rich labeled data from some source domain to build an accurate classifier for the target domain. The basic assumption is that the input domains may share certain knowledge structure, which can be encoded into common latent factors and extracted by preserving important properties of the original data, e.g., statistical properties and geometric structure. In this paper, we show that different properties of input data can be complementary to each other and that exploring them simultaneously can make the learning model robust to the domain difference. We propose a general framework, referred to as Graph Co-Regularized Transfer Learning (GTL), into which various matrix factorization models can be incorporated. Specifically, GTL aims to extract common latent factors for knowledge transfer by preserving the statistical property across domains, and simultaneously refine the latent factors to alleviate negative transfer by preserving the geometric structure in each domain. Based on the framework, we propose two novel methods using NMF and NMTF, respectively. Extensive experiments verify that GTL can significantly outperform state-of-the-art learning methods on several public text and image datasets.
    IEEE Transactions on Knowledge and Data Engineering 07/2014; 26(7):1805-1818. · 1.82 Impact Factor
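The geometry-preserving term in GTL can be illustrated with a toy graph-regularized NMF. The sketch below is a generic re-implementation of that general recipe (multiplicative updates with a graph Laplacian penalty, in the style of Cai et al.'s GNMF), not the paper's GTL code; the matrices, constants, and function name are invented for illustration.

```python
import numpy as np

def graph_regularized_nmf(X, W, k=2, lam=0.1, iters=300, seed=0):
    # Minimize ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W,
    # via multiplicative updates that keep U and V nonnegative.
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = rng.random((n, k)) + 1e-3
    V = rng.random((m, k)) + 1e-3
    D = np.diag(W.sum(axis=1))  # degree matrix of the affinity graph
    eps = 1e-9                  # guards against division by zero
    for _ in range(iters):
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U + lam * (W @ V)) / (V @ (U.T @ U) + lam * (D @ V) + eps)
    return U, V

# Tiny demo: a near-block matrix and a chain-graph affinity over its columns.
X = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.8, 1.0, 0.0, 0.1],
              [0.1, 0.0, 1.0, 0.9],
              [0.0, 0.1, 0.9, 1.0]])
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
U, V = graph_regularized_nmf(X, W, k=2)
err = np.linalg.norm(X - U @ V.T)
```

The Laplacian term pulls the latent representations of graph-adjacent columns together, which is the "refine the latent factors by preserving geometric structure" step described above.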
  • ABSTRACT: Advanced satellite tracking technologies have collected huge amounts of wild bird migration data. Biologists use these data to understand dynamic migration patterns, study correlations between habitats, and predict global spreading trends of avian influenza. The research discussed here transforms the biological problem into a machine learning problem by converting wild bird migratory paths into graphs. H5N1 outbreak prediction is achieved by discovering weighted closed cliques from the graphs using the mining algorithm High-wEight cLosed cliquE miNing (HELEN). The learning algorithm HELEN-p then predicts potential H5N1 outbreaks at habitats. This prediction method is more accurate than traditional methods used on a migration dataset obtained through a real satellite bird-tracking system. Empirical analysis shows that H5N1 spreads along high-weight closed cliques and frequent cliques.
    Intelligent Systems, IEEE 07/2014; 29(4):10-17. · 1.92 Impact Factor
  • ABSTRACT: Hierarchical Task Network (HTN) planning is an effective yet knowledge-intensive problem-solving technique. It requires humans to encode knowledge in the form of methods and action models. Methods describe how to decompose tasks into subtasks and the preconditions under which those methods are applicable, whereas action models describe how actions change the world. Encoding such knowledge is a difficult and time-consuming process, even for domain experts. In this paper, we propose a new learning algorithm, called HTNLearn, to help acquire HTN methods and action models. HTNLearn receives as input a collection of plan traces with partially annotated intermediate state information, and a set of annotated tasks that specify the conditions before and after the tasks' completion. In addition, plan traces are annotated with potentially empty partial decomposition trees that record the processes of decomposing tasks into subtasks. HTNLearn outputs a collection of methods and action models. HTNLearn first encodes constraints about the methods and action models as a constraint satisfaction problem, and then solves the problem using a weighted MAX-SAT solver. HTNLearn can learn methods and action models simultaneously from partially observed plan traces (i.e., plan traces where the intermediate states are partially observable). We test HTNLearn in several HTN domains. The experimental results show that our algorithm HTNLearn is both effective and efficient.
    Artificial Intelligence 07/2014; · 2.71 Impact Factor
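The final step of HTNLearn, maximizing the weight of satisfied soft constraints, is a weighted MAX-SAT problem. The toy brute-force solver below illustrates that formulation only; the clause encoding is invented, and the paper uses an off-the-shelf weighted MAX-SAT solver rather than enumeration.

```python
from itertools import product

def weighted_max_sat(n_vars, clauses):
    """Brute-force weighted MAX-SAT: find the truth assignment maximizing the
    total weight of satisfied clauses. A clause is (weight, [literals]); a
    positive literal i means variable i is True, -i means it is False.
    Exponential in n_vars, so only suitable as an illustration.
    """
    best, best_assign = -1.0, None
    for bits in product([False, True], repeat=n_vars):
        score = sum(w for w, lits in clauses
                    if any(bits[abs(l) - 1] == (l > 0) for l in lits))
        if score > best:
            best, best_assign = score, bits
    return best, best_assign

# Toy encoding: soft constraints over two candidate precondition atoms.
clauses = [(5.0, [1]),       # strong evidence that atom 1 is a precondition
           (2.0, [-1, 2]),   # "if atom 1 then atom 2" (1 implies 2)
           (1.0, [-2])]      # weak evidence against atom 2
best, assign = weighted_max_sat(2, clauses)
```

Here the assignment (True, True) satisfies the first two clauses for a total weight of 7.0, which is how conflicting soft evidence about methods and action models gets arbitrated by weight.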
  • Hankz Hankui Zhuo, Qiang Yang
    ABSTRACT: Applying learning techniques to acquire action models is an area of intense research interest. Most previous work in this area has assumed that there is a significant amount of training data available in a planning domain of interest. However, it is often difficult to acquire sufficient training data to ensure the learnt action models are of high quality. In this paper, we seek to explore a novel algorithm framework, called TRAMP, to learn action models with limited training data in a target domain, via transferring as much of the available information from other domains (called source domains) as possible to help the learning task, assuming action models in source domains can be transferred to the target domain. TRAMP transfers knowledge from source domains by first building structure mappings between source and target domains, and then exploiting extra knowledge from Web search to bridge and transfer knowledge from sources. Specifically, TRAMP first encodes training data with a set of propositions, and formulates the transferred knowledge as a set of weighted formulas. After that it learns action models for the target domain to best explain the set of propositions and the transferred knowledge. We empirically evaluate TRAMP in different settings to assess its advantages and disadvantages in six planning domains, including four International Planning Competition (IPC) domains and two synthetic domains.
    Artificial Intelligence 07/2014; · 2.71 Impact Factor
  • ABSTRACT: Transfer learning, which aims to help learning tasks in a target domain by leveraging knowledge from auxiliary domains, has been demonstrated to be effective in different applications such as text mining and sentiment analysis. In addition, in many real-world applications, auxiliary data are described from multiple perspectives and usually carried by multiple sources. For example, to help classify videos on YouTube, which include three perspectives: image, voice and subtitles, one may borrow data from Flickr, Last.FM and Google News. Although any single instance in these domains can only cover a part of the views available on YouTube, the information they carry may complement one another. If we can exploit these auxiliary domains in a collective manner, and transfer the knowledge to the target domain, we can improve the target model building from multiple perspectives. In this article, we consider this transfer learning problem as Transfer Learning with Multiple Views and Multiple Sources. As different sources may have different probability distributions and different views may complement or contradict each other, merging all data in a simplistic manner will not give an optimal result. Thus, we propose a novel algorithm to leverage knowledge from different views and sources collaboratively, letting different views from different sources complement each other through a co-training style framework while correcting for the distribution differences across domains. We conduct empirical studies on several real-world datasets to show that the proposed approach can improve the classification accuracy by up to 8% against different kinds of state-of-the-art baselines.
    Statistical Analysis and Data Mining 04/2014;
  • ABSTRACT: Friendship prediction is an important task in social network analysis (SNA). It can help users identify friends and improve their level of activity. Most previous approaches predict users' friendship based on their historical records, such as their existing friendships, social interactions, etc. However, in reality, most users have few friends in a single network, and the data can be very sparse. The sparsity problem causes existing methods to overfit the rare observations and suffer from serious performance degradation. This is particularly true when a new social network has just started to form. We observe that many of today's social networks are composite in nature, where people are often engaged in multiple networks. In addition, users' friendships are often correlated across networks; for example, two users who are friends on Facebook are frequently also friends on Google+. Thus, by considering those overlapping users as the bridge, the friendship knowledge in other networks can help predict friendships in the current network. This can be achieved by exploiting the knowledge in different networks in a collective manner. However, as each individual network has its own properties that can be incompatible and inconsistent with other networks, naively merging all networks into a single one may not work well. The proposed solution is to extract the common behaviors between different networks via a hierarchical Bayesian model. It captures the common knowledge across networks, while avoiding negative impacts due to network differences. Empirical studies demonstrate that the proposed approach significantly improves the mean average precision of friendship prediction over state-of-the-art baselines on nine real-world social networking datasets.
  • Erheng Zhong, Wei Fan, Qiang Yang
    ABSTRACT: Accurate prediction of user behaviors is important for many social media applications, including social marketing, personalization, and recommendation. A major challenge lies in that although many previous works model user behavior from only historical behavior logs, the available user behavior data or interactions between users and items in a given social network are usually very limited and sparse (e.g., ⩾ 99.9% empty), which makes models overfit the rare observations and fail to provide accurate predictions. We observe that many people are members of several social networks at the same time, such as Facebook, Twitter, and Tencent's QQ. Importantly, users' behaviors and interests in different networks influence one another. This provides an opportunity to leverage the knowledge of user behaviors in different networks by considering the overlapping users in different networks as bridges, in order to alleviate the data sparsity problem and enhance the predictive performance of user behavior modeling. Combining different networks "simply and naively" does not work well. In this article, we formulate the problem of modeling multiple networks as "adaptive composite transfer" and propose a framework called ComSoc. ComSoc first selects the most suitable networks inside a composite social network via a hierarchical Bayesian model, parameterized for individual users. It then builds topic models for user behavior prediction using both the relationships in the selected networks and related behavior data. With different relational regularizations, we introduce different implementations, corresponding to different ways to transfer knowledge from composite social relations. To handle big data, we have implemented the algorithm using Map/Reduce. We demonstrate that the proposed composite network-based user behavior models significantly improve the predictive accuracy over a number of existing approaches on several real-world applications, including a very large social networking dataset from Tencent Inc.
    ACM Transactions on Knowledge Discovery from Data 02/2014; 8(1). · 1.15 Impact Factor
  • ABSTRACT: We present in this paper our winning solution to Dedicated Task 1 in the Nokia Mobile Data Challenge (MDC). MDC Task 1 is to infer the semantic category of a place based on the smartphone sensing data obtained at that place. We approach this task in a standard supervised learning setting: we extract discriminative features from the sensor data and use state-of-the-art classifiers (SVM, Logistic Regression and the Decision Tree family) to build classification models. We have found that feature engineering, or in other words, constructing features using human heuristics, is very effective for this task. In particular, we have proposed a novel feature engineering technique, Conditional Feature (CF), a general framework for domain-specific feature construction. In total, we have generated 2,796,200 features and in our final five submissions we use feature selection to select 100 to 2000 features. One of our key findings is that features conditioned on fine-granularity time intervals, e.g., every 30 minutes, are most effective. Our best 10-fold CV accuracy on the training set is 75.1% by Gradient Boosted Trees, and the second best accuracy is 74.6% by L1-regularized Logistic Regression. Besides the good performance, we also briefly report our experience of using the F# language for large-scale (∼70 GB raw text data) conditional feature construction.
    Pervasive and Mobile Computing 12/2013; 9(6):772–783. · 1.67 Impact Factor
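The Conditional Feature idea, counting sensor events conditioned on fine-granularity time intervals rather than globally, can be sketched in a few lines. The event schema and field values below are made up for illustration and do not reflect the MDC data format.

```python
from collections import Counter

def conditional_features(events, bin_minutes=30):
    """Sketch of Conditional Feature (CF) construction: instead of one global
    count per sensor value, count each value conditioned on a context, here a
    time-of-day bin. A feature key like ('wifi_on', 18) reads as "wifi_on
    observed during the 18th 30-minute bin of the day".
    """
    feats = Counter()
    for minute_of_day, value in events:
        feats[(value, minute_of_day // bin_minutes)] += 1
    return feats

# Toy log: (minute-of-day, observed sensor value). 540 = 9:00, 1290 = 21:30.
events = [(540, "wifi_on"), (545, "wifi_on"), (1290, "wifi_off")]
feats = conditional_features(events)
```

The same value observed in different bins yields distinct features, which is what makes fine-granularity conditioning discriminative for place categories.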
  • ABSTRACT: Demographics prediction is an important component of user profile modeling. The accurate prediction of users' demographics can help promote many applications, ranging from web search and personalization to behavioral targeting. In this paper, we focus on how to predict users' demographics, including "gender", "job type", "marital status", "age" and "number of family members", based on mobile data, such as users' usage logs, physical activities and environmental contexts. The core idea is to build a supervised learning framework, where each user is represented as a feature vector and users' demographics are considered as prediction targets. The most important component is constructing features from the raw data, after which supervised learning models can be applied. We propose a feature construction framework, CFC (contextual feature construction), where each feature is defined as the conditional probability of one user activity under the given contexts. Besides employing standard supervised learning models, we propose a regularized multi-task learning framework to model different kinds of demographics predictions collectively. We also propose a cost-sensitive classification framework for regression tasks, in order to benefit from existing dimension reduction methods. Finally, due to the limited number of training instances, we employ ensemble methods to avoid overfitting. The experimental results show that the framework achieves classification accuracies on "gender", "job" and "marital status" as high as 96%, 83% and 86%, respectively, and achieves Root Mean Square Error (RMSE) on "age" and "number of family members" as low as 0.69 and 0.66 respectively, under leave-one-out evaluation.
    Pervasive and Mobile Computing 12/2013; 9(6):823–837. · 1.67 Impact Factor
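The CFC definition above, each feature being the conditional probability of one user activity under a given context, can be sketched directly from counts. The contexts and activities below are invented toy values, not the paper's feature vocabulary.

```python
from collections import Counter

def cfc_features(log):
    """Sketch of CFC (contextual feature construction): estimate
    P(activity | context) from a user's log of (context, activity) pairs and
    emit one feature per observed pair."""
    ctx_counts, pair_counts = Counter(), Counter()
    for context, activity in log:
        ctx_counts[context] += 1
        pair_counts[(context, activity)] += 1
    return {(c, a): n / ctx_counts[c] for (c, a), n in pair_counts.items()}

# Toy log of one user's (context, activity) observations.
log = [("evening", "call"), ("evening", "game"), ("evening", "call"),
       ("morning", "call")]
feats = cfc_features(log)
```

The resulting dictionary would become one user's feature vector, with each demographic attribute predicted from such vectors in the supervised framework described above.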
  • ABSTRACT: The study of users' social behaviors has gained much research attention since the advent of various social media such as Facebook, Renren and Twitter. A major class of applications is to predict a user's future activities based on his/her historical social behaviors. In this paper, we focus on a fundamental task: predicting a user's future activity level in a social network, e.g., weekly activeness (active or inactive). This problem is closely related to Social Customer Relationship Management (Social CRM). Compared to traditional CRM, three properties of social networks, namely user diversity, social influence, and their dynamic nature, raise new challenges and opportunities for Social CRM. First, the user diversity property implies that a global predictive model may not be precise for all users. On the other hand, the historical data of individual users are too sparse to build precise personalized models. Second, the social influence property suggests that relationships between users can be embedded to further boost prediction results on individual users. Finally, the dynamic nature of social networks means that users' behaviors may keep changing over time. To address these challenges, we develop a personalized and socially regularized time-decay model for user activity level prediction. Experiments on the social media site Renren validate the effectiveness of our proposed model compared with several baselines, including traditional supervised learning methods and node classification methods in social networks.
    Proceedings of the 22nd ACM international conference on Conference on information & knowledge management; 10/2013
  • ABSTRACT: We present the ideas and methodologies that we used to address the KDD Cup 2013 challenge on author-paper identification. We first formulate the problem as a personalized ranking task and then propose to solve the task through a supervised learning framework. The key is to eliminate incorrectly assigned papers of a given author based on existing records. We choose Gradient Boosted Trees as our main classifier. Through our exploration we conclude that the most critical factor in achieving our results is effective feature engineering. In this paper, we formulate this process as a unified framework that constructs features based on contextual information and combines machine learning techniques with human intelligence. In addition, we suggest several strategies for parsing authors' names, which improve the prediction results significantly. Divide-and-conquer-based model building as well as model averaging techniques also improve the prediction precision.
    Proceedings of the 2013 KDD Cup 2013 Workshop; 08/2013
  • Conference Paper: Cross-task crowdsourcing
    Kaixiang Mo, Erheng Zhong, Qiang Yang
    ABSTRACT: Crowdsourcing is an effective method for collecting labeled data for various data mining tasks. It is critical to ensure the veracity of the produced data because responses collected from different users may be noisy and unreliable. Previous works solve this veracity problem by estimating both the user ability and question difficulty based on the knowledge in each task individually. In this case, each single task needs large amounts of data to provide accurate estimations. However, in practice, budgets provided by customers for a given target task may be limited, and hence each question can be presented to only a few users where each user can answer only a few questions. This data sparsity problem can cause previous approaches to perform poorly due to overfitting on the scarce data, eventually damaging the data veracity. Fortunately, in real-world applications, users can answer questions from multiple historical tasks. For example, one can annotate images as well as label the sentiment of a given title. In this paper, we employ transfer learning, which borrows knowledge from auxiliary historical tasks to improve the data veracity in a given target task. The motivation is that users have stable characteristics across different crowdsourcing tasks and thus data from different tasks can be exploited collectively to estimate users' abilities in the target task. We propose a hierarchical Bayesian model, TLC (Transfer Learning for Crowdsourcing), to implement this idea by considering the overlapping users as a bridge. In addition, to avoid possible negative impact, TLC introduces task-specific factors to model task differences. The experimental results show that TLC significantly improves the accuracy over several state-of-the-art non-transfer-learning approaches under a very limited budget in various labeling tasks.
    Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining; 08/2013
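The intuition behind TLC, that worker abilities estimated on historical tasks carry over to the target task, can be approximated with ability-weighted voting. This is a deliberately simplified stand-in for the paper's hierarchical Bayesian model (no task-specific factors), and all worker and question names are hypothetical.

```python
def ability_weighted_labels(target_answers, history):
    """Estimate each worker's ability as the fraction of correct answers on
    auxiliary historical tasks (where ground truth is known), then aggregate
    target-task answers by ability-weighted majority vote."""
    ability = {}
    for worker, records in history.items():  # records: [(answer, truth), ...]
        correct = sum(ans == truth for ans, truth in records)
        ability[worker] = correct / len(records)
    labels = {}
    for question, votes in target_answers.items():
        scores = {}
        for worker, ans in votes:
            # Unknown workers get a neutral 0.5 weight.
            scores[ans] = scores.get(ans, 0.0) + ability.get(worker, 0.5)
        labels[question] = max(scores, key=scores.get)
    return labels

history = {"w1": [(1, 1), (0, 0), (1, 1)],   # reliable worker
           "w2": [(1, 0), (0, 1), (0, 0)]}   # unreliable worker
target = {"q1": [("w1", 1), ("w2", 0)]}
labels = ability_weighted_labels(target, history)
```

Because w1 proved more reliable on the historical tasks, w1's vote dominates on the sparsely answered target question, which is the cross-task transfer effect the abstract describes.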
  • Erheng Zhong, Wei Fan, Yin Zhu, Qiang Yang
    ABSTRACT: Modeling the dynamics of online social networks over time not only helps us understand the evolution of network structures and user behaviors, but also improves the performance of other analysis tasks, such as link prediction and community detection. Nowadays, users engage in multiple networks and form a "composite social network" by considering common users as the bridge. State-of-the-art network-dynamics analysis is performed in isolation for individual networks, but users' interactions in one network can influence their behaviors in other networks, and in an individual network, different types of user interactions also affect each other. Without considering the influences across networks, one may not be able to model the dynamics in a given network correctly due to the lack of information. In this paper, we study the problem of modeling the dynamics of composite networks, where the evolution processes of different networks are jointly considered. However, due to the difference in network properties, simply merging multiple networks into a single one is not ideal because individual evolution patterns may be ignored and network differences may bring negative impacts. The proposed solution is a nonparametric Bayesian model, which models each user's common latent features to extract the cross-network influences, and uses network-specific factors to describe different networks' evolution patterns. Empirical studies on large-scale dynamic composite social networks demonstrate that the proposed approach improves the performance of link prediction over several state-of-the-art baselines and unfolds the network evolution accurately.
    Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining; 08/2013
  • Weike Pan, Qiang Yang
    ABSTRACT: A major challenge for collaborative filtering (CF) techniques in recommender systems is the data sparsity that is caused by missing and noisy ratings. This problem is even more serious for CF domains where the ratings are expressed numerically, e.g. as 5-star grades. We assume the 5-star ratings are unordered bins instead of ordinal relative preferences. We observe that, while we may lack the information in numerical ratings, we sometimes have additional auxiliary data in the form of binary ratings. This is especially true given that users can easily express their preferences as likes or dislikes for items. In this paper, we explore how to use these binary auxiliary preference data to help reduce the impact of data sparsity for CF domains expressed in numerical ratings. We solve this problem by transferring the rating knowledge from some auxiliary data source in binary form (that is, likes or dislikes) to a target numerical rating matrix. In particular, our solution is to model both the numerical ratings and the ratings expressed as like or dislike in a principled way. We present a novel framework of Transfer by Collective Factorization (TCF), in which we construct a shared latent space collectively and learn the data-dependent effect separately. A major advantage of the TCF approach over the previous bilinear method of collective matrix factorization is that we are able to capture the data-dependent effect when sharing the data-independent knowledge. This allows us to increase the overall quality of knowledge transfer. We present extensive experimental results to demonstrate the effectiveness of TCF at various sparsity levels, and show improvements of our approach as compared to several state-of-the-art methods.
    Artificial Intelligence 04/2013; 197:39–55. · 2.71 Impact Factor
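The shared-latent-space idea in TCF can be sketched with a toy joint factorization: the numerical matrix and the binary matrix share user/item factors, while separate inner matrices capture the data-dependent effects. The sketch below uses plain gradient descent on fully observed matrices and is a simplification of the paper's model, which handles missing entries and is estimated quite differently; all names and constants are invented.

```python
import numpy as np

def tcf_sketch(R, A, k=2, lr=0.01, lam=0.5, iters=3000, seed=0):
    # Shared factors U, V for both matrices; B_R and B_A are the separate
    # data-dependent inner matrices: R ~ U B_R V^T, A ~ U B_A V^T.
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = rng.normal(0.0, 0.1, (n, k))
    V = rng.normal(0.0, 0.1, (m, k))
    BR, BA = np.eye(k), np.eye(k)
    for _ in range(iters):
        ER = U @ BR @ V.T - R   # residual on the numerical target
        EA = U @ BA @ V.T - A   # residual on the binary auxiliary data
        gU = ER @ V @ BR.T + lam * (EA @ V @ BA.T)
        gV = ER.T @ U @ BR + lam * (EA.T @ U @ BA)
        gBR = U.T @ ER @ V
        gBA = lam * (U.T @ EA @ V)
        U, V = U - lr * gU, V - lr * gV
        BR, BA = BR - lr * gBR, BA - lr * gBA
    return U, V, BR, BA

# Target 5-star-style ratings and auxiliary like/dislike data on the same grid.
R = np.array([[5., 4., 1.], [4., 5., 1.], [1., 1., 5.]])
A = (R >= 3).astype(float)
U, V, BR, BA = tcf_sketch(R, A)
err = np.linalg.norm(U @ BR @ V.T - R)
```

The auxiliary loss regularizes U and V, so binary likes/dislikes shape the shared space even where numerical ratings are scarce, while B_R and B_A absorb the differences between the two rating types.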
  • ABSTRACT: Lifelong Machine Learning, or LML, considers systems that can learn many tasks from one or more domains over their lifetimes. The goal is to sequentially retain learned knowledge and to selectively transfer that knowledge when learning a new task so as to develop more accurate hypotheses or policies. Following a review of prior work on LML, we propose that it is now appropriate for the AI community to move beyond learning algorithms to more seriously consider the nature of systems that are capable of learning over a lifetime. Reasons for our position are presented and potential counter-arguments are discussed. The remainder of the paper contributes by defining LML, presenting a reference framework that considers all forms of machine learning, and listing several key challenges for and benefits from LML research. We conclude with ideas for next steps to advance the field.
    AAAI 2013 Spring Symposium on Lifelong Machine Learning; 03/2013
  • ABSTRACT: BACKGROUND: The detection of epistasis among genetic markers is of great interest in genome-wide association studies (GWAS). In recent years, much research has been devoted to finding disease-associated epistasis in GWAS. However, due to the high computational cost involved, most methods focus on specific epistasis models, with a potential loss of power when the underlying epistasis models are not among those examined. RESULTS: In this work, we propose a computationally efficient approach based on complete enumeration of two-locus epistasis models. This approach uses a two-stage (screening and testing) search strategy, and the enumeration of all epistasis patterns is guaranteed. The implementation runs on graphics processing units (GPUs), which can finish the analysis of a GWAS dataset (with around 5,000 subjects and around 350,000 markers) within two hours. CONCLUSIONS: This work demonstrates that complete compositional epistasis detection is computationally feasible in GWAS.
    BMC Genetics 02/2013; 14(1):7. · 2.36 Impact Factor
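The two-stage (screening and testing) search over all SNP pairs can be sketched serially in a few lines. Genotypes are simplified to 0/1, the association score is a crude stand-in for the paper's statistical tests, and the heavy stage runs on GPUs in the actual work; everything here is invented for illustration.

```python
from itertools import combinations

def two_stage_epistasis(genotypes, phenotype, keep=10):
    """Enumerate all SNP pairs, rank them with a cheap screening statistic,
    and keep only the top candidates for the (more expensive) testing stage.
    Complete enumeration in stage 1 guarantees no pair is missed."""
    n_snps, n = len(genotypes), len(phenotype)
    base_rate = sum(phenotype) / n

    def stat(i, j):
        # Deviation of the case rate among carriers of both SNPs from the
        # overall case rate: a crude pairwise association score.
        hits = [t for t in range(n) if genotypes[i][t] and genotypes[j][t]]
        if not hits:
            return 0.0
        rate = sum(phenotype[t] for t in hits) / len(hits)
        return abs(rate - base_rate)

    # Stage 1: screen every pair with the cheap statistic.
    screened = sorted(combinations(range(n_snps), 2),
                      key=lambda p: stat(*p), reverse=True)[:keep]
    # Stage 2: a real pipeline would re-test survivors with an exact test;
    # the sketch simply re-ranks them.
    return sorted(screened, key=lambda p: stat(*p), reverse=True)

# Toy data: the phenotype is driven by the interaction of SNPs 0 and 1.
genotypes = [[1, 1, 1, 1, 0, 0, 0, 0],
             [1, 1, 0, 0, 1, 1, 0, 0],
             [0, 1, 0, 1, 0, 1, 0, 1],
             [1, 0, 1, 0, 1, 0, 1, 0]]
phenotype = [1, 1, 0, 0, 0, 0, 0, 0]
top = two_stage_epistasis(genotypes, phenotype, keep=3)
```

The screening stage is what makes complete enumeration tractable: the cheap statistic is computed for every pair, while the expensive test only touches the survivors.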
  • Qiang Yang
    ABSTRACT: A major challenge in today's world is the Big Data problem, which manifests itself in Web and Mobile domains as rapidly changing and heterogeneous data streams. A data-mining system must be able to cope with the influx of changing data in a continual manner. This calls for Lifelong Machine Learning, which, in contrast to traditional one-shot learning, should be able to identify the learning tasks at hand and adapt to the learning problems in a sustainable manner. A foundation for lifelong machine learning is transfer learning, whereby knowledge gained in a related but different domain may be transferred to benefit learning for a current task. To make transfer learning effective, it is important to maintain a continual and sustainable channel over the lifetime of a user in which the data are annotated. In this talk, I outline lifelong machine learning scenarios, give several examples of transfer learning and applications for lifelong machine learning, and discuss cases of successful extraction of data annotations to meet the Big Data challenge.
    Proceedings of the sixth ACM international conference on Web search and data mining; 02/2013
  • ABSTRACT: Genome-wide association study (GWAS) has been successful in identifying genetic variants that are associated with complex human diseases. In GWAS, multilocus association analyses through linkage disequilibrium (LD), named haplotype-based analyses, may have greater power than single-locus analyses for detecting disease susceptibility loci. However, the large number of SNPs genotyped in GWAS poses great computational challenges in the detection of haplotype associations. We present a fast method named HapBoost for finding haplotype associations, which can be applied to quickly screen the whole genome. The effectiveness of HapBoost is demonstrated by using both synthetic and real data sets. The experimental results show that the proposed approach can achieve comparably accurate results while it performs much faster than existing methods.
    IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM 01/2013; 10(1):207-212. · 2.25 Impact Factor
  • ABSTRACT: The Linking Open Data (LOD) project is an ongoing effort to construct a global data space, i.e. the Web of Data. One important part of this project is to establish owl:sameAs links among structured data sources. Such links indicate equivalent instances that refer to the same real-world object. The problem of discovering owl:sameAs links between pairwise data sources is called instance matching. Most of the existing approaches addressing this problem rely on the quality of prior schema matching, which is not always good enough in the LOD scenario. In this paper, we propose a schema-independent instance-pair similarity metric based on several general descriptive features. We transform the instance matching problem into a binary classification problem and solve it with machine learning algorithms. Furthermore, we employ transfer learning methods to utilize the existing owl:sameAs links in LOD to reduce the demand for labeled data. We carry out experiments on datasets from OAEI 2010. The results show that our method performs well on real-world LOD data and outperforms the participants of OAEI 2010.
    Proceedings of the 11th international conference on The Semantic Web - Volume Part I; 11/2012
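Casting instance matching as binary classification over schema-independent features can be sketched as follows. The feature set and the one-dimensional threshold "classifier" are simplifications invented for illustration; the paper's descriptive features and learners are richer.

```python
def pair_features(a, b):
    """Schema-independent similarity features for two instances, each given as
    a dict of property -> set of value tokens. The overlap feature ignores
    property names entirely, so no prior schema matching is needed."""
    ta = set().union(*a.values()) if a else set()
    tb = set().union(*b.values()) if b else set()
    inter, union = len(ta & tb), len(ta | tb)
    return [inter / union if union else 0.0,   # Jaccard overlap of all values
            float(len(a) == len(b))]           # crude structural similarity

def train_threshold(pairs, labels):
    # Minimal 1-D "classifier": pick the overlap threshold that best
    # separates matching (1) from non-matching (0) labeled pairs.
    best_t, best_acc = 0.0, -1
    for t in [i / 10 for i in range(11)]:
        acc = sum((f[0] >= t) == y for f, y in zip(pairs, labels))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

# Toy instances from two sources with different property names.
inst1 = {"name": {"alan", "turing"}, "born": {"1912"}}
inst2 = {"label": {"alan", "turing"}, "year": {"1912"}}
inst3 = {"name": {"claude", "shannon"}}
feats = [pair_features(inst1, inst2), pair_features(inst1, inst3)]
t = train_threshold(feats, [1, 0])
```

Because the features never compare property names, instances described under different schemas (`name` vs. `label`) can still be matched, which is the schema independence the abstract emphasizes.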

Publication Stats

9k Citations
248.61 Total Impact Points


  • 2000–2014
    • Tsinghua University
      • School of Software
      Beijing, China
  • 1970–2014
    • The Hong Kong University of Science and Technology
      • • Department of Computer Science and Engineering
      • • Department of Electronic and Computer Engineering
      Kowloon, Hong Kong
  • 2013
    • Hong Kong Baptist University
      • Department of Computer Science
      Kowloon, Hong Kong
  • 2012
    • Microsoft
      Redmond, Washington, United States
  • 2003–2012
    • The University of Hong Kong
      • Department of Computer Science
      Hong Kong, Hong Kong
  • 2011
    • Southeast University (China)
      • School of Computer Science and Engineering
      Nanjing, Jiangsu, China
  • 2005–2011
    • Institute for Infocomm Research
      Singapore
  • 2010
    • IBM
      Armonk, New York, United States
    • Stanford University
      Palo Alto, California, United States
  • 2001–2010
    • Shanghai Jiao Tong University
      • Department of Computer Science and Engineering
      Shanghai, Shanghai Shi, China
  • 2007–2009
    • Nanjing University of Aeronautics & Astronautics
      • Department of Computer Science and Technology
      Nanjing, Jiangsu, China
    • University of Wisconsin, Madison
      • Department of Computer Sciences
      Madison, WI, United States
    • Sun Yat-Sen University
      Guangzhou, Guangdong, China
  • 2004–2009
    • Chinese Academy of Sciences
      • • Institute of Computing Technology
      • • Institute of Software
      Beijing, Beijing Shi, China
  • 2006–2007
    • Peking University
      • School of Mathematical Sciences
      Beijing, China
  • 2005–2007
    • The Chinese University of Hong Kong
      • Department of Information Engineering
      Hong Kong, Hong Kong
  • 1996–2006
    • Simon Fraser University
      • School of Computing Science
      Burnaby, British Columbia, Canada
  • 1990–2000
    • University of Waterloo
      Waterloo, Ontario, Canada
  • 1999
    • University of Hawaiʻi at Mānoa
      • Department of Electrical Engineering
      Honolulu, HI, United States
  • 1998
    • The University of Western Ontario
      • Department of Computer Science
      London, Ontario, Canada
  • 1997
    • Carnegie Mellon University
      • Computer Science Department
      Pittsburgh, PA, United States
  • 1989
    • University of Maryland, College Park
      • Department of Computer Science
      College Park, Maryland, United States