Hao Wu

Yunnan University, Yün-nan, Yunnan, China

Publications (34) · 8.72 Total impact

  • ABSTRACT: Collaborative tagging systems have become popular on the Web. However, information overload increases users' need for recommender services, and item recommendation has thus become one of the key issues in such systems. In this paper, we examine whether data fusion can improve the effectiveness of item recommendation in these systems. To this end, we first summarize the state-of-the-art recommendation methods, which are classified into several categories according to their algorithmic principles. Then, we experiment with about 40 recommending components on datasets from three social tagging systems: Delicious, Lastfm and CiteULike. Based on these, several heuristic data fusion models, both rank-based and score-based, are used to combine selected components. We also put forward a hybrid linear combination (HLC) model for fusing item recommendations. We use four kinds of evaluation metrics, which respectively consider accuracy, inner-diversity, inter-diversity and novelty, to systematically assess the quality of recommendations obtained by the various components and fusion models. According to the experimental results, combining evidence from separate components can improve the accuracy of recommendations, with little or no loss of recommendation diversity and novelty, if the separate components suggest similar sets of relevant items but recommend different sets of non-relevant items. In particular, fusing recommendation sets formed from different combinations of profile representations and similarity functions in user-based and item-based collaborative filtering can significantly improve recommendation accuracy.
    In addition, some other useful findings are drawn: (i) Using tags to represent user profiles or item profiles may not be as good as profiling users with items or profiling items with users; however, exploiting tags in topic models and random walks can notably improve the accuracy, diversity and novelty of recommendations. (ii) Generally, user-based collaborative filtering, item-based collaborative filtering and random walk methods are robust for the task of item recommendation in social tagging systems, and thus can be chosen as the basic components of the data fusion process. (iii) The proposed method (HLC) is more flexible and robust than traditional data fusion models.
    Knowledge-Based Systems. 11/2014;
  • ABSTRACT: Advance reservation is an important method for guaranteeing quality of service in Grid-like distributed systems. However, reserved jobs split resources into fragments and decrease utilization. In order to minimize the negative effects of advance reservations, the authors analyzed the generation of resource fragments during reservation and investigated their influence on advance reservation requests in a quantitative way. Based on this quantification, two new scheduling algorithms, Resource Fragment-aware Best Fit (FSB) and Resource Fragment-aware Worst Fit (FSW), were proposed and their performance was investigated via comprehensive simulations. In the simulations, mean job size, deadline factor, system load and server number were chosen as control factors, and the performance of the algorithms was analyzed in terms of job acceptance rate, resource utilization and slowdown. We also compared FSB and FSW with Best Fit, First Fit, Min_LIP and Min_TIP. The simulations show that FSW and FSB can provide a higher job acceptance rate, especially under heavy system load.
    2014 International Conference on Communication Systems and Network Technologies (CSNT); 04/2014
  • Hao Wu, Yu Hua, Bo Li, Yijian Pei
    ABSTRACT: This paper presents how to exploit rank aggregation to make personalized recommendations in social tagging systems. To this end, some basic methods based on different principles and features, such as user-based collaborative filtering (CF), a graph-based method and social-based CF, are first introduced. Next, we adjust and optimize these methods to produce better results. Finally, we exploit rank aggregation approaches to integrate these basic models into hybrid recommenders. We evaluate our methods on the Lastfm dataset. In these experiments, the proposed hybrid models achieve the best recommendation accuracy by leveraging the strengths of their sub-models.
    2013 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD); 07/2013
  • Hao Wu, Yu Hua, Bo Li, Yijian Pei
    ABSTRACT: Group recommender systems use various strategies to aggregate users' preferences into a common social welfare function that maximizes the satisfaction of all members. Group recommendation is useful for websites, especially for social tagging systems. In this paper, we conduct initial experiments with various rank aggregation strategies for group recommendation in social tagging systems. Specifically, we consider trust-based user groups detected by community discovery based on trustable social relations. We also present a hybrid similarity to estimate the relevance between users and resources. According to experiments on the Delicious and Lastfm datasets, CombMAX, CombSUM and CombANZ are more suitable for aggregating individual preferences into a group preference in social tagging systems. Moreover, group recommendation can achieve better results than individual recommendation under our proposed model.
    2013 10th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD); 07/2013
  • 2012 4th Electronic System-Integration Technology Conference (ESTC); 09/2012
  • ABSTRACT: Advance reservation (AR) is important for guaranteeing the quality of service of jobs by allowing exclusive access to resources over a defined time interval. It is a challenge for the scheduler to organize available resources efficiently and to allocate them appropriately to parallel AR jobs with deadline constraints. This paper provides a slot-based data structure that organizes the available resources of multiprocessor systems in a way that enables efficient search and update operations, and formulates a suite of scheduling policies to allocate resources for dynamically arriving AR requests. The performance of the scheduling algorithms was investigated by simulations with different job sizes and durations, system loads and scheduling flexibilities. Simulation results show that job sizes and durations, system load and scheduling flexibility affect the performance metrics of all the scheduling algorithms. The PE-Worst-Fit algorithm achieves the highest acceptance rate of AR requests, while jobs scheduled with the First-Fit algorithm experience the lowest average slowdown. The data structure and scheduling policies can be used to organize and allocate resources for parallel AR jobs with deadline constraints in large-scale computing systems.
    The Journal of Supercomputing 03/2012; · 0.92 Impact Factor
  • ABSTRACT: In a multiprocessor environment, resource reservation splits continuous idle resources and generates resource fragments, which reduce resource utilization and the job acceptance rate. In this paper, we define the resource fragments produced by resource reservation and propose fragment-aware scheduling algorithms whose design focuses on improving the ability to accept subsequent jobs. Based on resource fragment awareness, we propose two algorithms, Occupation Rate Best Fit and Occupation Rate Worst Fit; in combination with heuristic algorithms, PE Worst Fit - Occupation Rate Best Fit and PE Worst Fit - Occupation Rate Worst Fit are also put forward. We implemented and analyzed the algorithms in simulation, and also studied the relationship between task properties and the algorithms' performance. Experiments showed that PE Worst Fit - Occupation Worst Fit provides the best job acceptance rate and that Occupation Rate Worst Fit has the best performance on average slowdown.
    Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), 2012 International Conference on; 01/2012
  • Hao Wu, Yu Hua, Bo Li, Yijian Pei
    ABSTRACT: With the tremendous number of citations available in digital libraries, how to suggest citations automatically to meet the information needs of researchers has become an important problem. In this paper, we propose a model that treats citation recommendation as a special retrieval task to address this challenge. First, users provide a target paper with some metadata to our system. Second, the system retrieves a relevant candidate citation set. Then the candidate citations are reranked by well-chosen citation evidence, such as publication time preference, self-citation preference, co-citation preference and publication reputation preference. In particular, various measures are introduced to integrate the evidence. We experimented with the proposed model on an established bibliographic corpus, the ACL Anthology Network. The results show that the model is valuable in practice, and that citation recommendation can be significantly improved using the proposed evidence.
    01/2012;
  • Hao Wu, Yijian Pei, Bo Li
    ABSTRACT: The name ambiguity problem brings many challenges to scholar search. This problem has attracted much attention in research communities, and various disambiguation algorithms combined with different citation features have been proposed. However, there is still significant room for improvement. In this paper, we propose an unsupervised two-step method to deal with name disambiguation when an end user performs a scholar search. In the first step, the returned author's citations are blocked using the co-authorship relation; in the second step, these blocks are merged by the classical hierarchical agglomerative clustering method. We test various linkage criteria and pairwise distances during hierarchical clustering, and find the best components for disambiguating citations. We also propose some approaches to improve the disambiguation performance in each step. According to experiments, our method outperforms a leading state-of-the-art method by 15% on the same recognized dataset, without the need for any training.
    01/2012;
  • ABSTRACT: Feature selection is a key step in image registration, and its success has a fundamental effect on image matching. Corners determine the contour characteristics of the target image, and the number of corners is far smaller than the number of image pixels, so corners can be a good feature for image registration. Considering both the algorithm speed and the registration accuracy, the paper proposes an improved Harris corner detection method for effective image registration. This method effectively avoids the corner clustering phenomenon that occurs during the corner detection process, so the detected corner points are distributed more reasonably and image registration becomes faster. The experiments also showed that the image registration result is satisfactory and reaches a reasonable match.
    Information Networking and Automation (ICINA), 2010 International Conference on; 11/2010
  • Conference Paper: none
    Bioinf; 07/2010
  • ABSTRACT: Resource co-allocation, which simultaneously allocates multiple resources to one application, is one of the crucial technologies affecting the utility and quality of service (QoS) of large-scale distributed environments. This paper concentrates on the problem of guaranteeing the QoS of co-allocation jobs via advance reservation, and investigates the performance of two typical scheduling algorithms with and without advance reservation. Simulations show that advance reservation is effective in improving the QoS of co-allocation.
    Computer Modeling and Simulation, International Conference on. 01/2010; 3:24-27.
  • ABSTRACT: Backfilling is well known in parallel job scheduling for increasing system utilization and user satisfaction over traditional non-backfilling scheduling algorithms: it allows small jobs from the back of the queue to execute before larger jobs that arrived earlier, while resources can be reserved to protect the latter from starvation. This paper proposes a relaxed backfill scheduling mechanism supporting multiple reservations, and investigates its effectiveness in reducing the average waiting time and average slowdown of jobs by using simulations with real traces. Unlike existing relaxed scheduling, which restricts the maximum number of reservations to one, this new mechanism supports the relaxation of multiple reservations and schedules efficiently by avoiding chain reactions when relaxing the start times of multiple existing reservations. Experimental results suggest that although the performance of both relaxed backfilling and strict backfilling depends on the accuracy of runtime estimates, reservation depths, traces and system load, the former is more flexible and generally more effective in reducing the average waiting time and average slowdown of jobs, without loss of utilization.
    2010 International Conference on Parallel and Distributed Computing, Applications and Technologies, PDCAT 2010, Wuhan, China, 8-11 December, 2010; 01/2010
  • ABSTRACT: As a new task of expertise retrieval, finding research communities for scientific guidance and research cooperation has become more and more important. However, existing community discovery algorithms consider only graph structure, without considering context such as knowledge characteristics. Therefore, detecting research communities cannot be addressed simply by directly applying existing methods. In this paper, we propose a hierarchical discovery strategy that rapidly locates the core of the research community and then incrementally extends the community. In particular, when expanding a local community, it selects a node by considering both its connection strength and its expertise divergence with respect to the candidate community, to prevent intellectually irrelevant nodes from spilling into the current community. The experiments on the ACL Anthology Network show that our method is effective.
    Advanced Intelligent Computing Theories and Applications, 6th International Conference on Intelligent Computing, ICIC 2010, Changsha, China, August 18-21, 2010. Proceedings; 01/2010
  • Hao Wu, Jun He, Yijian Pei
    ABSTRACT: In this article, we propose to apply the topic model and topic-level eigenfactor (TEF) algorithm to assess the relative importance of academic entities including articles, authors, journals, and conferences. Scientific impact is measured by the biased PageRank score toward topics created by the latent topic model. The TEF metric considers the impact of an academic entity in multiple granular views as well as in a global view. Experiments on a computational linguistics corpus show that the method is a useful and promising measure to assess scientific impact. © 2010 Wiley Periodicals, Inc.
    Journal of the American Society for Information Science and Technology 01/2010; 61:2274-2287. · 2.01 Impact Factor
  • Hao Wu, Yijian Pei, Jiang Yu
    ABSTRACT: As a retrieval task, expert finding has recently attracted much attention, and various methods have been proposed to rank expert candidates against a topical query. The most efficient approach is the document-based method, which treats supporting documents as a “bridge” and ranks the candidates based on the co-occurrences of topic and candidate mentions in the supporting documents. However, this kind of method models the relevance between query and candidates at a much lower, and hence less ambiguous, level. It lacks the capability to capture the hidden semantic association between queries and candidates. In this paper, we propose an approach based on hidden topic analysis to estimate the relevance between query and candidates. It models query and supporting document as a word-topic-document association instead of the word-document association used in the language model. In addition, prior knowledge about the supporting documents is considered to favor expert ranking. The empirical results on a metadata corpus demonstrate that the model can effectively capture the semantic association between queries and candidates, and thus improves the performance of expert finding.
    8th IEEE/ACIS International Conference on Computer and Information Science, IEEE/ACIS ICIS 2009, June 1-3, 2009, Shanghai, China; 01/2009
  • ABSTRACT: This paper introduces a simulated joint robot-hand system based on Java3D, which can be used as a Java application on a standalone computer or as a Java Applet in a network environment. When the system is running, an authorized user can control the simulated robot hand with a flexible input device such as a joystick, and the status of the robot hand is displayed dynamically in 3D simulation mode. After the operation, the operation data is recorded into a database automatically and a log is built, so that past operations can be checked or analyzed.
    01/2009;
  • Hao Wu, Yijian Pei
    ABSTRACT: Studies of citations are being carried out comprehensively with the increasing amount of electronic citation data on the Web. Most metrics observe scientific quality in a global view instead of in multiple fine-grained views. In this paper, we propose applying a Topic Model and an adaptive PageRank algorithm to assess the relative importance of scientific objects including articles, authors, conferences and journals. Scientific quality is measured by an aggregated PageRank metric biased towards certain topics. This metric considers the impact of a paper in both a global view and a local view. The experiments on the ACL Anthology bibliographic corpus show that our method is a useful measure for observing scientific quality from multiple views.
    Proceedings of the 2nd International Conference on BioMedical Engineering and Informatics, BMEI 2009, October 17-19, 2009, Tianjin, China; 01/2009
  • Hao Wu, Yijian Pei, Jiang Yu
    ABSTRACT: The problem of academic expert finding is concerned with finding the experts in a named research field. It has many real-world applications and has recently attracted much attention. However, the existing methods are neither versatile nor well suited to the special needs of academic areas, where co-authorship and citation relations play important roles in judging researchers’ achievements. In this paper, we propose and develop a flexible data schema and a topic-sensitive co-pagerank algorithm combined with a topic model for solving this problem. The main idea is to measure the authors’ authorities by considering topic bias based on their social networks and citation networks, and then to recommend expert candidates for the questions. To infer the association between authors and topics, we draw a probability model from the latent Dirichlet allocation (LDA) model. We further propose several techniques such as reasoning the interested topics of the query and integrating ranking metrics to order the practices. Our experiments show that the proposed strategies are all effective in improving the retrieval accuracy.
    Frontiers of Computer Science in China 01/2009; 3:445-456. · 0.27 Impact Factor
  • ABSTRACT: Intelligent retrieval that best satisfies users' search intentions remains a challenging problem due to the inherent complexity of real-world semantic web applications. Usually, a search request contains not only vagueness or imprecision, but also personalized information goals. This paper presents a novel approach that formulates a search request by tightly combining fuzziness with the user’s subjective weighting of importance over multiple search properties. A special ranking mechanism based on the weighted fuzzy query representation is proposed. The ranking method generates a set of “degree of relevance” values, each an overall score that reflects both the fuzzy predicates and the user’s personalized preferences in the search request. Moreover, the ranking method is general and unique rather than arbitrary. Hence, search results are properly ordered in terms of how well they match the search intention. The experimental results show that our approach can effectively capture users' information goals and produce much better search results than existing approaches.
    Knowledge-Based Systems 10/2008; · 4.10 Impact Factor
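
Note on the rank aggregation strategies named in the group recommendation abstract above (FSKD, 07/2013): CombSUM, CombMAX and CombANZ are standard score-fusion rules. The following is a minimal Python sketch of those three rules only, not the authors' implementation. The function name `comb_fuse`, the dictionary input format and the example scores are assumptions made for illustration; how the paper computes individual scores (its hybrid similarity) is not reproduced here.

```python
from collections import defaultdict


def comb_fuse(score_lists, method="sum"):
    """Fuse per-member item scores into one group ranking.

    score_lists: list of dicts mapping item -> non-negative relevance score
                 (one dict per group member; hypothetical input format).
    method: "sum" (CombSUM), "max" (CombMAX) or "anz" (CombANZ).
    Returns a list of (item, fused_score) pairs, best first.
    """
    collected = defaultdict(list)
    for scores in score_lists:
        for item, score in scores.items():
            if score > 0:  # only non-zero scores count as evidence
                collected[item].append(score)

    fused = {}
    for item, values in collected.items():
        if method == "sum":
            fused[item] = sum(values)                 # CombSUM: add all scores
        elif method == "max":
            fused[item] = max(values)                 # CombMAX: keep the best score
        elif method == "anz":
            fused[item] = sum(values) / len(values)   # CombANZ: average of non-zero scores
        else:
            raise ValueError(f"unknown method: {method}")

    # Sort by descending fused score; break ties by item name for determinism.
    return sorted(fused.items(), key=lambda pair: (-pair[1], pair[0]))


# Two hypothetical group members scoring three items.
members = [
    {"a": 1.0, "b": 0.5},
    {"a": 0.25, "b": 0.5, "c": 0.75},
]
```

With these example scores, CombSUM ranks the items a, b, c; CombMAX ranks them a, c, b; and CombANZ ranks them c, a, b, which shows how the choice of rule changes the group ranking when members disagree.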
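
Note on the topic-biased PageRank scores mentioned in the scientific impact abstracts above (JASIST, 01/2010; BMEI 2009): the general idea of biasing PageRank towards a topic can be sketched as a personalized PageRank in which the teleport step follows topic weights instead of a uniform distribution. The Python sketch below illustrates that generic technique only; it is not the TEF algorithm or the authors' code. The function name `biased_pagerank`, the toy citation graph and the hand-set bias weights are assumptions; in the papers the topic weights come from a latent topic model.

```python
def biased_pagerank(links, bias, damping=0.85, iterations=100):
    """Personalized PageRank by power iteration.

    links: dict mapping node -> list of nodes it cites (hypothetical toy graph).
    bias:  dict mapping node -> non-negative topic weight; assumed here to be
           given, e.g. a paper's probability under one topic.
    Returns a dict mapping node -> score; the scores sum to 1.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    total = sum(bias.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("bias must give positive weight to at least one node")

    # Teleport distribution: proportional to the topic weights.
    teleport = {n: bias.get(n, 0.0) / total for n in nodes}
    rank = dict(teleport)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                # Split this node's rank evenly among the nodes it cites.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: redistribute its rank along the teleport vector.
                for m in nodes:
                    nxt[m] += damping * rank[n] * teleport[m]
        rank = nxt
    return rank


# Toy citation graph: p1 and p2 both cite p3.
citations = {"p1": ["p3"], "p2": ["p3"], "p3": []}
uniform = biased_pagerank(citations, {"p1": 1.0, "p2": 1.0, "p3": 1.0})
topic_p1 = biased_pagerank(citations, {"p1": 1.0})
```

In this toy graph, a uniform bias gives p1 and p2 equal scores and ranks p3 highest, while a bias concentrated on p1 leaves p2 with a score of zero because no random jump or citation ever reaches it. Running one such biased computation per topic and aggregating the results is one plausible reading of a multi-view metric, stated here as an assumption.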