Sara Javanmardi

University of California, Irvine, Irvine, California, United States

Publications (15)

  • ABSTRACT: Obtaining the best accuracy in machine learning usually requires carefully tuning learning algorithm parameters for each problem. Parameter optimization is computationally challenging for learning methods with many hyperparameters. In this paper we show that MapReduce clusters are particularly well suited for parallel parameter optimization. We use MapReduce to optimize regularization parameters for boosted trees and random forests on several text problems: three retrieval ranking problems and a Wikipedia vandalism problem. We show how model accuracy improves as a function of the percent of parameter space explored, that accuracy can be hurt by exploring parameter space too aggressively, and that there can be significant interaction between parameters that appear to be independent. Our results suggest that MapReduce is a two-edged sword: it makes parameter optimization feasible on a massive scale that would have been unimaginable just a few years ago, but also creates a new opportunity for overfitting that can reduce accuracy and lead to inferior learning parameters.
    01/2011;
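    A rough illustration of the parallel parameter sweep the abstract describes (a sketch only, not the paper's Hadoop pipeline; the dataset, parameter grid and model settings below are placeholders): each map task trains and scores one parameter combination, and the reduce step keeps the best validation result.

```python
# Illustrative sketch of a map/reduce-style hyperparameter sweep. A local process
# pool stands in for cluster nodes; the data and grid are synthetic placeholders.
from itertools import product
from multiprocessing import Pool

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

# The parameter grid plays the role of the MapReduce input split.
GRID = list(product([50, 100, 200],      # n_estimators
                    [0.01, 0.1, 0.3],    # learning_rate
                    [2, 3, 4]))          # max_depth

def map_task(params):
    """Train one boosted-tree model and emit (validation AUC, params)."""
    n_est, lr, depth = params
    model = GradientBoostingClassifier(
        n_estimators=n_est, learning_rate=lr, max_depth=depth, random_state=0)
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_va, model.predict_proba(X_va)[:, 1])
    return auc, params

if __name__ == "__main__":
    with Pool() as pool:                      # workers stand in for cluster nodes
        results = pool.map(map_task, GRID)    # "map" phase
    best_auc, best_params = max(results)      # "reduce" phase: keep the best
    print(f"best AUC={best_auc:.4f} with params={best_params}")
```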
  • ABSTRACT: User-generated content (UGC) constitutes a significant fraction of the Web. However, some wiki-based sites, such as Wikipedia, are so popular that they have become a favorite target of spammers and other vandals. In such popular sites, human vigilance is not enough to combat vandalism, and tools that detect possible vandalism and poor-quality contributions become a necessity. The application of machine learning techniques holds promise for developing efficient online algorithms for better tools to assist users in vandalism detection. We describe an efficient and accurate classifier that performs vandalism detection in UGC sites. We show the results of our classifier on the PAN Wikipedia dataset. We explore the effectiveness of a combination of 66 individual features that produce an AUC of 0.9553 on a test dataset, the best result to our knowledge. Using Lasso optimization we then reduce our feature-rich model to a much smaller and more efficient model of 28 features that performs almost as well, with a drop in AUC of only 0.005. We describe how this approach can be generalized to other user-generated content systems and describe several applications of this classifier to help users identify potential vandalism.
    Proceedings of the 7th International Symposium on Wikis and Open Collaboration, 2011, Mountain View, CA, USA, October 3-5, 2011; 01/2011
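    A minimal sketch of the Lasso-style reduction from a feature-rich model to a compact one; the synthetic data and L1-regularized logistic regression below are stand-ins for the paper's actual feature set and learner.

```python
# Illustrative sketch: L1 (Lasso-style) feature selection to shrink a 66-feature
# vandalism classifier to a smaller model. Data and features are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=66, n_informative=25,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Full model on all 66 features.
full = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
auc_full = roc_auc_score(y_te, full.predict_proba(X_te)[:, 1])

# The L1 penalty drives many coefficients to exactly zero; keep the survivors.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05).fit(X_tr, y_tr)
kept = np.flatnonzero(lasso.coef_[0])          # indices of selected features

# Refit a compact model on the selected features only.
small = LogisticRegression(max_iter=2000).fit(X_tr[:, kept], y_tr)
auc_small = roc_auc_score(y_te, small.predict_proba(X_te[:, kept])[:, 1])

print(f"{len(kept)} of 66 features kept; AUC {auc_full:.4f} -> {auc_small:.4f}")
```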
  • David W. McDonald, Sara Javanmardi, Mark Zachry
    ABSTRACT: Our everyday observations about the behaviors of others around us shape how we decide to act or interact. In social media the ability to observe and interpret others' behavior is limited. This work describes one approach to leverage everyday behavioral observations to develop tools that could improve the understanding and sense-making capabilities of contributors, managers and researchers of social media systems. One example of behavioral observation is Wikipedia barnstars. Barnstars are a type of award recognizing the activities of Wikipedia editors. We mine the entire English Wikipedia to extract barnstar observations. We develop a multi-label classifier based on a random forest technique to recognize and label distinct forms of observed and acknowledged activity. We evaluate the classifier through several means, including the use of separate training and testing datasets and the application of the classifier to previously unlabeled data. We use the classifier to identify Wikipedia editors who have been observed exhibiting certain predominant types of behavior, and explore whether those patterns of behavior are evident and how observers seem to be making the observations. We discuss how these types of activity observations can be used to develop tools and potentially improve understanding and analysis in wikis and other online communities.
    Proceedings of the 7th International Symposium on Wikis and Open Collaboration, 2011, Mountain View, CA, USA, October 3-5, 2011; 01/2011
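    A small sketch of a multi-label random forest over barnstar-like award texts; the label set, example texts and TF-IDF features are invented stand-ins for the paper's actual data and features.

```python
# Illustrative sketch: a multi-label random forest that assigns activity labels
# to barnstar-style award text. The labels and training examples are invented.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "Thanks for tirelessly reverting vandalism on these pages!",
    "Great copyediting work cleaning up grammar across many articles.",
    "Awarded for welcoming and mentoring new editors.",
    "For reverting vandals and welcoming the newcomers they targeted.",
]
labels = [["anti-vandalism"], ["copyediting"], ["mentoring"],
          ["anti-vandalism", "mentoring"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)                 # one column per activity label

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    MultiOutputClassifier(RandomForestClassifier(n_estimators=200,
                                                 random_state=0)),
)
clf.fit(texts, Y)

pred = clf.predict(["Thank you for fighting vandalism and helping new users."])
print(mlb.inverse_transform(pred))            # e.g. [('anti-vandalism', 'mentoring')]
```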
  • Analyzing Microtext, Papers from the 2011 AAAI Workshop, San Francisco, California, USA, August 8, 2011; 01/2011
  • Sara Javanmardi, Jianfeng Gao, Kuansan Wang
    ABSTRACT: Although higher-order language models (LMs) have shown the benefit of capturing word dependencies for information retrieval (IR), the tuning of the increased number of free parameters remains a formidable engineering challenge. Consequently, in many real-world retrieval systems, applying higher-order LMs is the exception rather than the rule. In this study, we address the parameter tuning problem using a framework based on a linear ranking model in which different component models are incorporated as features. Using unigram and bigram LMs with two-stage smoothing as examples, we show that our method leads to a bigram LM that significantly outperforms its unigram counterpart and the well-tuned BM25 model.
    Proceedings of the 19th International Conference on World Wide Web, WWW 2010, Raleigh, North Carolina, USA, April 26-30, 2010; 01/2010
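    A sketch of the scoring scheme the abstract refers to, assuming placeholder smoothing parameters and feature weights rather than the tuned values from the paper: unigram and bigram query-likelihood scores under two-stage smoothing are combined as features in a linear ranker.

```python
# Illustrative sketch: unigram and bigram query-likelihood LMs with two-stage
# smoothing, combined linearly. Parameters and weights are placeholders.
import math
from collections import Counter

def two_stage_prob(count, doc_len, p_collection, mu=1000.0, lam=0.1):
    """Two-stage smoothing: Dirichlet prior followed by Jelinek-Mercer interpolation."""
    dirichlet = (count + mu * p_collection) / (doc_len + mu)
    return (1.0 - lam) * dirichlet + lam * p_collection

def lm_score(query_terms, doc_terms, collection_prob, bigram=False):
    """Log query likelihood under a unigram or bigram LM with two-stage smoothing."""
    if bigram:
        doc_units = Counter(zip(doc_terms, doc_terms[1:]))
        query_units = list(zip(query_terms, query_terms[1:]))
        length = max(len(doc_terms) - 1, 1)
    else:
        doc_units = Counter(doc_terms)
        query_units = query_terms
        length = len(doc_terms)
    return sum(math.log(two_stage_prob(doc_units[u], length, collection_prob(u)))
               for u in query_units)

# Linear ranking model: each component LM is one feature with a weight.
def rank_score(query, doc, collection_prob, w_uni=0.6, w_bi=0.4):
    return (w_uni * lm_score(query, doc, collection_prob, bigram=False) +
            w_bi * lm_score(query, doc, collection_prob, bigram=True))

# Toy usage with a flat background model over a tiny vocabulary.
background = lambda unit: 1e-3
doc = "wikipedia vandalism detection with machine learning".split()
query = "vandalism detection".split()
print(round(rank_score(query, doc, background), 3))
```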
  • Statistical Analysis and Data Mining. 01/2010; 3:126-139.
  • S. Javanmardi, Y. Ganjisaffar, C. Lopes, P. Baldi
    ABSTRACT: Wikipedia, one of the top ten most visited websites, is commonly viewed as the largest online reference for encyclopedic knowledge. Because of its open editing model (allowing anyone to enter and edit content), Wikipedia's overall quality has often been questioned as a source of reliable information. Lack of study of the open editing model of Wikipedia and its effectiveness has resulted in a new generation of wikis that restrict contributions to registered users only, using their real names. In this paper, we present an empirical study of user contributions to Wikipedia. We statistically analyze contributions by both anonymous and registered users. The results show that the submissions of anonymous and registered users in Wikipedia follow a power-law behavior. About 80% of the revisions are submitted by less than 7% of the users, most of whom are registered users. To further refine the analysis, we use the Wiki Trust Model (WTM), a user reputation model developed in our previous work, to assign a reputation value to each user. As expected, the results show that registered users contribute higher-quality content and therefore are assigned higher reputation values. However, a significant number of anonymous users also contribute high-quality content. We provide further evidence that, regardless of a user's attribution, registered or anonymous, high-reputation users are the dominant contributors who actively edit Wikipedia articles in order to remove vandalism or poor-quality content.
    Collaborative Computing: Networking, Applications and Worksharing, 2009. CollaborateCom 2009. 5th International Conference on; 12/2009
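    A toy illustration of the concentration measurement behind the "80% of revisions from less than 7% of users" observation, computed here on a synthetic, heavy-tailed revision log rather than Wikipedia data.

```python
# Illustrative sketch: how concentrated revision activity is across editors.
# The revision log is a synthetic, Zipf-like placeholder.
import random
from collections import Counter

random.seed(0)
revisions = [f"user{int(random.paretovariate(1.2))}" for _ in range(100_000)]

counts = Counter(revisions)
sorted_counts = sorted(counts.values(), reverse=True)
total_revisions = sum(sorted_counts)

covered, users_needed = 0, 0
for c in sorted_counts:                       # walk down from the busiest editors
    covered += c
    users_needed += 1
    if covered >= 0.8 * total_revisions:
        break

share_of_users = users_needed / len(counts)
print(f"80% of revisions come from {share_of_users:.1%} of users")
```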
  • Y. Ganjisaffar, S. Javanmardi, C. Lopes
    ABSTRACT: Wikipedia, the largest encyclopedia on the Web, is often seen as the most successful example of crowdsourcing. The encyclopedic knowledge it has accumulated over the years is so large that one often uses search engines to find information in it. In contrast to regular Web pages, Wikipedia is fairly structured, and articles are usually accompanied by history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. Building on studies of the "Wisdom of Crowds" and the effectiveness of the knowledge collected by a large number of people, we investigate the effect of incorporating the extent of review of an article into the ranking of search results. The extent of review is measured by the number of distinct editors who have contributed to the article and is extracted by processing Wikipedia's history pages. We compare different ranking algorithms that explore combinations of text relevancy, PageRank, and extent of review. The results show that the review-based ranking algorithm, which combines the extent of review and text relevancy, outperforms the rest; it is more accurate and less computationally expensive than PageRank-based rankings.
    Computational Aspects of Social Networks, 2009. CASON '09. International Conference on; 07/2009
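    A sketch of a review-based re-ranking step in the spirit of the abstract: interpolate a text-relevancy score with a normalized distinct-editor count. The weight, scores and editor counts are made-up placeholders, not the paper's tuned combination.

```python
# Illustrative sketch: combine text relevancy with "extent of review"
# (distinct editor count) into a single ranking score.
import math

def review_rank(results, alpha=0.7):
    """results: list of (title, text_relevancy, distinct_editors)."""
    max_editors = max(r[2] for r in results)
    reranked = []
    for title, relevancy, editors in results:
        # Log-scale and normalize the editor count so a few heavily edited
        # articles do not dominate, then interpolate with the text score.
        review = math.log1p(editors) / math.log1p(max_editors)
        reranked.append((alpha * relevancy + (1 - alpha) * review, title))
    return [title for _, title in sorted(reranked, reverse=True)]

hits = [
    ("Barn owl",        0.82,  45),   # (title, normalized text relevancy, editors)
    ("Owl",             0.80, 930),
    ("Owl (disambig.)", 0.78,  12),
]
print(review_rank(hits))   # the heavily reviewed article overtakes a close text match
```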
  • ABSTRACT: Wikipedia articles are usually accompanied with history pages, categories and talk pages. The meta-data available in these pages can be analyzed to gain a better understanding of the content and quality of the articles. We analyze the quality of search results of the current major Web search engines (Google, Yahoo! and Live) in Wikipedia. We discuss how the rich meta-data available in wiki pages can be used to provide better search results in Wikipedia. We investigate the effect of incorporating the extent of review of an article into the ranking of search results. The extent of review is measured by the number of distinct editors who have contributed to the articles and is extracted by processing Wikipedia's history pages. Our experimental results show that re-ranking the search results of the three major Web search engines, using the review feature, improves the quality of their rankings for Wikipedia-specific searches. We propose a review-based ranking algorithm to improve the quality of search in the domain of Wikipedia. We show that the quality of the rankings by the current major Web search engines can be improved by incorporating the proposed heuristic in their ranking schemes. The contributions of this work are twofold. First, the empirical study of search performance by the three major search engines in Wikipedia provides valuable evidence that not all search engines are equal. Second, the review-based heuristic proposed here results in considerable improvements for the two least-performing search engines.
    Proceedings of the 2009 International Symposium on Wikis, 2009, Orlando, Florida, USA, October 25-27, 2009; 01/2009
  • ABSTRACT: Organizations increasingly create massive internal digital data repositories and are looking for technical advances in managing, exchanging and integrating explicit knowledge. While most of the enabling technologies for knowledge management have been around for several years, the ability to combine data sharing, integration and analysis into a cohesive, cost-effective infrastructure evaded organizations until the advent of Web 2.0 applications. In this paper, we discuss our investigations into using a wiki as a web-based interactive knowledge management system, integrated with features for easy data access, data integration and analysis. Using the enhanced wiki, it is possible to make organizational knowledge sustainable, expandable, outreaching and continually up-to-date. The wiki is currently in use as the California Sustainable Watershed Information Manager. We evaluate our work against the requirements of knowledge management systems. The results show that our solution satisfies more of these requirements than other tools.
    Collaborative Computing: Networking, Applications and Worksharing, 4th International Conference, CollaborateCom 2008, Orlando, FL, USA, November 13-16, 2008, Revised Selected Papers; 01/2008
  • S. Javanmardi, C.V. Lopes
    ABSTRACT: Collaborative systems available on the Web allow millions of users to share information through a growing collection of tools and platforms such as wikis, blogs and shared forums. All of these systems contain information and resources with different degrees of sensitivity. However, the open nature of such infrastructures makes it difficult for users to determine the reliability of the available information and the trustworthiness of information providers. Hence, integrating trust management systems into open collaborative systems can play a crucial role in the growth and popularity of open information repositories. In this paper, we present a trust model for collaborative systems, namely for platforms based on wiki technology. This model, based on hidden Markov models, estimates the reputation of the contributors and the reliability of the content dynamically. The focus of this paper is on reputation estimation. Evaluation results based on a subset of Wikipedia show that the model can effectively be used for identifying vandals and users with high-quality contributions.
    Collaborative Computing: Networking, Applications and Worksharing, 2007. CollaborateCom 2007. International Conference on; 12/2007
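    A minimal sketch of the kind of hidden Markov model the abstract describes: the hidden state is a contributor's reputation level and the observation is whether each edit survives or is reverted. All probabilities below are invented for illustration and are not the paper's parameters.

```python
# Illustrative sketch: forward algorithm for a reputation HMM with invented
# transition and emission probabilities.
import numpy as np

states = ["vandal", "novice", "trusted"]          # hidden reputation levels
obs_symbols = {"reverted": 0, "survived": 1}      # observed fate of each edit

start = np.array([0.2, 0.5, 0.3])                 # initial state distribution
trans = np.array([[0.7, 0.2, 0.1],                # P(next state | current state)
                  [0.1, 0.6, 0.3],
                  [0.05, 0.15, 0.8]])
emit = np.array([[0.9, 0.1],                      # P(observation | state)
                 [0.4, 0.6],
                 [0.1, 0.9]])

def forward_posterior(observations):
    """Forward algorithm: filtered distribution over reputation after each edit."""
    obs = [obs_symbols[o] for o in observations]
    alpha = start * emit[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
        alpha /= alpha.sum()                      # normalize to a distribution
    return dict(zip(states, alpha.round(3)))

# A user whose edits keep surviving drifts toward the "trusted" state.
print(forward_posterior(["survived", "survived", "reverted", "survived", "survived"]))
```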
  • ABSTRACT: Wikipedia is commonly viewed as the main online encyclopedia. Its content quality, however, has often been questioned due to the open nature of its editing model. A high-quality contribution by an expert may be followed by a low-quality contribution made by an amateur or vandal; therefore the quality of each article may fluctuate over time as it goes through iterations of edits by different users. In this study, we model the evolution of content quality in Wikipedia articles in order to estimate the fraction of time during which articles retain high-quality status. The results show that articles tend to have high-quality content 74% of their lifetime and the average article quality increases as articles go through edits. To further analyze the open editing model of Wikipedia, we compare the behaviour of anonymous and registered users and show that there is a positive correlation between registration and quality of the contributed content. In addition, we compare the evolution of the content in Wikipedia's known high-quality articles (a.k.a. featured articles) and the rest of the articles in order to extract features affecting quality. The results show that the high turnover of the content caused by the open editing model of Wikipedia results in rapid elimination of low-quality content. These results not only suggest that the process underlying Wikipedia can be used for producing high-quality content, but also question the viability of collaborative knowledge repositories that impose high barriers to user participation for the purpose of filtering poor-quality contributions from the onset. While it is hard to assess Wikipedia's overall quality in a definitive way, two studies have tried to assess it manually by comparison of Wikipedia articles to their parallel articles in other reputable sources (Giles 2005; Chesney 2006). Nature magazine's comparative analysis of forty-two science articles in both Wikipedia and the Encyclopedia Britannica showed a surprisingly small difference; Britannica disputed this finding, saying that the errors in Wikipedia were more serious than the Britannica errors and that the source documents for the study included the junior versions of the encyclopedia as well as the Britannica year books. The questions surrounding Wikipedia's open editing model have triggered a new generation of wikis like Citizendium and Scholarpedia. These online encyclopedias follow a much more traditional editing model, where a small number of experts produce most of the content through a peer-reviewing process. However, there is very little evidence that these traditional editing models are better than Wikipedia's model for the purpose of creating encyclopedic knowledge. To further address these issues, one must develop methods for automatically assessing Wikipedia's quality and the parameters that affect it. Since Wikipedia is a highly dynamic system, the articles change very frequently. Therefore, the quality of articles is a time-dependent function and a single article may contain high- and low-quality content in different spans of its lifetime. The goal of our study is to analyze the evolution of content in Wikipedia articles over time and estimate the fraction of time that articles are in a high-quality state. This paper offers two main contributions to the state of the art. First, we develop an automated measure to estimate the quality of article revisions throughout the entire English Wikipedia. Using this measure, we follow the evolution of content quality and show that the fraction of time that articles are in a high-quality state has an increasing trend over time. Then, we present an empirical study of Wikipedia statistics that may explain the results obtained in our study. We analyze the contributions of registered and anonymous users and show that there is a positive correlation between registration and the quality of the contributed content.
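    A toy sketch of estimating the fraction of an article's lifetime spent in a high-quality state from a timestamped revision history; the revision data and quality threshold below are placeholders, not the paper's measure.

```python
# Illustrative sketch: fraction of lifetime an article spends above a quality
# threshold, assuming each revision's quality holds until the next edit.
from datetime import datetime

# (timestamp, quality score in [0, 1]) for each revision of one article.
revisions = [
    (datetime(2008, 1, 1), 0.40),
    (datetime(2008, 3, 1), 0.75),   # crosses the quality threshold
    (datetime(2008, 9, 1), 0.30),   # vandalized / degraded
    (datetime(2008, 9, 2), 0.80),   # quickly repaired
    (datetime(2009, 1, 1), 0.85),
]
END_OF_OBSERVATION = datetime(2009, 6, 1)
THRESHOLD = 0.7                      # "high quality" if score >= threshold

high_time = total_time = 0.0
for (t0, score), (t1, _) in zip(revisions, revisions[1:] + [(END_OF_OBSERVATION, None)]):
    span = (t1 - t0).total_seconds()
    total_time += span
    if score >= THRESHOLD:
        high_time += span

print(f"high-quality fraction of lifetime: {high_time / total_time:.0%}")
```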
  • ABSTRACT: The concept of scientific mashups is gaining popularity as the sheer amount of scientific content is scattered over different sources, such as databases or public websites. A variety of mashup development frameworks exist, but none fully address the needs of the scientific community. One limitation of scientific mashups is the issue of trust and attribution, especially when the content comes from collaborative information repositories where the quality of such content is unknown. In this paper, for our case study we focus on CalSWIM, whose content is taken from both highly reliable sources and Wikipedia, which may be less so. We show how integrating CalSWIM with a reputation management system can help us assess the reputation of users and the trustworthiness of the content. Using user reputations, the system selects the most recent and trustworthy revision of the wiki article rather than merely the most recent revision, which might be vandalistic or of poor quality.
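    A small sketch of the revision-selection rule the abstract describes: prefer the newest revision whose author's reputation clears a threshold. The reputation values, threshold and revisions below are assumptions for illustration.

```python
# Illustrative sketch: pick the most recent revision by a sufficiently
# reputable author instead of blindly taking the latest revision.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Revision:
    timestamp: datetime
    author: str
    text: str

reputation = {"alice": 0.92, "bob": 0.35, "anon_1": 0.10}   # assumed user reputations
MIN_REPUTATION = 0.5

def latest_trustworthy(revisions: list[Revision]) -> Optional[Revision]:
    """Newest revision written by an author with sufficient reputation."""
    for rev in sorted(revisions, key=lambda r: r.timestamp, reverse=True):
        if reputation.get(rev.author, 0.0) >= MIN_REPUTATION:
            return rev
    return None                                   # no trusted revision available

history = [
    Revision(datetime(2010, 5, 1), "alice", "Accurate watershed summary."),
    Revision(datetime(2010, 5, 3), "anon_1", "lol this page is wrong!!!"),
]
print(latest_trustworthy(history).text)           # falls back to alice's revision
```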
  • S. Javanmardi, M. Amini, R. Jalili, Y. Ganjisaffar
    ABSTRACT: The Semantic Web is the vision for the future of the current Web, which aims at automation, integration and reuse of data among different Web applications. The shift to Semantic Web applications poses new requirements for security mechanisms, especially in access control models as a critical component of security systems. Access to resources cannot be controlled in a safe way unless the access decision takes into account the semantic relationships among entities in the data model underlying the Semantic Web. Decision making for granting or revoking access requests by assuming entities in isolation and not considering their interrelations may result in security violations. In this paper, we present a Semantic Based Access Control model (SBAC) which considers semantic relations among different entities in the decision-making process. For accurate decision making, SBAC considers semantic relations among entities in all domains of access control, namely the subject domain, the object domain and the action domain. To facilitate the propagation of policies in these three domains, we show how different semantic interrelations can be reduced to the subsumption problem. This reduction enhances the space and time complexity of the access control mechanisms which are based on SBAC.
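    A toy sketch of reducing an SBAC-style access decision to subsumption checks over subject, object and action hierarchies; the hierarchies and policy below are invented placeholders, not the model's actual ontology reasoning.

```python
# Illustrative sketch: access is permitted if some policy triple subsumes the
# request in the subject, object and action domains (an is-a walk per domain).
is_a = {
    # subject hierarchy
    "phd_student": "researcher", "researcher": "staff",
    # object hierarchy
    "clinical_record": "record", "record": "document",
    # action hierarchy
    "annotate": "read", "read": "access",
}

def subsumes(general, specific):
    """True if `specific` is the same as, or a descendant of, `general`."""
    while specific is not None:
        if specific == general:
            return True
        specific = is_a.get(specific)
    return False

# Policy: (subject class, object class, action class) triples that are permitted.
policy = [("staff", "document", "access")]

def permitted(subject, obj, action):
    # Semantic propagation reduces to a subsumption check in all three domains.
    return any(subsumes(s, subject) and subsumes(o, obj) and subsumes(a, action)
               for s, o, a in policy)

print(permitted("phd_student", "clinical_record", "annotate"))   # True
print(permitted("anonymous", "clinical_record", "read"))         # False
```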
  • Sara Javanmardi, Morteza Amini, Rasool Jalili
    ABSTRACT: The Semantic Web is a vision for the future of the current Web which aims at automation, integration and reuse of data among different Web applications. Access to resources on the Semantic Web cannot be controlled in a safe way unless the access decision takes into account the semantic relationships among entities in the data model under this environment. Decision making for permitting or denying access requests by assuming entities in isolation and not considering their interrelations may result in security violations. In this paper, we present a Semantic Based Access Control model (SBAC) which considers this issue in the decision-making process. To facilitate the propagation of policies in the three domains of access control (the subject, object and action domains), we show how different semantic interrelations can be reduced to the subsumption problem. This reduction enhances the space and time complexity of the access control mechanisms which are based on SBAC. Our evaluations of the SBAC model, along with experimental results on a sample implementation of the access control system, show that the proposed model is very promising.