About
15 Publications
2,018 Reads
922 Citations
Publications (15)
In this paper, we present an overview of the Myria stack for big data management and analytics that we developed in the database group at the University of Washington and that we have been operating as a cloud service aimed at domain scientists around the UW campus. We highlight Myria's key design choices and innovations and report on our experienc...
Visualizing NLP annotation is useful for collecting training data for statistical NLP approaches. Existing toolkits either provide limited visual aid or introduce comprehensive operators to realize sophisticated linguistic rules. Workers must be well trained to use them. Their audience thus can hardly be scaled to large amounts of non-e...
In this demonstration, we will showcase Myria, our novel cloud service for big data management and analytics designed to improve productivity. Myria's goal is for users to simply upload their data and for the system to help them be self-sufficient data science experts on their data -- self-serve analytics. Using a web browser, Myria users can uploa...
This paper is concerned with the problem of mining social emotions from text. Recently, with the fast development of Web 2.0, social users have been assigning emotion labels such as happiness, sadness, and surprise to more and more documents. Such emotions can provide a new aspect for document categorization, and therefore help online users select re...
This paper presents StreamXPlore, a system that enables users to explore historical stream data in order to determine what events to monitor in the future. At the heart of StreamXPlore is a new event modeling mechanism. StreamXPlore enables the specification, analysis, and mining of these new types of events. Event analysis enables event refinem...
Community discovery on large-scale linked document corpora has been a hot research topic for decades. There are two types of links. The first one, which we call d2d-link, indicates connectiveness among different documents, such as blog references and research paper citations. The other one, which we call u2u-link, represents co-occurrences or simul...
This paper is concerned with community discovery in textual interaction graphs, where the links between entities are indicated by textual documents. Specifically, we propose a Topical Link Model (TLM), which leverages the Hierarchical Dirichlet Process (HDP) to introduce a hidden topical variable for the links. Other than the use of links, TLM can look into...
This paper is concerned with the problem of social affective text mining, which aims to discover the connections between social emotions and affective terms based on user-generated emotion labels. We propose a joint emotion-topic model by augmenting latent Dirichlet allocation with an additional layer for emotion modeling. It first generates a set...
This paper is concerned with the problem of boosting social annotations using propagation, which is also called social propagation. In particular, we focus on propagating social annotations of web pages (e.g., annotations in Del.icio.us). Social annotations are novel resources and valuable in many web applications, including web search and browsing...
This paper is concerned with the study of information retrieval (IR) on Accumulative Social Descriptions (ASDs). ASDs are Web texts, accumulated by many Web users, that describe certain Web resources; examples include anchor texts, search logs, and social annotations. There have been some studies working on leveraging ASDs for improving search performan...
The rapidly increasing popularity of community-based Question Answering (cQA) services, e.g., Yahoo! Answers and Baidu Zhidao, has attracted great attention from both academia and industry. Besides the basic problems, like question searching and answer finding, it should be noted that the low participation rate of users in cQA service is the cruc...
This paper is concerned with the problem of boosting social annotations using propagation, which is also called social propagation. In particular, we focus on propagating social annotations of web pages (e.g., annotations in Del.icio.us). Although social annotations are developing fast, they cover only a small proportion of Web pages on the World W...
As a social service in Web 2.0, folksonomy gives users the ability to save and organize their bookmarks online with "social annotations" or "tags". Social annotations are high-quality descriptors of web pages' topics as well as good indicators of web users' interests. We propose a personalized search framework to utilize folksonomy for p...
This poster explores the use of social annotations for improving language models for information retrieval (denoted as LMIR). Two properties of social annotations, namely the keyword property and the structure property, are studied for this aim. The keyword property improves LMIR by concatenating all the annotations of a do...
In this paper, we present an exploration of using social annotations provided by Web 2.0 sites (such as Del.icio.us) to help web search. More specifically, we consider using the social annotations as an additional resource to strengthen existing smoothing methods for the language model for IR. The social annotations can benefit the smoothing...