Shimei Pan

University of Maryland, Baltimore County | UMBC · Department of Information Systems

About

65 Publications
11,013 Reads
1,643 Citations

Publications (65)
Article
Statistical topic models have become a useful and ubiquitous tool for analyzing large text corpora. One common application of statistical topic models is to support topic-centric navigation and exploration of document collections. Existing work on topic modeling focuses on the inference of model parameters so the resulting model fits the input data...
Preprint
Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient pos...
Article
Although there has been a great deal of interest in analyzing customer opinions and breaking news in microblogs, progress has been hampered by the lack of an effective mechanism to discover and retrieve data of interest from microblogs. To address this problem, we have developed an uncertainty-aware visual analytics approach to retrieve salient pos...
Conference Paper
Full-text available
In this paper, we present a comprehensive study of the relationship between an individual's personal traits and his/her brand preferences. In our analysis, we included a large number of character traits such as personality, personal values and individual needs. These trait features were obtained from both a psychometric survey and automated social...
Article
The prevalence of social content sharing such as video and photo sharing has greatly enhanced information discovery and social interaction over the internet. This has inspired similar efforts within enterprise to encourage collaboration and expertise sharing. Moreover, enterprise web meeting tools increasingly become an important platform for knowl...
Conference Paper
Full-text available
We present LDAExplore, a tool to visualize topic distributions in a given document corpus that are generated using Topic Modeling methods. Latent Dirichlet Allocation (LDA) is one of the basic methods that is predominantly used to generate topics. One of the problems with methods like LDA is that users who apply them may not understand the topics t...
Article
To encourage enterprise knowledge sharing, and especially to facilitate the discovery and sharing of enterprise meetings, we develop an end-to-end enterprise meeting service called Agora that manages the full cycle of hosting web meetings and sharing multimedia recorded meeting artifacts. In this paper, we focus on Agora's composite search engine that...
Patent
Full-text available
Automatically generating a meeting digest of a set of meetings is provided. A set of topics of interest to the parties to the set of meetings is detected utilizing a user model associated with a user that is based on at least one of communications, relationships, and roles of the parties to the set of meetings. Topic-related content associated with...
Conference Paper
Expertise and skill assessments are a common aspect of working in an enterprise, but manual assessments are onerous and quickly outdated. Automated assessments can alleviate these problems, albeit at the risk of being inaccurate. In this short paper, we focus on the problem of how to design an engaging learning system in the presence of potentially...
Patent
Full-text available
One embodiment of the present method and apparatus for robust input interpretation by conversation systems includes receiving a user request containing at least one un-interpretable term. The present invention conveys the conversation system's interpretation capabilities to the user, for example by suggesting at least one alternative request in con...
Conference Paper
In this paper, we present Expediting Expertise, a system designed to provide structured support to the otherwise informal process of social learning in the enterprise. It employs a data-driven approach where online content is automatically analyzed and categorized into relevant topics, topic-specific user expertise is calculated by comparing the mo...
Article
In this paper, we propose a novel constrained coclustering method to achieve two goals. First, we combine information-theoretic coclustering and constrained clustering to improve clustering performance. Second, we adopt both supervised and unsupervised constraints to demonstrate the effectiveness of our algorithm. The unsupervised constraints are a...
Conference Paper
We are building a topic-based, interactive visual analytic tool that aids users in analyzing large collections of text. To help users quickly discover content evolution and significant content transitions within a topic over time, here we present a novel, constraint-based approach to temporal topic segmentation. Our solution splits a discovered top...
Article
Sentiment plays a critical role in many information-centric business scenarios. The opinion mining methods proposed in the recent decade have formed a solid foundation to investigate the sentiment analysis tasks, but are often too complicated and scattered to serve the needs of real customers. We introduce the VISA system in this paper, which appli...
Conference Paper
In this Note we summarize our research on increasing the information scent of video recordings that are shared via email in a corporate setting. We compare two types of email messages for sharing recordings: the first containing basic information (e.g. title, speaker, abstract) with a link to the video; the second with the same information plus a s...
Article
We present Enterprise Priority Inbox Classifier (EPIC), an automatic personalized email prioritization system based on a topic-based user model built from the user's email data and relevant enterprise information. The user model encodes the user's topics of interest and email processing behaviors (e.g. read/reply/file) at the granularity of pair-wi...
Conference Paper
Full-text available
Corporate meetings are increasingly being held remotely using web technologies. With such remote meetings being recorded and made available after the fact, there is a pressing need for tools to access and utilize these recordings efficiently. Our work explores the utility of using annotations generated by meeting attendees to meet this need. We con...
Article
We are building an interactive visual text analysis tool that aids users in analyzing large collections of text. Unlike existing work in visual text analytics, which focuses either on developing sophisticated text analytic techniques or inventing novel text visualization metaphors, ours tightly integrates state-of-the-art text analytics with intera...
Conference Paper
Full-text available
In this paper, we discuss the use of analytic trails to support the needs of business users when conducting visual data analysis, focusing particularly on the aspects of analytic provenance, asynchronous collaboration, and reuse of analyses. We present a prototype implementation of analytic trail technology as part of Smarter Decisions (a web-base...
Article
With the development of speech recognition and synthesis technology, speech interfaces for practical applications are in high demand. For applications like spoken dialogue systems, where not only the waveform but also the content of a system’s query/response has to be generated automatically, a Concept-to-Speech system is needed. One key module i...
Conference Paper
We present three studies that compare adoption and appropriation between China and the United States of BlogCentral, an internal blogging tool employed by a large global enterprise. We first analyzed 23 months of usage logs for users in both countries and found that compared to the U.S., Chinese users were much less active, with less activity and s...
Conference Paper
We present ICARUS, a contextual information retrieval system, which uses the current email message and a multi-tiered user model to retrieve relevant content and make it available in a sidebar widget embedded in the email client. The system employs a dynamic retrieval strategy to conduct automated contextual search across multiple information sourc...
Conference Paper
In this paper, we present a novel exploratory visual analytic system called TIARA (Text Insight via Automated Responsive Analytics), which combines text analytics and interactive visualization to help users explore and analyze large collections of text. Given a collection of documents, TIARA first uses topic analysis techniques to summarize the doc...
Article
Over the past decades, there have been significant efforts on developing robust and easy-to-use query interfaces to databases. So far, the typical query interfaces are GUI-based visual query interfaces. Visual query interfaces, however, have limitations, especially when they are used for accessing large and complex datasets. Therefore, we are develop...
Article
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximi...
Conference Paper
Over the past decades, there have been significant efforts on developing robust and easy-to-use query interfaces to databases. So far, the typical query interfaces are GUI-based visual query interfaces. Visual query interfaces, however, have limitations, especially when they are used for accessing large and complex datasets. Therefore, we are develop...
Conference Paper
Full-text available
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation max-...
Conference Paper
Full-text available
Topic-based text summaries promise to help average users quickly understand a text collection and derive insights. Recent research has shown that the Latent Dirichlet Allocation (LDA) model is one of the most effective approaches to topic analysis. However, the LDA-based results may not be ideal for human understanding and consumption. In this pape...
Conference Paper
We are building an interactive, visual text analysis tool that aids users in analyzing a large collection of text. Unlike existing work in text analysis, which focuses either on developing sophisticated text analytic techniques or inventing novel visualization metaphors, ours is tightly integrating state-of-the-art text analytics with interactive v...
Patent
Methods and apparatus are provided for integrating a visual query interface and a natural language interface. The disclosed interface embeds a first of the visual query interface and the natural language interface into a second of the visual query interface and the natural language interface. The disclosed interface can also receive one or more na...
Conference Paper
Today's search engines return a wide range of information from diverse sources with lightning speed. The information that is returned, however, is independent of who asked the question or the context in which the information need arose. Next generation ...
Conference Paper
In this paper, we address a critical problem in conversation systems: limited input interpretation capabilities. When an interpretation error occurs, users often get stuck and cannot recover due to a lack of guidance from the system. To solve this problem, we present a hybrid natural language query recommendation framework that combines nat...
Conference Paper
Information seeking is an important but often difficult task, especially when it involves large and complex data sets. We hypothesize that a context-sensitive interaction paradigm would greatly assist users in their information seeking. Such a paradigm would allow users to both express their requests and receive requested information in context. Dr...
Conference Paper
Information seeking is an important but often difficult task especially when involving large and complex data sets. We hypothesize that a context-sensitive interaction paradigm can greatly assist users in their information seeking. Such a paradigm allows a system to both understand user data requests and present the requested information in context...
Conference Paper
Multimodal conversation systems allow users to interact with computers effectively using multiple modalities, such as natural language and gesture. However, these systems have not been widely used in practical applications mainly due to their limited input understanding capability. As a result, conversation systems often fail to understand user req...
Conference Paper
This paper describes a novel instance-based sentence boundary determination method for natural language generation that optimizes a set of criteria based on examples in a corpus. Compared to existing sentence boundary determination approaches, our work offers three significant contributions. First, our approach provides a general domain inde...
Chapter
In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise. Simply fusing multimodal inputs together may not be sufficient to derive a complete understanding of the inputs. Aiming to handle a wide variety of multimodal inputs, we are building a context-based multimodal interpretation framework called MIND (Multimodal...
Chapter
This paper presents Segue, a hybrid surface natural language generator that employs a case-based paradigm but performs rule-based adaptations. It uses an annotated corpus as its knowledge source and employs grammatical rules to construct new sentences. By using adaptation-guided retrieval to select cases that can be adapted easily to the desired outp...
Conference Paper
Despite increasing deployment of agent technologies in several business and industry domains, user confidence in fully automated agent driven applications is noticeably lacking. The main reasons for such lack of trust in complete automation are scalability ...
Article
Full-text available
Pitch accent placement is a major topic in intonational phonology research and its application to speech synthesis. What factors influence whether a word is made intonationally prominent is an open question. In this paper, we investigate how one aspect of a word's local context - its collocation with neighboring words - influences whe...
Article
In spoken language applications such as conversation systems where not only the speech waveforms but also the content of the speech (the text) need to be generated automatically, a Concept-to-Speech (CTS) system is needed. In this paper, we address several issues on designing a speech corp...
Article
Prosody modeling is critical in developing a Concept-to-Speech (CTS) system where both Natural Language Generation (NLG) and Speech Synthesis are used to automatically generate natural, coherent speech. In this paper, we empirically verify the usefulness of various natural language features in prosody modeling. Three groups of features are investig...
Article
Full-text available
Spoken dialogue system performance can vary widely for different users, as well as for the same user during different dialogues. This paper presents the design and evaluation of an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Based on rules learned from a set of training dialogues, adaptive TOOT constructs a...
Article
In this paper, we report on an effort to provide a general-purpose spoken language generation tool for Concept-to-Speech (CTS) applications by extending a widely used text generation package, FUF/SURGE, with an intonation generation component. As a first step, we applied machine learning and statistical models to learn intonation rules based on the...
Article
Full-text available
While the notion of a cooperative response has been the focus of considerable research in natural language dialogue systems, there has been little empirical work demonstrating how such responses lead to more efficient, natural, or successful dialogues. This paper presents an experimental evaluation of two alternative response strategies in TOOT,...
Conference Paper
In a multimodal human-machine conversation, user inputs are often abbreviated or imprecise. Sometimes, merely fusing multimodal inputs together cannot derive a complete understanding. To address these inadequacies, we are building a semantics-based multimodal interpretation framework called MIND (Multimodal Interpretation for Natural Dialog). The u...
Conference Paper
We are building a full-fledged multimedia conversation framework called Responsive Information Architect (RIA), using a combination of AI and multimedia techniques. Here we describe RIA's capability of automated authoring of a coherent multimedia discourse, which is used by RIA to express itself when conversing with a user. Specifically, we focus o...
Article
Full-text available
Spoken dialogue system performance can vary widely for different users, as well as for the same user during different dialogues. This paper presents the design and evaluation of an adaptive version of TOOT, a spoken dialogue system for retrieving online train schedules. Adaptive TOOT predicts whether a user is having speech recognition problems as a p...
Article
We explore three issues for the development of concept-to-speech (CTS) systems. We identify information available in a language-generation system that has the potential to impact prosody; investigate the role played by different corpora in CTS prosody modelling; and explore different methodologies for learning how linguistic features impact prosody...
Article
Generation of Intensive Care data), a system that generates multimedia briefings of a patient's status after having a bypass operation (Dalal et al. 1996; McKeown et al. 1997). We first describe information MAGIC generates in the process of producing language, turning next to the corpora we collected. We then provide a description of the more tradi...
Article
We aim to design and develop a Concept-to-Speech (CTS) generation system, a speech synthesis system producing speech from semantic representations, by integrating language generation with speech synthesis. We focus on five issues (1) how to employ newly available accurate discourse, semantic, and syntactic information produced by a natural language...
Article
In intonational phonology and speech synthesis research, it has been suggested that the relative informativeness of a word can be used to predict pitch prominence. The more information conveyed by a word, the more likely it will be accented. But there are others who express doubts about such a correlation. In this paper, we provide some empirical e...
Article
Full-text available
Recent technological advances have made it possible to build real-time, interactive spoken dialogue systems for a wide variety of applications. However, when users do not respect the limitations of such systems, performance typically degrades. Although users differ with respect to their knowledge of system limitations, and although different dialog...
Conference Paper
The capability to reallocate items--e.g. tasks, securities, bandwidth slices, Mega Watt hours of electricity, and collectibles--is a key feature in automated negotiation. Especially when agents have preferences over combinations of items, this is highly ...
Article
This paper identifies issues for language generation that arose in developing a multimedia interface to healthcare data that includes coordinated speech, text and graphics. In order to produce brief speech for time-pressured caregivers, the system both combines related information into a single sentence and uses abbreviated references in speech whe...
Article
Creating high-quality multimedia presentations requires much skill, time, and effort. This is particularly true when temporal media, such as speech and animation, are involved. We describe the design and implementation of a knowledge-based system that generates customized temporal multimedia presentations. We provide an overview of the system's arc...
Article
Concept To Speech (CTS) systems are closely related to two other types of systems: Natural Language Generation (NLG) and Speech Synthesis (SS). In this paper, we propose a new architecture for a CTS system. A Speech Integrating Markup Language (SIML) is designed as a general interface for integrating NLG and SS. We also present a CTS system for a...
Conference Paper
Addresses two important issues in generating spoken language within a multimedia system: the design of a speech generator to facilitate coordination between media, and extensions to the functionality of a written-language generation system to produce natural speech output. We demonstrate how a speech generator can produce information that allows fo...
Conference Paper
Full-text available
Creating high-quality multimedia presentations requires much skill, time, and effort. This is particularly true when temporal media, such as speech and animation, are involved. We describe the design and implementation of a knowledge-based system that generates customized temporal multimedia presentations. We provide an overview of...
Conference Paper
In Natural Language Processing (NLP), one key problem is how to design a robust and effective parsing system. In this paper, we will introduce a corpus-based Chinese parsing system. Our efforts are concentrated on: (1) knowledge acquisition and representation; and (2) the parsing scheme. The knowledge of this system is principally extracted from anal...
Article
Creating high-quality multimedia presentations requires much skill, time, and effort. This is particularly true when temporal media, such as speech and animation, are involved. We describe the design and implementation of a knowledge-based system that generates customized temporal multimedia presentations. We provide an overview of the system's...
