Tim Paek

Stanford University, Palo Alto, California, United States


Publications (53) · 5.46 Total Impact

  • Source
    Tim Paek, Eric J. Horvitz
    ABSTRACT: Conversations abound with uncertainties of various kinds. Treating conversation as inference and decision making under uncertainty, we propose a task-independent, multimodal architecture for supporting robust continuous spoken dialog called Quartet. We introduce four interdependent levels of analysis, and describe representations, inference procedures, and decision strategies for managing uncertainties within and between the levels. We highlight the approach by reviewing interactions between a user and two spoken dialog systems developed using the Quartet architecture: Presenter, a prototype system for navigating Microsoft PowerPoint presentations, and the Bayesian Receptionist, a prototype system for dealing with tasks typically handled by front desk receptionists at the Microsoft corporate campus.
    01/2013;
  • Source
    ABSTRACT: Dictation using speech recognition could potentially serve as an efficient input method for touchscreen devices. However, dictation systems today follow a mentally disruptive speech interaction model: users must first formulate utterances and then produce them, as they would with a voice recorder. Because utterances do not get transcribed until users have finished speaking, the entire output appears at once, and users must break their train of thought to verify and correct it. In this paper, we introduce Voice Typing, a new speech interaction model in which users' utterances are transcribed as they produce them, enabling real-time error identification. For fast correction, users leverage a marking menu using touch gestures. Voice Typing aspires to create an experience akin to having a secretary type for you while you monitor and correct the text. In a user study where participants composed emails using both Voice Typing and traditional dictation, they not only reported lower cognitive demand for Voice Typing but also exhibited a 29% relative reduction in user corrections. Overall, they also preferred Voice Typing.
    01/2012;
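The real-time transcription model above depends on deciding when a partial recognition hypothesis is stable enough to display as committed text. As an illustrative sketch (not the paper's actual algorithm), one simple policy commits the words on which two successive recognizer partials agree; `committed_words` is a hypothetical helper name:

```python
def committed_words(prev_hyp, cur_hyp):
    """Commit the word prefix on which two successive partial hypotheses
    from the recognizer agree; later, still-changing words stay pending."""
    committed = []
    for prev_word, cur_word in zip(prev_hyp.split(), cur_hyp.split()):
        if prev_word != cur_word:
            break
        committed.append(prev_word)
    return committed

# As the user keeps speaking, "the cat" is stable across partials and can be
# shown for in-place verification, while the tail is still being revised.
print(committed_words("the cat sad", "the cat sat on"))  # → ['the', 'cat']
```

A real system would also need to handle the recognizer retracting earlier words, e.g. by only un-committing text the user has not yet corrected.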
  • Conference Paper: SiMPE: 6
    Proceedings of the 13th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2011, Stockholm, Sweden, August 30 - September 2, 2011; 08/2011
  • Source
    ABSTRACT: Mobile devices often utilize touchscreen keyboards for text input. However, due to the lack of tactile feedback and generally small key sizes, users often produce typing errors. Key-target resizing, which dynamically adjusts the underlying target areas of the keys based on their probabilities, can significantly reduce errors, but requires training data in the form of touch points for intended keys. In this paper, we introduce Text Text Revolution (TTR), a game that helps users improve their typing experience on mobile touchscreen keyboards in three ways: first, by providing targeting practice, second, by highlighting areas for improvement, and third, by generating ideal training data for key-target resizing as a side effect of playing the game. In a user study, participants who played 20 rounds of TTR not only improved in accuracy over time, but also generated useful data for key-target resizing. To demonstrate usefulness, we trained key-target resizing on touch points collected from the first 10 rounds, and simulated how participants would have performed had personalized key-target resizing been used in the second 10 rounds. Key-target resizing reduced errors by 21.4%.
    Pervasive Computing - 9th International Conference, Pervasive 2011, San Francisco, CA, USA, June 12-15, 2011. Proceedings; 01/2011
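Key-target resizing, as used above, is commonly formulated as a Bayesian decision: each touch point is resolved to the key maximizing a prior (e.g. from a language model) times a touch-location likelihood. The sketch below is a minimal illustration under an assumed isotropic Gaussian touch model, not the trained model from the paper:

```python
import math

def gaussian_touch_likelihood(touch, key_center, sigma=8.0):
    """P(touch | key): isotropic Gaussian around the key's center (pixels).
    The sigma value here is an assumption for illustration."""
    dx = touch[0] - key_center[0]
    dy = touch[1] - key_center[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))

def resolve_key(touch, key_centers, key_priors):
    """Pick argmax_k P(k) * P(touch | k): keys with higher language-model
    probability effectively claim a larger underlying target area."""
    return max(
        key_centers,
        key=lambda k: key_priors.get(k, 1e-6)
        * gaussian_touch_likelihood(touch, key_centers[k]),
    )

# Hypothetical two-key layout: 'q' and 'w' 30 px apart. A touch at x=16 is
# geometrically closer to 'w', but a strong prior for 'q' flips the decision.
centers = {"q": (0.0, 0.0), "w": (30.0, 0.0)}
print(resolve_key((16.0, 0.0), centers, {"q": 0.9, "w": 0.1}))  # → q
print(resolve_key((16.0, 0.0), centers, {"q": 0.5, "w": 0.5}))  # → w
```

The touch points TTR collects are exactly the labeled (touch, intended key) pairs needed to fit such a likelihood per user.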
  • Source
    ABSTRACT: Prior research has shown that when drivers look away from the road to view a personal navigation device (PND), driving performance is affected. To keep visual attention on the road, an augmented reality (AR) PND using a heads-up display could overlay a navigation route. In this paper, we compare the AR PND, a technology that does not currently exist but can be simulated, with two PND technologies that are popular today: an egocentric street view PND and the standard map-based PND. Using a high-fidelity driving simulator, we examine the effect of all three PNDs on driving performance in a city traffic environment where constant, alert attention is required. Based on both objective and subjective measures, experimental results show that the AR PND exhibits the least negative impact on driving. We discuss the implications of these findings on PND design as well as methods for potential improvement.
    Proceedings of the 13th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2011, Stockholm, Sweden, August 30 - September 2, 2011; 01/2011
  • ABSTRACT: Information Technology (IT) has had a significant impact on society and has touched all aspects of our lives. Until now, this growth has been fueled by computers and expensive devices, and it has brought several benefits to society. The challenge now is to take this success of IT to the next level, where IT services can be accessed by users in developing regions. The focus of the 2011 workshop is to identify alternative sources of intelligence and use them to ease the process of interacting with information technology. We would like to explore the different modalities, their usage by the community, the intelligence that can be derived from that usage, and finally the design implications for the user interface. We would also like to explore how people in developing regions would react to collaborative technologies and/or use collaborative interfaces that require community support to build knowledge bases (for example, Wikipedia) or to enable effective navigation of content and access to services.
    Proceedings of the 2011 International Conference on Intelligent User Interfaces, February 13-16, 2011, Palo Alto, CA, USA; 01/2011
  • Source
    Tim Paek, Bo-June (Paul) Hsu
    ABSTRACT: Text entry experiments evaluating the effectiveness of various input techniques often employ a procedure whereby users are prompted with natural language phrases which they are instructed to enter as stimuli. For experimental validity, it is desirable to control the stimuli and present text that is representative of a target task, domain or language. MacKenzie and Soukoreff (2001) manually selected a set of 500 phrases for text entry experiments. To demonstrate representativeness, they correlated the distribution of single letters in their phrase set to a relatively small (by current standards) corpus of English prior to 1966, which may not reflect the style of text input today. In this paper, we ground the notion of representativeness in terms of information theory and propose a procedure for sampling representative phrases from any large corpus so that researchers can curate their own stimuli. We then describe the characteristics of phrase sets we generated using the procedure for email and social media (Facebook and Twitter). The phrase sets and code for the procedure are publicly available for download.
    Proceedings of the International Conference on Human Factors in Computing Systems, CHI 2011, Vancouver, BC, Canada, May 7-12, 2011; 01/2011
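One plausible reading of grounding "representativeness" in information theory is to score a candidate phrase set by how closely its character distribution matches the target corpus, e.g. via KL divergence, and to sample greedily. This is only a hedged sketch of that idea; the paper's actual sampling procedure may differ:

```python
from collections import Counter
import math

def letter_dist(text):
    """Empirical distribution over letters (case-folded, alphabetic only)."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    total = sum(counts.values()) or 1
    return {c: n / total for c, n in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over letters; eps smooths letters missing from q."""
    return sum(pv * math.log(pv / q.get(c, eps)) for c, pv in p.items())

def sample_phrases(candidate_phrases, target_dist, k):
    """Greedily pick k phrases whose cumulative letter distribution stays
    closest (in KL divergence) to the target corpus distribution."""
    chosen, pool = [], list(candidate_phrases)
    for _ in range(k):
        best = min(
            pool,
            key=lambda ph: kl_divergence(
                letter_dist(" ".join(chosen + [ph])), target_dist
            ),
        )
        chosen.append(best)
        pool.remove(best)
    return chosen
```

With a target distribution of 75% 'a' / 25% 'b', the sampler prefers phrases that keep the running set near that ratio, which is the representativeness criterion in miniature.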
  • ABSTRACT: With the proliferation of pervasive devices and the increase in their processing capabilities, client-side speech processing has been emerging as a viable alternative. The SiMPE workshop series started in 2006 [5] with the goal of enabling speech processing on mobile and embedded devices to meet the challenges of pervasive environments (such as noise) and to leverage the context they offer (such as location). SiMPE 2010, the 5th in the series, will continue to explore issues, possibilities, and approaches for enabling speech processing as well as convenient and effective speech and multimodal user interfaces. Over the years, SiMPE has been evolving too; since last year, one of our major goals has been to increase the participation of speech/multimodal HCI designers and their interactions with speech processing experts. Multimodality received more attention in SiMPE 2008 than in previous years. In SiMPE 2007 [4], the focus was on developing regions; given the importance of speech in developing regions, SiMPE 2008 had "SiMPE for developing regions" as a topic of interest. Speech user interaction in cars was a focus area in 2009 [2]. Given the multi-disciplinary nature of our goal, we hope that SiMPE will become the prime meeting ground for experts in these varied fields to bring novel, useful, and usable mobile speech applications to fruition.
    Proceedings of the 12th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2010, Lisbon, Portugal, September 7-10, 2010; 09/2010
  • Source
    ABSTRACT: Mobile devices with touch capabilities often utilize touchscreen keyboards. However, due to the lack of tactile feedback, users often have to switch their focus of attention between the keyboard area, where they must locate and click the correct keys, and the text area, where they must verify the typed output. This can impair user experience and performance. In this paper, we examine multimodal feedback and guidance signals that keep users' focus of attention in the keyboard area but also provide the kind of information users would normally get in the text area. We first conducted a usability study to assess and refine the user experience of these signals and their combinations. Then we evaluated whether those signals which users preferred could also improve typing performance in a controlled experiment. One combination of multimodal signals significantly improved typing speed by 11%, reduced keystrokes-per-character by 8%, and reduced backspaces by 28%. We discuss design implications.
    Proceedings of the 12th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2010, Lisbon, Portugal, September 7-10, 2010; 01/2010
  • Source
    ABSTRACT: Soft keyboards offer touch-capable mobile and tabletop devices many advantages, such as multiple language support and room for larger displays. On the other hand, because soft keyboards lack haptic feedback, users often produce more typing errors. In order to make soft keyboards more robust to noisy input, researchers have developed key-target resizing algorithms, where the underlying target areas for keys are dynamically resized based on their probabilities. In this paper, we describe how overly aggressive key-target resizing can sometimes prevent users from typing their desired text, violating basic user expectations about keyboard functionality. We propose an anchored key-target method that allows soft keyboards to remain robust to errors while respecting these usability principles. In an empirical evaluation, we found that using anchored dynamic key-targets significantly reduced keystroke errors as compared to the state-of-the-art.
    Proceedings of the 2010 International Conference on Intelligent User Interfaces, February 7-10, 2010, Hong Kong, China; 01/2010
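The anchoring idea above can be illustrated as a guaranteed region around each key's visual center that always wins, with probabilistic resolution applying only outside the anchors. A minimal sketch, with the anchor radius and scoring function left as assumptions:

```python
import math

def resolve_key_anchored(touch, key_centers, key_scores, anchor_radius=5.0):
    """Anchored key-target resolution (sketch): a touch within a key's
    anchor radius always returns that key, so tapping a key's visual
    center can never be overridden by the language model; outside all
    anchors, the highest-scoring key wins."""
    for key, center in key_centers.items():
        if math.dist(touch, center) <= anchor_radius:
            return key
    return max(key_scores, key=key_scores.get)

# Hypothetical layout: a tap near 'q' returns 'q' even when the model
# strongly favors 'w'; an ambiguous tap between keys defers to the scores.
centers = {"q": (0.0, 0.0), "w": (30.0, 0.0)}
print(resolve_key_anchored((2.0, 0.0), centers, {"q": 0.1, "w": 0.9}))   # → q
print(resolve_key_anchored((16.0, 0.0), centers, {"q": 0.1, "w": 0.9}))  # → w
```

The anchor is what preserves the user's expectation that the keyboard behaves deterministically at key centers, which is the usability principle the abstract refers to.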
  • Source
    Yun-Cheng Ju, Tim Paek
    ABSTRACT: Speech recognition affords automobile drivers a hands-free, eyes-free method of replying to Short Message Service (SMS) text messages. Although a voice search approach based on template matching has been shown to be more robust to the challenging acoustic environment of automobiles than using dictation, users may have difficulties verifying whether SMS response templates match their intended meaning, especially while driving. Using a high-fidelity driving simulator, we compared dictation for SMS replies versus voice search in increasingly difficult driving conditions. Although the two approaches did not differ in terms of driving performance measures, users made about six times more errors on average using dictation than voice search.
    ACL 2010, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, July 11-16, 2010, Uppsala, Sweden, Short Papers; 01/2010
  • Source
    ABSTRACT: As users of social networking websites expand their network of friends, they are often flooded with newsfeed posts and status updates, most of which they consider to be "unimportant" and not newsworthy. In order to better understand how people judge the importance of their newsfeed, we conducted a study in which Facebook users were asked to rate the importance of their newsfeed posts as well as their friends. We learned classifiers of newsfeed and friend importance to identify predictive sets of features related to social media properties, the message text, and shared background information. For classifying friend importance, the best performing model achieved 85% accuracy and 25% error reduction. By leveraging this model for classifying newsfeed posts, the best newsfeed classifier achieved 64% accuracy and 27% error reduction.
    Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2010, Atlanta, Georgia, USA, July 11-15, 2010; 01/2010
  • Source
    ABSTRACT: With the proliferation of pervasive devices and the increase in their processing capabilities, client-side speech processing has been emerging as a viable alternative. SiMPE 2009, the fourth in the series, will continue to explore issues, possibilities, and approaches for enabling speech processing as well as convenient and effective speech and multimodal user interfaces. One of our major goals for SiMPE 2009 is to increase the participation of speech/multimodal HCI designers, and to increase their interactions with speech processing experts. Multimodality received more attention in SiMPE 2008 than in previous years. In SiMPE 2007 [3], the focus was on developing regions. Given the importance of speech in developing regions, SiMPE 2008 had "SiMPE for developing regions" as a topic of interest. We think of this as a key emerging area for mobile speech applications, and will continue this focus in 2009 as well.
    Proceedings of the 11th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2009, Bonn, Germany, September 15-18, 2009; 09/2009
  • Source
    ABSTRACT: Nowadays, personal navigation devices (PNDs) that provide GPS-based directions are common in vehicles. These devices typically display the real-time location of the vehicle on a map and play spoken prompts when drivers need to turn. While such devices appear to be less distracting than paper directions, in-car displays may distract drivers from their primary task of driving. In experiments conducted with a high-fidelity driving simulator, we found that using paper directions degrades driving performance and visual attention significantly more than using a navigation device that provides either a map with spoken prompts or spoken prompts only. This was expected. However, we also found that having just spoken prompts affected visual attention the least. We discuss the implications of these findings on PND design for vehicles.
    05/2009;
  • Source
    ABSTRACT: As users enter web queries, real-time query expansion (RTQE) interfaces offer suggestions based on an index garnered from query logs. By selecting a suggestion, users can potentially reduce keystrokes, which can be very beneficial on mobile devices with limited input capabilities. Unfortunately, RTQE interfaces typically provide little assistance when only parts of an intended query appear among the suggestion choices. In this paper, we introduce Phrase Builder, an RTQE interface that reduces keystrokes by facilitating the selection of individual query words and by leveraging back-off query techniques to offer completions for out-of-index queries. We describe how we implemented a small-memory-footprint index and retrieval algorithm, and discuss lessons learned from three versions of the user interface, which was iteratively designed through user studies. Compared to standard auto-completion and typing, the last version of Phrase Builder saved more keystrokes per character, was perceived to be faster, and was preferred overall by users.
    Proceedings of the 11th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2009, Bonn, Germany, September 15-18, 2009; 01/2009
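The back-off behavior described above can be sketched as: offer whole-query completions when the prefix matches the query index, otherwise back off to completing just the word currently being typed. All names here are illustrative, not Phrase Builder's actual API:

```python
def suggest(prefix, query_index, word_index, k=3):
    """Back-off completion (sketch): offer whole-query completions from
    the query log when the prefix matches; otherwise back off to
    completing only the current word, so out-of-index queries still
    receive per-word assistance."""
    full = sorted(q for q in query_index if q.startswith(prefix))
    if full:
        return full[:k]
    # Back off: complete only the word being typed, keeping the rest.
    last_word = prefix.rsplit(" ", 1)[-1]
    stem = prefix[: len(prefix) - len(last_word)]
    return [stem + w for w in sorted(word_index) if w.startswith(last_word)][:k]

# "cheap we" is not in the hypothetical query log, but the current word
# "we" can still be completed from the word-level index.
print(suggest("cheap we", ["weather seattle"], ["seattle", "weather", "wedding"]))
# → ['cheap weather', 'cheap wedding']
```

A production index would use a compact trie rather than linear scans, matching the paper's emphasis on a small memory footprint.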
  • Source
    ABSTRACT: Nowadays, personal navigation devices (PNDs) that provide GPS-based directions are widespread in vehicles. These devices typically display the real-time location of the vehicle on a map and play spoken prompts when drivers need to turn. While such devices are less distracting than paper directions, their graphical display may distract users from their primary task of driving. In experiments conducted with a high-fidelity driving simulator, we found that drivers using a navigation system with a graphical display indeed spent less time looking at the road compared to those using a navigation system with spoken directions only. Furthermore, glancing at the display was correlated with higher variance in driving performance measures. We discuss the implications of these findings on PND design for vehicles.
    Proceedings of 1st International Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2009, in-cooperation with ACM SIGCHI, Essen, Germany, 21-22 September 2009; 01/2009
  • Source
    Yun-Cheng Ju, Tim Paek
    ABSTRACT: Automotive infotainment systems now provide drivers the ability to hear incoming Short Message Service (SMS) text messages using text-to-speech. However, the question of how best to allow users to respond to these messages using speech recognition remains unsettled. In this paper, we propose a robust voice search approach to replying to SMS messages based on template matching. The templates are empirically derived from a large SMS corpus and matches are accurately retrieved using a vector space model. In evaluating SMS replies within the acoustically challenging environment of automobiles, the voice search approach consistently outperformed using just the recognition results of a statistical language model or a probabilistic context-free grammar. For SMS replies covered by our templates, the approach achieved as high as 89.7% task completion when evaluating the top five reply candidates. Index Terms: SMS, information retrieval, voice UI, voice search
    INTERSPEECH 2009, 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, September 6-10, 2009; 01/2009
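Template retrieval with a vector space model typically means ranking templates by cosine similarity between token-count vectors of the recognized reply and each template. A bag-of-words sketch follows; the deployed system would presumably add IDF weighting and operate on n-best recognition output:

```python
from collections import Counter
import math

def cosine_match(reply_tokens, templates):
    """Rank reply templates by cosine similarity between token-count
    vectors of the recognized reply and each template (bag of words)."""
    q = Counter(reply_tokens)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    scored = []
    for template in templates:
        d = Counter(template.split())
        d_norm = math.sqrt(sum(v * v for v in d.values()))
        dot = sum(q[w] * d[w] for w in q)
        sim = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
        scored.append((sim, template))
    return sorted(scored, reverse=True)

# Hypothetical templates: even with a noisy recognized reply, the
# highest-overlap template surfaces first for the driver to confirm.
ranked = cosine_match(
    "i will be late".split(),
    ["i will be there soon", "running late be there soon", "ok"],
)
print(ranked[0][1])  # → i will be there soon
```

Presenting the top few matches, rather than just the best one, is what makes the approach robust: the correct template only has to appear somewhere in the short candidate list.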
  • Source
    ABSTRACT: The small physical size of mobile devices imposes dramatic restrictions on the user interface (UI). With the ever-increasing capacity of these devices as well as access to large online stores, it becomes increasingly important to help the user select a particular item efficiently. Thus, we propose binary search with character pinning, where users can constrain their search to match selected prefix characters while making simple binary decisions about the position of their intended item in the lexicographic order. The underlying index for our method is based on a ternary search tree that is optimal under certain user-oriented constraints. To better scale to larger indexes, we analyze several heuristics that rapidly construct good trees. A user study demonstrates that our method helps users conduct rapid searches, using fewer keystrokes, compared to other methods.
    Proceedings of the 2009 International Conference on Intelligent User Interfaces, February 8-11, 2009, Sanibel Island, Florida, USA; 01/2009
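The interaction can be sketched as a binary search over the lexicographically sorted index, restricted to items matching the pinned prefix, with the user acting as the comparison oracle. The sketch below uses a flat sorted list for clarity; the paper's index is a ternary search tree:

```python
import bisect

def pinned_binary_search(lexicon, pinned_prefix, target_before_probe):
    """Binary search with character pinning (sketch): candidates are
    restricted to entries starting with the pinned prefix, and the user
    oracle answers whether the intended item sorts strictly before the
    probe shown on screen."""
    lexicon = sorted(lexicon)
    # Pin: narrow the search range to entries sharing the prefix.
    lo = bisect.bisect_left(lexicon, pinned_prefix)
    hi = bisect.bisect_right(lexicon, pinned_prefix + "\uffff")
    if lo >= hi:
        return None  # no entry matches the pinned prefix
    # Binary search: each user decision halves the remaining range.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if target_before_probe(lexicon[mid]):
            hi = mid
        else:
            lo = mid
    return lexicon[lo]

# Pinning "ap" restricts the binary decisions to just the "ap..." entries.
lexicon = ["banana", "apple", "apricot", "berry", "cherry"]
print(pinned_binary_search(lexicon, "ap", lambda probe: "apricot" < probe))
# → apricot
```

Each answered comparison halves the candidate range, so the number of decisions is logarithmic in the pinned range rather than the whole index.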
  • Source
    Tim Paek, Yun-Cheng Ju
    ABSTRACT: Voice search applications encourage users to "just say what you want" in order to obtain useful mobile content such as automated directory assistance (ADA). Unfortunately, when users only remember part of what they are looking for, they are forced to guess, even though what they know may be sufficient to retrieve the desired information. In this paper, we propose expanding the capabilities of voice search to allow users to explicitly express their uncertainties as part of their queries, and as such, to provide partial knowledge. Applied to ADA, we highlight the enhanced user experience uncertain expressions afford and delineate how we performed language modeling and information retrieval. We evaluate our approach by assessing its impact on overall ADA performance and by discussing the results of an experiment in which users generated both uncertain expressions as well as guesses for directory listings. Uncertain expressions reduced relative error rate by 31.8% compared to guessing. Index Terms: voice search, user uncertainty, something
    INTERSPEECH 2008, 9th Annual Conference of the International Speech Communication Association, Brisbane, Australia, September 22-26, 2008; 01/2008
  • Source
    Tim Paek, Roberto Pieraccini
    ABSTRACT: In designing a spoken dialogue system, developers need to specify the actions a system should take in response to user speech input and the state of the environment based on observed or inferred events, states, and beliefs. This is the fundamental task of dialogue management. Researchers have recently pursued methods for automating the design of spoken dialogue management using machine learning techniques such as reinforcement learning. In this paper, we discuss how dialogue management is handled in industry and critically evaluate to what extent current state-of-the-art machine learning methods can be of practical benefit to application developers who are deploying commercial production systems. In examining the strengths and weaknesses of these methods, we highlight what academic researchers need to know about commercial deployment if they are to influence the way industry designs and practices dialogue management.
    Speech Communication. 01/2008;

Publication Stats

879 Citations
5.46 Total Impact Points

Institutions

  • 2000–2013
    • Stanford University
      • Department of Psychology
      Palo Alto, California, United States
  • 2003–2011
    • Microsoft
      Redmond, Washington, United States
  • 2010
    • Indian Institute of Technology Kanpur
      Kanpur, Uttar Pradesh, India
  • 2009
    • University of New Hampshire
      • Department of Electrical Engineering
      Durham, New Hampshire, United States