Péter Pál Boda

Nokia Research Center, Palo Alto, California, United States

Publications (15) · 2.06 Total Impact

  • Jun Yang, Hong Lu, Zhigang Liu, Péter Pál Boda
    ABSTRACT: In this book chapter, we present a novel system that recognizes and records the physical activity of a person using a mobile phone. The sensor data are collected by the built-in accelerometer, which measures the motion intensity of the device. The system recognizes five everyday activities in real time: stationary, walking, running, bicycling, and in vehicle. We first introduce the sensor's data format, sensor calibration, signal projection, and feature extraction and selection methods. We then discuss and compare different choices of feature sets and classifiers in detail. The design and implementation of a prototype system is presented, along with resource and performance benchmarks on the Nokia N95 platform. Results show high recognition accuracy for distinguishing the five activities. The last part of the chapter introduces a demo application built on top of our system, a physical activity diary, as well as a selection of potential applications in the mobile wellness, mobile social sharing, and contextual user interface domains.
    09/2010: pages 185-213;
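As an illustration of the pipeline this chapter describes, here is a minimal sketch of windowed accelerometer feature extraction feeding a classifier. The sampling rate, window length, feature set, and the decision-tree classifier are assumptions for the sketch, not the chapter's actual choices, and the training data is synthetic.

```python
# Minimal sketch of accelerometer-based activity recognition.
# Window length, feature set, and classifier are illustrative
# assumptions, not the chapter's exact design.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

FS = 32          # assumed sampling rate (Hz)
WINDOW = 2 * FS  # 2-second analysis window

def features(xyz):
    """xyz: (WINDOW, 3) raw accelerometer samples."""
    mag = np.linalg.norm(xyz, axis=1)                # orientation-independent magnitude
    ac = mag - mag.mean()                            # remove gravity/DC component
    return np.array([
        mag.mean(),                                  # average intensity
        ac.std(),                                    # motion variability
        np.mean(ac ** 2),                            # signal energy
        np.mean(np.abs(np.diff(np.sign(ac))) > 0),   # zero-crossing rate
    ])

# Synthetic stand-in data: each activity gets a characteristic motion level.
rng = np.random.default_rng(0)
levels = {"stationary": 0.02, "walking": 0.3, "running": 1.0,
          "bicycling": 0.5, "in vehicle": 0.1}
X, y = [], []
for label, amp in levels.items():
    for _ in range(50):
        noise = rng.normal(0, amp, (WINDOW, 3))
        X.append(features(np.array([0.0, 0.0, 9.81]) + noise))
        y.append(label)

clf = DecisionTreeClassifier(max_depth=4).fit(X, y)
sample = features(np.array([0.0, 0.0, 9.81]) + rng.normal(0, 0.3, (WINDOW, 3)))
print(clf.predict([sample]))  # classify one new window
```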
  • ABSTRACT: PEIR, the Personal Environmental Impact Report, is a participatory sensing application that uses location data sampled from everyday mobile phones to calculate personalized estimates of environmental impact and exposure. It is an example of an important class of emerging mobile systems that combine the distributed processing capacity of the web with the personal reach of mobile technology. This paper documents and evaluates the running PEIR system, which includes mobile handset based GPS location data collection, and server-side processing stages such as HMM-based activity classification (to determine transportation mode); automatic location data segmentation into "trips"; lookup of traffic, weather, and other context data needed by the models; and environmental impact and exposure calculation using efficient implementations of established models. Additionally, we describe the user interface components of PEIR and present usage statistics from a two month snapshot of system use. The paper also outlines new algorithmic components developed based on experience with the system and undergoing testing for inclusion in PEIR, including: new map-matching and GSM-augmented activity classification techniques, and a selective hiding mechanism that generates believable proxy traces for times a user does not want their real location revealed.
    Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services (MobiSys 2009), Kraków, Poland, June 22-25, 2009; 01/2009
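PEIR's HMM-based activity classification stage assigns a transportation mode to each location sample. A toy sketch of that idea, Viterbi decoding over per-sample speeds, follows; the states, transition matrix, and per-mode speed models are invented for illustration, not PEIR's actual parameters.

```python
# Toy Viterbi decoder for transportation-mode classification from
# per-minute speeds, in the spirit of PEIR's HMM stage.
import numpy as np

STATES = ["still", "walk", "drive"]
# Sticky transitions: modes tend to persist between samples.
A = np.array([[0.90, 0.08, 0.02],
              [0.08, 0.90, 0.02],
              [0.02, 0.08, 0.90]])
MEANS = np.array([0.2, 1.4, 12.0])   # assumed per-mode speed means (m/s)
STDS  = np.array([0.3, 0.6, 6.0])

def log_emission(speed):
    # Gaussian log-likelihood of the observed speed under each mode.
    return -0.5 * ((speed - MEANS) / STDS) ** 2 - np.log(STDS)

def viterbi(speeds):
    logA = np.log(A)
    delta = np.log(np.full(3, 1 / 3)) + log_emission(speeds[0])
    back = []
    for s in speeds[1:]:
        scores = delta[:, None] + logA           # (from-state, to-state)
        back.append(scores.argmax(axis=0))       # best predecessor per state
        delta = scores.max(axis=0) + log_emission(s)
    path = [int(delta.argmax())]
    for ptr in reversed(back):                   # trace back pointers
        path.append(int(ptr[path[-1]]))
    return [STATES[i] for i in reversed(path)]

print(viterbi([0.1, 0.3, 1.2, 1.5, 1.3, 9.0, 13.0, 11.0, 0.4]))
```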
  • Alan L. Liu, Jun Yang, Péter Pál Boda
    ABSTRACT: This work presents a gesture recognition system based on continuous maximum entropy (MaxEnt) training on accelerometer data. MaxEnt models are commonly learned using generalized iterative scaling (GIS), an iterative algorithm for fitting models of this form.
    Proceedings of the 8th International Conference on Information Processing in Sensor Networks, IPSN 2009, April 13-16, 2009, San Francisco, California, USA; 01/2009
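Since the abstract names GIS, a minimal sketch of the algorithm may help: pad the features with a slack feature so per-example counts sum to a constant C, then repeatedly scale the weights by the ratio of empirical to expected feature counts. The toy features and labels below are illustrative, not the paper's data.

```python
# Minimal Generalized Iterative Scaling (GIS) for a conditional
# MaxEnt model on toy binary features.
import numpy as np

# Feature vectors f(x, y): (n_examples, n_classes, n_features).
F = np.array([[[1, 0, 1], [0, 1, 1]],
              [[1, 1, 0], [0, 0, 1]],
              [[0, 1, 1], [1, 0, 0]]], dtype=float)
labels = np.array([0, 1, 0])

# GIS requires feature sums to equal a constant C: add a slack feature.
C = F.sum(axis=2).max()
F = np.concatenate([F, (C - F.sum(axis=2))[..., None]], axis=2)

lam = np.zeros(F.shape[2])
emp = F[np.arange(len(labels)), labels].sum(axis=0)  # empirical counts

for _ in range(200):
    scores = F @ lam                                  # (n_examples, n_classes)
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)                 # class posteriors
    model = (p[..., None] * F).sum(axis=(0, 1))       # expected counts
    lam += np.log(emp / model) / C                    # GIS update

print(np.round(p, 3))  # posteriors after training
```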
  • Mobile Computing, Applications, and Services - First International ICST Conference, MobiCASE 2009, San Diego, CA, USA, October 26-29, 2009, Revised Selected Papers; 01/2009
  • ABSTRACT: Widgets are embeddable objects that provide easy and ubiquitous access to dynamic information sources, for example weather, news or TV program information. Widgets are typically rather static: they provide the information regardless of whether it is relevant to the user's current information needs. In this paper we introduce Capricorn, an intelligent interface for mobile widgets. The interface uses various adaptive web techniques to facilitate navigation; for example, we use collaborative filtering to recommend suitable widgets and we dim infrequently used widgets. The demonstration presents the Capricorn interface, focusing on its adaptive parts. The user interface is web-based and as such platform independent; however, our target environment is mobile phones, and the interface has therefore been optimized for them.
    Proceedings of the 13th international conference on Intelligent user interfaces; 01/2008
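The abstract mentions collaborative filtering for widget recommendation without specifying the variant. One common realization is item-based cosine similarity over a user-widget usage matrix, sketched below with made-up usage counts; Capricorn's actual algorithm may differ.

```python
# Item-based collaborative filtering over a user-widget usage matrix.
# The widgets and counts are placeholders for illustration.
import numpy as np

widgets = ["weather", "news", "tv", "stocks", "sports"]
# Rows: users; columns: how often each user opens each widget.
usage = np.array([[9, 4, 0, 0, 1],
                  [8, 5, 1, 0, 0],
                  [0, 1, 7, 6, 0],
                  [1, 0, 6, 8, 2],
                  [7, 3, 0, 1, 0]], dtype=float)

# Cosine similarity between widget usage columns.
norms = np.linalg.norm(usage, axis=0)
sim = (usage.T @ usage) / np.outer(norms, norms)

def recommend(user_row, k=2):
    """Score unused widgets by similarity to the user's used ones."""
    scores = sim @ user_row
    scores[user_row > 0] = -np.inf   # don't re-recommend used widgets
    return [widgets[i] for i in np.argsort(scores)[::-1][:k]]

print(recommend(usage[0]))  # suggestions for the first user
```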
  • ABSTRACT: Social interaction is an essential element of our daily lives. One could argue that such interaction is even more important while on vacation. To showcase certain technology enablers, we implement a cruise ship scenario with a few advanced applications. The cruise ship scenario serves not only as a tangible goal but also as a metaphor: these applications can be adapted easily to other environments such as office, classroom, conference, exhibition, museum, malls, etc. Cruise ships represent a unique environment for social life; passengers live in the same time and space and attend the same social activities together, with frequent physical encounters. They also produce/consume large amounts of media content and heavily interact with each other. To meet people's social networking needs we propose a new paradigm called Social Proximity Network (SPN). SPN applications are built on our connectivity and indoor positioning infrastructure, as well as on advanced device-based utilities. By relying on the sensing power of today's mobile devices and mashing up digital content with physical context, SPN services are able to provide rich and unique experiences to cruise passengers, both during and after the trip.
    Mobile Interaction with the Real World 2008, MIRW 2008, Mobile HCI Workshop, Amsterdam, The Netherlands, September 2, 2008; 01/2008
  • ABSTRACT: Widgets are embeddable objects that provide easy and ubiquitous access to dynamic information sources, e.g., weather, news or TV program information. Interactions with widgets take place through a so-called widget engine, which is a specialized client-side runtime component that also provides functionalities for managing widgets. As the number of supported widgets increases, managing widgets becomes increasingly complex. For example, finding relevant or interesting widgets becomes difficult and the user interface easily gets cluttered with irrelevant widgets. In addition, interacting with information sources can be cumbersome, especially on mobile platforms. In order to facilitate widget management and interactions, we have developed Capricorn, an intelligent user interface that integrates adaptive navigation techniques into a widget engine. This paper describes the main functionalities of Capricorn and presents the results of a usability evaluation that measured user satisfaction and compared how user satisfaction varies between desktop and mobile platforms.
    Proceedings of the 10th Conference on Human-Computer Interaction with Mobile Devices and Services, Mobile HCI 2008, Amsterdam, the Netherlands, September 2-5, 2008; 01/2008
  • ABSTRACT: Mobiscopes extend the traditional sensor network model, introducing challenges in data management and integrity, privacy, and network system design. A mobiscope is a federation of distributed mobile sensors into a taskable sensing system that achieves high-density sampling coverage over a wide area through mobility. Researchers need an architecture and general methodology for designing future mobiscopes.
IEEE Pervasive Computing 05/2007; 6(2):20-29.
  • Péter Pál Boda
    ABSTRACT: Multimodal integration addresses the problem of combining various user inputs into a single semantic representation that can be used to decide the system's next action(s). The method presented in this paper uses a statistical framework to implement the integration mechanism and includes contextual information in addition to the actual user input. The underlying assumption is that the more information sources are taken into account, the better the picture that can be drawn about the actual intention of the user in the given context of the interaction. The paper presents the latest results with a Maximum Entropy classifier, with special emphasis on the use of contextual information (the type of gesture movements and the type of objects selected). Instead of explaining the design and implementation process in detail (a longer paper to be published later will do that), only a short description is provided here of the demonstration implementation, which achieves above 91% accuracy for the 1-best result and above 96% for the accumulated five N-best results.
    Proceedings of the 8th International Conference on Multimodal Interfaces, ICMI 2006, Banff, Alberta, Canada, November 2-4, 2006; 01/2006
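A conditional MaxEnt classifier is mathematically equivalent to multinomial logistic regression, so the integration step can be sketched with an off-the-shelf learner. The intents, words, gesture types, and object types below are hypothetical placeholders, not the paper's actual feature schema.

```python
# Sketch of MaxEnt-style multimodal integration with contextual
# features, via scikit-learn's multinomial logistic regression.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Each sample joins spoken words with contextual evidence.
train = [
    ({"word:show": 1, "gesture:point": 1, "object:restaurant": 1}, "get_info"),
    ({"word:how": 1, "word:far": 1, "gesture:line": 1, "object:road": 1}, "get_distance"),
    ({"word:zoom": 1, "gesture:circle": 1, "object:map_area": 1}, "zoom"),
    ({"word:show": 1, "gesture:circle": 1, "object:map_area": 1}, "zoom"),
    ({"word:tell": 1, "gesture:point": 1, "object:museum": 1}, "get_info"),
]
vec = DictVectorizer()
X = vec.fit_transform([f for f, _ in train])
y = [label for _, label in train]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Ambiguous speech ("show ...") is disambiguated by the gesture context.
test = {"word:show": 1, "gesture:point": 1, "object:restaurant": 1}
print(clf.predict(vec.transform([test]))[0])
print(dict(zip(clf.classes_, clf.predict_proba(vec.transform([test]))[0].round(2))))
```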
  • ABSTRACT: This paper introduces a generic architecture that enables the development and execution of mobile multimodal applications, proposed within the EU IST-511607 project MobiLife. MobiLife aims at exploiting the synergetic use of multimodal user interface technology and contextual information processing, with the ultimate goal that the two together can provide a beyond-the-state-of-the-art user experience. Starting from this integrated concept, the components of the underlying architecture are described in detail, and the interfaces towards the application back-end as well as towards context-aware resources are discussed. The paper also positions the current work against existing standardisation efforts and pinpoints the technologies required to support the implementation of a device and modality function within the MobiLife architecture.
    Personal, Indoor and Mobile Radio Communications, 2005. PIMRC 2005. IEEE 16th International Symposium on; 10/2005
  • Péter Pál Boda
    ABSTRACT: Integration of various user input channels for a multimodal interface is not just an engineering problem. To fully understand users in the context of an application and the current session, solutions are sought that process information from different intentional, i.e. user-originated, as well as passively available sources in a uniform manner. As a first step towards this goal, the work demonstrated here investigates how intentional user input (e.g. speech, gesture) can be seamlessly combined to provide a single semantic interpretation of the user input. For this classical multimodal integration problem the Maximum Entropy approach is demonstrated, with 76.52% integration accuracy for the 1-best and 86.77% accuracy for the top 3-best candidates. The paper also exhibits the process that generates multimodal data for training the statistical integrator, using transcribed speech from MIT's Voyager application. The quality of the generated data is assessed by comparing it to real inputs to the multimodal version of Voyager.
    Proceedings of the 6th International Conference on Multimodal Interfaces, ICMI 2004, State College, PA, USA, October 13-15, 2004; 01/2004
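The reported figures contrast 1-best with accumulated 3-best accuracy. For readers unfamiliar with the metric, the sketch below shows how N-best accuracy is computed from ranked candidate lists; the candidate data is made up.

```python
# Computing 1-best vs. accumulated N-best accuracy from ranked
# hypothesis lists; the candidates here are placeholders.
def nbest_accuracy(results, n):
    """results: list of (ranked candidate labels, reference label)."""
    hits = sum(ref in cands[:n] for cands, ref in results)
    return hits / len(results)

results = [
    (["get_info", "zoom", "pan"], "get_info"),           # correct at rank 1
    (["zoom", "get_distance", "pan"], "get_distance"),   # correct at rank 2
    (["pan", "zoom", "get_info"], "get_info"),           # correct at rank 3
    (["zoom", "pan", "get_info"], "pan"),                # correct at rank 2
]
for n in (1, 2, 3):
    print(f"{n}-best accuracy: {nbest_accuracy(results, n):.2%}")
```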
  • Péter Pál Boda, Edward Filisko
    ABSTRACT: This paper introduces a method that generates simulated multimodal input to be used in testing multimodal system implementations, as well as to build statistically motivated multimodal integration modules. The generation of such data is inspired by the fact that true multimodal data, recorded from real usage scenarios, is difficult and costly to obtain in large amounts. On the other hand, thanks to operational speech-only dialogue system applications, a wide selection of speech/text data (in the form of transcriptions, recognizer outputs, parse results, etc.) is available. Taking the textual transcriptions and converting them into multimodal inputs in order to assist multimodal system development is the underlying idea of the paper. A conceptual framework is established which utilizes two input channels: the original speech channel and an additional channel called Virtual Modality. This additional channel provides a certain level of abstraction to represent non-speech user inputs (e.g., gestures or sketches). From the transcriptions of the speech modality, pre-defined semantic items (e.g., nominal location references) are identified, removed, and replaced with deictic references (e.g., here, there). The deleted semantic items are then placed into the Virtual Modality channel and, according to external parameters (such as a pre-defined user population with various deviations), temporal shifts relative to the instant of each corresponding deictic reference are issued. The paper explains the procedure followed to create Virtual Modality data, the details of the speech-only database, and results based on a multimodal city information and navigation application.
    01/2004;
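A minimal sketch of the generation step the abstract describes: nominal location references are removed from a transcription, replaced with a deictic word, and re-emitted on the Virtual Modality channel with a temporal shift. The location list, the choice of deictic word, and the shift distribution are assumptions for the sketch.

```python
# Sketch of Virtual Modality data generation: location references
# become deictic words plus simulated gesture events. Locations and
# the Gaussian shift parameters are illustrative assumptions.
import random

LOCATIONS = ("harvard square", "the museum of science", "central square")

def to_virtual_modality(transcript, rng):
    text = transcript.lower()
    events = []
    for loc in LOCATIONS:
        if loc in text:
            pos = text.find(loc)
            # Replace the nominal reference with a deictic one.
            text = text.replace(loc, "there", 1)
            # Gesture fires near the deictic word, with a user-dependent lag.
            shift = rng.gauss(0.0, 0.4)  # seconds relative to the word
            events.append({"value": loc, "anchor_char": pos,
                           "shift_s": round(shift, 2)})
    return text, events

rng = random.Random(1)
speech, gestures = to_virtual_modality(
    "How do I get from Harvard Square to the Museum of Science", rng)
print(speech)    # -> how do i get from there to there
print(gestures)  # simulated gesture events carrying the removed semantics
```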
  • ABSTRACT: It has been hypothesized that the Finnish language is well suited to speech-to-text conversion for the communication aids of the hearing impaired. In a related study it was shown that, depending on context, 10 to 20% of phoneme errors can be tolerated with good comprehension when reading text converted from raw phonemic recognition. Two sets of phoneme recognition experiments were carried out in this study in order to evaluate the performance of existing speech recognition systems in this application. For telephone bandwidth speech both systems showed speaker dependent error scores of about 10% or below, thus supporting the feasibility of the application. For speaker independent cases the error rate was typically more than 20%, which is too high for effortless and fluent communication. The application idea motivating this study is based on the relatively high recognition score of phonemic speech recognition and the almost one-to-one correspondence between spoken and written forms in the Finnish language. This suggests that it could be possible to construct communication aids for the hearing impaired, based on speech-to-text (STT) and text-to-speech (TTS) conversions. Three kinds of communication aids have been considered based on STT and TTS techniques. First, an automatic STT conversion for a deaf person and TTS conversion back to a normal hearing person over a telephone line could be implemented. As will be shown below, for a limited set of pre-trained speakers this is already feasible now. As another application, an automatic STT interpreter for deaf subjects could be applied for meetings attended by normal and deaf persons. The limited speaking ability of deaf subjects as well as external noise may make the application difficult. As an ultimate form of STT conversion, a portable personal device could do the speech-to-text interpretation. Presently there are communication services available in Finland based on human STT and TTS interpretation by trained persons, e.g., in the telephone network, or similar services in meetings attended by deaf persons. Such aid, based on human interpreters, is expensive and often problematic due to intimate discussions that are interpreted. The Finnish language appears to be very well suited to automatic STT conversion on the phonemic level for two reasons. First, a string of phonemes is very easily mapped to text so that only in very rare cases do problems appear. Second, it has been shown that phoneme recognition scores of up to about 95% are feasible with existing methods (9). Thus, the STT conversion could be done with a good phonemic speech recognizer, leaving the final word and content recognition to the subject reading the raw output (grapheme string) on a screen. The other conversion direction, i.e., TTS synthesis, is no technical problem; several synthesizers exist for Finnish. In another part of the study we have assessed the recognition score requirements by simulating the reading of recognized messages with phonemic errors (1). Random deletions, insertions, and replacements of phonemes, both in-class and intra-class, were generated into isolated words, isolated sentences and dialog texts. The comprehension and the reaction time of reading such erroneous messages as a function of phonemic error rate were tested with a set of subjects using a computer program. As a result we found that for isolated words comprehension is good up to about a 10% error rate, for sentences up to 10-15% errors, and for dialog sentences up to about 20%. This result sets bounds for the recognition of STT conversion in the present application domain.
    Fifth European Conference on Speech Communication and Technology, EUROSPEECH 1997, Rhodes, Greece, September 22-25, 1997; 01/1997
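The error-simulation procedure, random deletions, insertions, and replacements of phonemes at a controlled rate, can be sketched directly; the alphabet, the uniform split between error kinds, and the sample sentence below are assumptions, not the study's setup.

```python
# Inject random deletions, insertions, and substitutions into a
# phoneme/grapheme string at a target error rate.
import random

ALPHABET = "aeiouyäöklmnprstvh"  # rough stand-in phoneme inventory

def corrupt(text, error_rate, rng):
    out = []
    for ch in text:
        if ch != " " and rng.random() < error_rate:
            kind = rng.choice(["delete", "insert", "substitute"])
            if kind == "delete":
                continue                     # drop the phoneme
            if kind == "insert":
                out.append(ch)               # keep it, then add a spurious one
            out.append(rng.choice(ALPHABET)) # spurious or replacement phoneme
        else:
            out.append(ch)
    return "".join(out)

rng = random.Random(2)
sentence = "puhelin soi ja vastasin siihen heti"
for rate in (0.10, 0.20):
    print(f"{rate:.0%}: {corrupt(sentence, rate, rng)}")
```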
  • ABSTRACT: The paper introduces three multimodal context-aware mobile demonstrator applications designed and developed within the scope of the EU IST-511607 project MobiLife. The three family-oriented applications, Mobile Multimedia Infotainer, Wellness-Aware Multimodal Gaming System and FamilyMap, provide advanced multimodal user interactions supported by context-aware functionalities such as personalisation and profiling. The paper briefly explains the underlying architectural solutions and how the development work fits into the User-Centered Design process. The ultimate intention is to enhance the acceptance and usability of current mobile applications with beyond-state-of-the-art user interaction capabilities, by researching how contextual information can affect the functionality of the multimodal user interface and how to provide users with a seamless, habitable and non-intrusive user interaction experience.
  • Péter Pál Boda, Mikko Lehtokangas
    ABSTRACT: Results achieved with continuous speech recognition over a simulated GSM channel are presented. The speech material used is the speaker-independent data from the Resource Management (RM) database. Various training and test conditions are investigated: three different types of speech codecs, in combination with three error patterns simulating transmission quality degradation and one telephone handset masking pattern. The results indicate that speech codecs result in significantly poorer performance. Training with a combined set (default speech and speech via codec) did not help in terms of robust performance. Finally, a distributed scheme is proposed to overcome speech quality degradation from the recogniser's point of view.
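The distributed scheme proposed here is in the spirit of what later became known as distributed speech recognition: the handset extracts recognition features and transmits those instead of codec-compressed audio. A minimal sketch of such a client-side front-end follows; the frame size, filter count, and log filterbank features are assumptions, not the paper's specification.

```python
# Client-side front-end sketch: framing plus log mel filterbank
# energies, the kind of features a handset could send to a server
# instead of codec-degraded speech. Parameters are assumptions.
import numpy as np

FS, FRAME, HOP, NFILT = 8000, 200, 80, 23   # 25 ms frames, 10 ms hop

def mel(f):
    # Hz -> mel scale
    return 2595 * np.log10(1 + f / 700)

def filterbank(nfft=256):
    edges_mel = np.linspace(mel(0), mel(FS / 2), NFILT + 2)
    edges_hz = 700 * (10 ** (edges_mel / 2595) - 1)   # mel -> Hz
    bins = np.floor(edges_hz * nfft / FS).astype(int)
    fb = np.zeros((NFILT, nfft // 2 + 1))
    for i in range(NFILT):                  # triangular filters
        lo, c, hi = bins[i], bins[i + 1], bins[i + 2]
        if c > lo:
            fb[i, lo:c] = np.linspace(0, 1, c - lo, endpoint=False)
        if hi > c:
            fb[i, c:hi] = np.linspace(1, 0, hi - c, endpoint=False)
    return fb

def client_features(audio):
    """Features the handset would transmit instead of coded speech."""
    frames = np.lib.stride_tricks.sliding_window_view(audio, FRAME)[::HOP]
    spec = np.abs(np.fft.rfft(frames * np.hamming(FRAME), 256)) ** 2
    return np.log(spec @ filterbank().T + 1e-10)

audio = np.sin(2 * np.pi * 440 * np.arange(FS) / FS)  # 1 s test tone
print(client_features(audio).shape)  # (frames, NFILT)
```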

Publication Stats

318 Citations
2.06 Total Impact Points

Institutions

  • 2008–2010
    • Nokia Research Center
      Palo Alto, California, United States
  • 2009
    • Palo Alto Research Center
      Palo Alto, California, United States
  • 2004–2008
    • Nokia Research Center (NRC)
      Helsinki, Southern Finland Province, Finland