DocMIR: An automatic document-based indexing system for meeting retrieval

University of Leeds, School of Computing, Leeds LS2 9JT, United Kingdom
Multimedia Tools and Applications (Impact Factor: 1.06). 04/2008; 37(2):135-167. DOI: 10.1007/s11042-007-0137-4
Source: DBLP

ABSTRACT This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures, etc. by taking advantage of the documents (e.g. slideshows, budget tables, figures) projected during the events. For instance, the system can automatically apply these procedures to a lecture and index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention by the speaker throughout the presentation. The only material required by the system is the speaker's electronic presentation file; even if it is not provided, the system temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is performed automatically by analyzing the content of the captured video of the projected documents: the system detects scene changes, identifies the documents, computes their durations and extracts their textual content. Each captured image is identified against a repository containing all the original electronic documents, the captured audio–visual data and the metadata created during post-production. The identification is based on document signatures, which hierarchically structure features from both the layout structure and the color distribution of the document images. Video segments are finally enriched with the textual content of the identified original documents, which further facilitates query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images and can be applied to several other applications, including real-time document recognition, multimedia IR and augmented reality systems.
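The signature-based identification described above can be illustrated with a small sketch. The actual DocMIR signature hierarchically combines layout-structure and color-distribution features; the code below is a deliberately simplified, hypothetical stand-in that reduces each greyscale document image to a coarse grid of mean intensities and matches a captured frame to the nearest original by L1 distance. All function names are illustrative, not part of the system.

```python
# Hypothetical sketch of signature-based document identification.
# The real DocMIR signatures are hierarchical and combine layout and
# colour features; this toy version uses only a coarse intensity grid.

def signature(image, grid=4):
    """Coarse signature: mean intensity of each cell in a grid x grid
    partition of a 2-D greyscale image (list of equal-length rows, 0-255)."""
    h, w = len(image), len(image[0])
    sig = []
    for gy in range(grid):
        for gx in range(grid):
            cells = [image[y][x]
                     for y in range(gy * h // grid, (gy + 1) * h // grid)
                     for x in range(gx * w // grid, (gx + 1) * w // grid)]
            sig.append(sum(cells) / len(cells))
    return sig

def distance(a, b):
    """L1 distance between two signatures."""
    return sum(abs(x - y) for x, y in zip(a, b))

def identify(frame, repository):
    """Return the key of the repository document whose precomputed
    signature is closest to the captured frame's signature."""
    fsig = signature(frame)
    return min(repository, key=lambda k: distance(fsig, repository[k]))
```

In use, the signatures of all original slides would be precomputed once into the repository, and each low-resolution captured frame would then be identified with a single nearest-signature lookup; because the signature averages over cells, it tolerates the blur and noise of camera-captured projections.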

  • ABSTRACT: When capturing multimedia records of collaborative activities (e.g. lectures, meetings), later access to the captured activity is usually provided as a linear video comprising the contents of the exchanged media. Such an alternative impairs review of the activity, since the user can rely only on traditional timeline-based video controls. In scenarios in which automated tools generate interactive multimedia documents as a result of capturing an activity, the literature reports the use of ink-based and audio-based operators that allow the identification of points of interaction in the resulting document. In previous work, we defined a taxonomy of media-based operators that takes user interactions with boards and videos into account, and extended the audio-based and ink-based operators with action-based alternatives. In this paper, the applicability of the operators is demonstrated in the context of the automatic generation of multimedia documents for interactive TV. Results from the user evaluation suggest that the operators enable faster review of a session than a linear video.
    ACM SIGAPP Applied Computing Review 01/2011;
  • ABSTRACT: When generating records from captured meetings, such as video lectures or distance-education activities supported by synchronous communication tools, the alternative usually adopted is to generate a linear video with the contents of the exchanged media. Such an approach limits review of the meeting, reducing it to watching a video with the traditional time-based video controls. In other scenarios, the literature reports the use of media-based operators, such as ink-based and audio-based operators, that allow the indexing of points of interaction in the resulting document. In this paper we tackle the issue of automatically generating document-based browsers by means of several types of indexes captured during the recording and post-production phases of the multimedia production process. These indexes are used to provide an interface focused on menu navigation that creates compositions of logical operators in order to improve access to points of interest by generating interactive timelines. Our document-centric approach tackles challenges for meeting browsers: in particular, it enables the efficient review of meeting recordings on a constrained device such as a TV set-top box with a remote control. In terms of evaluation, we conducted two user studies to verify our model. Overall, the evaluation results suggest that the approach provides a satisfactory level of usability and that users understood the proposed menu-navigation approach to reviewing the recorded sessions.
  • ABSTRACT: The recording of multimedia sessions of small-group activities, such as meetings, lectures and web conferences, is becoming increasingly popular, following improvements in recording technologies, the popularity of web-based media repositories, and the opportunity to review or share the activity at a later moment. When reviewing a (potentially long) multimedia session, a user may not be interested in watching it linearly as a whole, but only in browsing or skimming specific fragments of interest. To tackle this requirement, the literature reports the opportunity of indexing interaction events that typically occur in these activities, such as slide transitions, speech turns and gestures. In this paper we investigate the issue of combining different types of interaction events as an aid to navigating multimedia sessions. First, we define temporal interval-based composition operators so that the semantics of groups of annotations can be orchestrated in queries. Second, we demonstrate the operators in a web-based multimedia player that enables users to browse a multimedia session with the aid of visualizations and filters over interaction-focused annotations. To experiment with the model, we report a case study with multimedia information recorded by a capture environment in use.
    Proceedings of the 18th Brazilian symposium on Multimedia and the web; 10/2012
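The temporal interval-based composition operators mentioned in the abstract above can be sketched in miniature. The cited paper's actual operator set and semantics are not specified here; the names (`during`, `overlaps`, `query`) and the annotation representation below are hypothetical illustrations of the general idea of filtering interaction annotations (slide transitions, speech turns, gestures) by interval relations.

```python
# Hypothetical sketch of interval-based composition over interaction
# annotations; operator names and semantics are illustrative only.
from typing import NamedTuple

class Interval(NamedTuple):
    start: float  # seconds into the recording
    end: float

def during(a: Interval, b: Interval) -> bool:
    """True if interval a lies entirely within interval b."""
    return b.start <= a.start and a.end <= b.end

def overlaps(a: Interval, b: Interval) -> bool:
    """True if intervals a and b share any time span."""
    return a.start < b.end and b.start < a.end

def query(annotations, predicate):
    """Filter (label, interval) annotations by a predicate on intervals."""
    return [(label, iv) for label, iv in annotations if predicate(iv)]
```

For example, a reviewer could ask for all speech turns that overlap a given slide's on-screen interval, then jump the player directly to those points instead of scrubbing the timeline linearly.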
