Conference Paper

From Artifact to Content Source: Using Multimodality in Video to Support Personalized Recomposition


Abstract

Video content is being produced in ever-increasing quantities. It is practically impossible for any user to see every piece of video that could be useful to them. We need to look at video content differently. Videos are composed of a set of features, namely the moving video track, the audio track, and other derived features such as a transcription of the spoken words. These different features have the potential to be recomposed to create new video offerings. However, a key step in achieving such recomposition is the appropriate decomposition of those features into useful assets. Video artifacts can therefore be considered a type of multimodal source which may be used to support personalized and contextually aware recomposition. This work aims to propose and validate an approach which will convert a video from a single artifact into a diverse, queryable content source.
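
To make the decomposition idea concrete, the following Python sketch (illustrative names throughout; it is not the paper's implementation) splits a video file into the assets the abstract enumerates: the audio track is extracted with a standard ffmpeg invocation, a placeholder transcription step stands in for any speech-to-text engine, and the resulting transcript can then be queried for time spans.

import subprocess
from dataclasses import dataclass, field

def transcribe(wav_path):
    # Placeholder: plug in any ASR engine that returns
    # (start_sec, end_sec, text) tuples; left empty here.
    return []

@dataclass
class VideoAssets:
    source: str
    audio_path: str = ""
    transcript: list = field(default_factory=list)  # (start, end, text)

def decompose(video_path):
    assets = VideoAssets(source=video_path)
    assets.audio_path = video_path.rsplit(".", 1)[0] + ".wav"
    # Extract the audio track as 16 kHz mono PCM, a common ASR input format.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "pcm_s16le",
         "-ar", "16000", "-ac", "1", assets.audio_path],
        check=True)
    # Derived feature: a time-aligned transcript of the spoken words.
    assets.transcript = transcribe(assets.audio_path)
    return assets

def query(assets, keyword):
    # Return the time spans whose spoken words mention the keyword.
    return [(s, e) for s, e, text in assets.transcript
            if keyword.lower() in text.lower()]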


... In our work, TED talk data and user feedback provide us with the opportunity to evaluate the presentation from a viewer's perspective, not a teacher's perspective. In previous work, TED talk data and ratings were used to detect the engagement level using low-level acoustic features and some high-level visual features (camera shot durations and angles, and the number of laughter and applause events) [9,21,22]. However, those approaches are not able to generate feedback for presenters. ...
... This is often materialized using summaries of videos, like storyboards [10, 11], layered timelines [12], and video skims [13]. Salim [14] attempts to decompose the video content and achieve a personalized recomposition. In a review of multimodal feature extraction for video summarization [15], the authors considered all three modalities (audio, visual, and linguistic) to generate summaries. ...
Conference Paper
Full-text available
Learning at the workplace is largely informal and there is a high potential to make it more effective and efficient by means of technology, especially by using the power of multimedia. The main challenge is to find relevant information segments in a vast amount of multimedia resources for a particular objective, context and user. In this paper, we aim to bridge this gap using a personalized and adaptive video consumption strategy for professional communities. Our solution highlights relevant concepts within segments of video resources by means of collaborative semantic annotations, analyzes them based on the user's learning objectives and recomposes them anew in a personalized way. As the preferred adaptation may be context dependent, the user has the opportunity to select a predefined adaptation strategy or to specify a new one easily. The approach uses a Web-based system that outputs a relevant mix of information from multiple videos, based on the user preferences and existing video annotations. The system is open source and uses an extendable approach based on micro-services. The performed evaluation investigated the usability and usefulness of the approach. It showed that the effectiveness and especially the efficiency of such informal learning can indeed be improved with adaptive video techniques applied. On the other hand, collected ideas on how to improve the usability of the system show opportunities for further improvement. These results suggest that personalization and adaptive techniques applied to video data are a good direction to proceed in facilitating informal learning in workplace environments.
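
As a rough illustration of the recomposition step described above (a sketch under assumed data structures, not the system's actual model), annotated video segments can be matched against a user's learning objectives and reordered into a personalized playlist:

from dataclasses import dataclass

@dataclass
class Segment:
    video_id: str
    start: float          # seconds
    end: float
    concepts: frozenset   # collaborative semantic annotations on this segment

def recompose(segments, objectives, min_overlap=1):
    # Order segments by how many of the user's target concepts they cover.
    scored = [(len(seg.concepts & objectives), seg) for seg in segments]
    relevant = [(n, s) for n, s in scored if n >= min_overlap]
    relevant.sort(key=lambda pair: -pair[0])
    return [seg for _, seg in relevant]

segments = [
    Segment("v1", 0, 40, frozenset({"soldering", "safety"})),
    Segment("v2", 12, 90, frozenset({"soldering", "flux"})),
    Segment("v3", 5, 30, frozenset({"inventory"})),
]
playlist = recompose(segments, objectives={"soldering", "flux"})
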
... We extracted multimodal features from TED videos and used them to identify their relationship with user engagement. We believe that this experiment will be a useful building block for our envisioned system to provide personalized and contextually aware recomposed video slices [18]. ...
Conference Paper
These days, several hours of new video content are uploaded to the internet every second. It is simply impossible for anyone to see every piece of video which could be engaging or even useful to them. It is therefore desirable to automatically identify videos that might be regarded as engaging, for a variety of applications such as recommendation and personalized video segmentation. This paper explores how multimodal characteristics of video, such as prosodic, visual and paralinguistic features, can help in assessing user engagement with videos. The approach proposed in this paper achieved good accuracy (maximum F score of 96.93%) through a novel combination of features extracted directly from video recordings, demonstrating the potential of this method in identifying engaging content.
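
A minimal sketch of this kind of engagement classification, assuming the multimodal features have already been extracted into fixed-length vectors (the classifier choice and the synthetic data below are illustrative, not the paper's setup):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))      # stand-in multimodal feature vectors
y = rng.integers(0, 2, size=200)    # 1 = engaging, 0 = not engaging

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("F score:", f1_score(y_te, clf.predict(X_te)))
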
... This feedback can be used by the speaker to improve the engagement level of a talk. The results will also feed into our envisioned system to provide personalized and contextually aware recomposed video slices [16]. ...
Article
Full-text available
Personalised web information systems have in recent years been evolving to provide richer and more tailored experiences for users than ever before. In order to provide even more interactive experiences as well as to address new opportunities, the next generation of personalised web information systems needs to be capable of dynamically personalising not just web media but web services as well. In particular, eLearning provides an example of an application domain where learning activities and personalisation are of significant importance in order to provide learners with more engaging and effective learning experiences. This paper presents a novel approach and technical framework called AMASE to support the dynamic generation and enactment of Personalised Learning Activities, which uniquely entails the personalisation of media content and the personalisation of services in a unified manner. In doing so, AMASE follows a narrative approach to personalisation that combines state of the art techniques from both adaptive web and adaptive workflow systems.
Article
Full-text available
The production of resources supporting the needs of Adaptive Hypermedia Systems (AHSs) is labor-intensive. As a result, content production is focused upon meeting the needs of resources with higher demand, which limits the extent to which the numerous and diverse content requirements of AHSs can be met. Open Corpus Slicing attempts to convert the wealth of information available on the web into customisable information objects. This approach could provide the basis of an open corpus supply service meeting more diverse and unpredictable content requirements of AHSs. This paper takes a case study approach, focusing on an educational sector of adaptive hypermedia, to test the effect of using Slicepedia, a service which enables the reuse and customisation of open corpus resources. An architecture and implementation of the system is presented along with two user-trial evaluations, involving 91 participants, which suggest that slicing techniques represent a valid content production supply for AHSs.
Article
Full-text available
Digital humanities initiatives play an important role in making cultural heritage collections accessible to the global community of researchers and general public for the first time. Further work is needed to provide useful and usable tools to support users in working with those digital contents in virtual environments. The CULTURA project has developed a corpus agnostic research environment integrating innovative services that guide, assist and empower a broad spectrum of users in their interaction with cultural artefacts. This article presents (1) the CULTURA system and services and the two collections that have been used for testing and deploying the digital humanities research environment, and (2) an evaluation methodology and formative evaluation study with apprentice researchers. An evaluation model was developed which has served as a common ground for systematic evaluations of the CULTURA environment with user communities around the two test bed collections. The evaluation method has proven to be suitable for accommodating different evaluation strategies and allows meaningful consolidation of evaluation results. The evaluation outcomes indicate a positive perception of CULTURA. A range of useful suggestions for future improvement has been collected and fed back into the development of the next release of the research environment.
Article
Full-text available
An approach to identifying a viewer's video preferences uses hidden Markov models by combining visual features and closed captions.
Article
Full-text available
Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated in a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
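
The fusion step lends itself to a compact sketch: normalize each modality's saliency stream to a common scale, combine them linearly, and keep the most salient windows as the summary. The weights and window length below are illustrative; the paper itself compares several fusion schemes.

import numpy as np

def fuse(aural, visual, textual, weights=(1/3, 1/3, 1/3)):
    streams = [np.asarray(s, dtype=float) for s in (aural, visual, textual)]
    # Min-max normalize each stream so no modality dominates by scale alone.
    streams = [(s - s.min()) / (np.ptp(s) or 1.0) for s in streams]
    return sum(w * s for w, s in zip(weights, streams))

def summarize(saliency, win=30, keep=5):
    # Return start indices of the `keep` most salient non-overlapping windows.
    scores = np.convolve(saliency, np.ones(win), mode="valid")
    picked = []
    while len(picked) < keep and scores.max() > -np.inf:
        i = int(scores.argmax())
        picked.append(i)
        scores[max(0, i - win):i + win] = -np.inf  # enforce non-overlap
    return sorted(picked)
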
Article
Full-text available
The tremendous growth in video data calls for efficient and flexible access mechanisms. In this paper, we propose an ontology-driven framework for presentation video annotation and access. The goal is to integrate ontology into video systems to improve users' video access experience. To realize ontology-driven video annotation, the first and foremost step is video segmentation. Current research in video segmentation has mainly focused on the visual and/or auditory modalities. In this paper, we investigate how to combine visual and textual information in the hierarchical segmentation of presentation video data. With average F-scores over 0.92, our experiments show that the proposed segmentation procedure is effective. After a video is segmented, video annotation data can be extracted. To extract annotation data from a video and its segments, and to organize them in a way that facilitates video access, we propose a multi-ontology-based multimedia annotation model. In this model, a domain-independent multimedia ontology is integrated with multiple domain ontologies. The goal is to provide multiple, domain-specific views of the same multimedia content and, thus, better address different users' information needs. With extracted annotation data, ontology-driven video access explores domain knowledge embedded in domain ontologies and tailors the video access to the specific needs of individual users from different domains. Our experience suggests that ontology-driven video access can improve video retrieval relevancy and, thus, enhance users' video access experience.
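
The combination of visual and textual evidence for segmentation can be sketched as a toy version (illustrative cues and thresholds, not the paper's hierarchical procedure): declare a boundary where the visual change between sampled frames is high and the lexical overlap between adjacent transcript windows is low.

import numpy as np

def boundaries(frame_hists, window_texts, v_thresh=0.4, t_thresh=0.2):
    # frame_hists: L1-normalized colour histograms sampled at fixed intervals;
    # window_texts: the words spoken around each sample point.
    cuts = []
    for i in range(1, len(frame_hists)):
        # Visual cue: L1 distance between adjacent histograms, scaled to [0, 1].
        v_change = 0.5 * np.abs(frame_hists[i] - frame_hists[i - 1]).sum()
        # Textual cue: Jaccard overlap of adjacent transcript windows.
        a, b = set(window_texts[i - 1].split()), set(window_texts[i].split())
        t_overlap = len(a & b) / (len(a | b) or 1)
        if v_change > v_thresh and t_overlap < t_thresh:
            cuts.append(i)
    return cuts
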
Article
Full-text available
User engagement is a key concept in designing user-centred web applications. It refers to the quality of the user experience that emphasises the positive aspects of the interaction, and in particular the phenomena associated with being captivated by technology. This definition is motivated by the observation that successful technologies are not just used, but they are engaged with. Numerous methods have been proposed in the literature to measure engagement, however, little has been done to validate and relate these measures and so provide a firm basis for assessing the quality of the user experience. Engagement is heavily influenced, for example, by the user interface and its associated process flow, the user's context, value system and incentives. In this paper we propose an approach to relating and developing unified measures of user engagement. Our ultimate aim is to define a framework in which user engagement can be studied, measured, and explained, leading to recommendations and guidelines for user interface and interaction design for front-end web technology. Towards this aim, in this paper, we consider how existing user engagement metrics, web analytics, information retrieval metrics, and measures from immersion in gaming can bring new perspective to defining, measuring and explaining user engagement.
Article
Full-text available
Professional video searchers typically have to search for particular video fragments in a vast video archive that contains many hours of video data. Without having the right video archive exploration tools, this is a difficult and time consuming task that induces hours of video skimming. We propose the video archive explorer, a video exploration tool that provides visual representations of automatically detected concepts to facilitate individual and collaborative video search tasks. This video archive explorer is developed by employing a user-centred methodology, which ensures that the tool is more likely to fit to the end user needs. A qualitative evaluation with professional video searchers shows that the combination of automatic video indexing, interactive visualisations and user-centred design can result in an increased usability, user satisfaction and productivity. Keywords: Searching and browsing video archives; Information filtering; User-centred software engineering; Interactive visualisations
Conference Paper
Full-text available
This paper presents GALE, the GRAPPLE Adaptive Learning Environment, which (contrary to what the word suggests) is a truly generic and general purpose adaptive hypermedia engine. Five years have passed since "The Design of AHA!" was published at ACM Hypertext (2006). GALE takes the notion of general-purpose a whole lot further. We solve shortcomings of existing adaptive systems in terms of genericity, extensibility and usability and show how GALE improves on the state of the art in all these aspects. We illustrate different authoring styles for GALE, including the use of template pages, and show how adaptation can be defined in a completely decentralized way by using the open corpus adaptation facility of GALE. GALE has been used in a number of adaptive hypermedia workshops and assignments to test whether authors can actually make use of the extensive functionality that GALE offers. Adaptation has been added to wiki sites, existing material e.g. from w3schools, and of course also to locally authored hypertext. Soon GALE will be used in cross-course adaptation at the TU/e in a pilot project to improve the success rate of university students.
Article
In recent years, with the rapid development of camera technology and portable devices, we have witnessed a flourish of user generated videos, which are gradually reshaping the traditional professional video oriented media market. The volume of user generated videos in repositories is increasing at a rapid rate. In today's video retrieval systems, a simple query will return many videos, which seriously increases the viewing burden. To manage these video retrievals and provide viewers with an efficient way to browse, we introduce a system to automatically generate a summarization from multiple user generated videos and present their salience to viewers in an enjoyable manner. Among multiple consumer videos, we find their qualities to be highly diverse due to various factors such as a photographer's experience or environmental conditions at the time of capture. This diversity inspires us to include a video quality evaluation component in the video summarization, since videos of poor quality can seriously degrade the viewing experience. We first propose a probabilistic model to evaluate the aesthetic quality of each user generated video. This model compares the rich aesthetics information from several well-known photo databases with generic unlabeled consumer videos, under a human perception component indicating the correlation between a video and its constituting frames. Subjective studies were carried out, with the results indicating that our method is reliable. Then a novel graph-based formulation is proposed for the multi-video summarization task. Desirable summarization criteria are incorporated as the graph attributes and the problem is solved through a dynamic programming framework. Comparisons with several state-of-the-art methods demonstrate that our algorithm performs better than other methods at generating a skimming video that preserves the essential scenes from the original multiple input videos, with smooth transitions among consecutive segments and appealing aesthetics overall.
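
The selection stage of such a summarizer can be sketched as a recursive search with the optimal-substructure shape that the abstract's dynamic-programming framework exploits: choose segments that maximize per-segment quality plus transition smoothness under a duration budget. All scoring here is illustrative (cosine similarity of hypothetical feature vectors), not the paper's graph attributes; a real implementation would memoize over a discretized budget.

import math

def best_skim(segments, budget):
    # segments: list of (duration, quality, feature_vec); returns chosen indices.
    def smooth(a, b):  # higher when consecutive segments look alike
        if a is None:
            return 0.0
        va, vb = segments[a][2], segments[b][2]
        num = sum(x * y for x, y in zip(va, vb))
        den = math.sqrt(sum(x * x for x in va) * sum(x * x for x in vb)) or 1.0
        return num / den

    def solve(i, remaining, prev):
        if i == len(segments):
            return 0.0, []
        best = solve(i + 1, remaining, prev)          # option 1: skip segment i
        dur, qual, _ = segments[i]
        if dur <= remaining:                          # option 2: take segment i
            score, picked = solve(i + 1, remaining - dur, i)
            score += qual + smooth(prev, i)
            if score > best[0]:
                best = (score, [i] + picked)
        return best

    return solve(0, budget, None)[1]

segments = [(10, 0.9, [1.0, 0.0]), (8, 0.7, [0.9, 0.1]), (12, 0.8, [0.0, 1.0])]
print(best_skim(segments, budget=20))   # -> [0, 1]
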
Article
The Resource Description Framework (RDF) is a Semantic Web standard that provides a data language, simply called RDF, as well as a lightweight ontology language, called RDF Schema. We investigate embeddings of RDF in logic and show how standard logic programming and description logic technology can be used for reasoning with RDF. We subsequently consider extensions of RDF with datatype support, considering D entailment, defined in the RDF semantics specification, and D* entailment, a semantic weakening of D entailment, introduced by ter Horst. We use the embeddings and properties of the logics to establish novel upper bounds for the complexity of deciding entailment. We subsequently establish two novel lower bounds, establishing that RDFS entailment is PTime-complete and that simple-D entailment is coNP-hard, when considering arbitrary datatypes, both in the size of the entailing graph. The results indicate that RDFS may not be as lightweight as one may expect.
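
Two of the RDFS entailment rules at issue can be illustrated with naive forward chaining to a fixpoint: a toy Python rendering of rules rdfs9 (type propagation along rdfs:subClassOf) and rdfs11 (transitivity of rdfs:subClassOf) from the RDF Semantics specification. Real reasoners are considerably more sophisticated.

SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

def rdfs_closure(triples):
    graph = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in graph:
            if p == SUBCLASS:
                # rdfs11: subClassOf is transitive.
                new |= {(s, SUBCLASS, o2) for s2, p2, o2 in graph
                        if p2 == SUBCLASS and s2 == o}
                # rdfs9: instances of a subclass are instances of the class.
                new |= {(s2, TYPE, o) for s2, p2, o2 in graph
                        if p2 == TYPE and o2 == s}
        if not new <= graph:
            graph |= new
            changed = True
    return graph

g = rdfs_closure({(":Cat", SUBCLASS, ":Mammal"),
                  (":Mammal", SUBCLASS, ":Animal"),
                  (":tom", TYPE, ":Cat")})
assert (":tom", TYPE, ":Animal") in g   # entailed, though never stated
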
Article
Due to the scarcity of user interest information in the target domain, recommender systems generally suffer from the sparsity problem. To alleviate this limitation, one natural way is to transfer user interests in other domains to the target domain. However, objects in different domains may be in different media types, which makes it very difficult to find the correlations between them. In this paper, we propose a Bayesian hierarchical approach based on Latent Dirichlet Allocation (LDA) to transfer user interests across domains or media. We model documents (corresponding to media objects) from different domains and user interests in a common topic space, and learn topic distributions for documents and user interests together. Specifically, to learn the model, we combine multi-type media information: media descriptions, user-generated text data and ratings. With this model, recommendation can be done in multiple ways, such as by predicting ratings or by comparing topic distributions of documents and user interests directly. Experiments on two real world datasets demonstrate that our proposed method is effective in addressing the sparsity problem by transferring user interests across domains.
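
The common-topic-space idea can be approximated with off-the-shelf tools. The sketch below uses plain LDA from scikit-learn rather than the paper's Bayesian hierarchical model, with toy two-document corpora: it ranks target-domain items by topic-mixture similarity to a profile built from the items the user liked in the source domain.

import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

source_docs = ["space opera epic battles", "romantic comedy in paris"]   # e.g. movies
target_docs = ["novel about epic space battles", "cookbook of french dishes"]  # e.g. books

vec = CountVectorizer()
X = vec.fit_transform(source_docs + target_docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(X)                 # per-document topic mixtures

user_profile = theta[0]                      # user liked source document 0
targets = theta[len(source_docs):]
sims = targets @ user_profile / (
    np.linalg.norm(targets, axis=1) * np.linalg.norm(user_profile))
print("recommended:", target_docs[int(sims.argmax())])
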
Article
This paper presents an approach for event detection and annotation of broadcast soccer video. It benefits from the fact that the occurrence of some audiovisual features demonstrates remarkable patterns for the detection of semantic events. However, the goal of this paper is to propose a flexible system that can be used with minimal reliance on predefined sequences of features and structures derived from domain knowledge. To achieve this goal, we design a fuzzy rule-based reasoning system as a classifier which adopts statistical information from a set of audiovisual features as its crisp input values and produces semantic concepts corresponding to the occurred events. A set of tuples is created by discretization and fuzzification of continuous feature vectors derived from the training data. We extract the hidden knowledge among the tuples and the correlation between the features and related events by constructing a decision tree (DT). A set of fuzzy rules is generated by traversing each path from the root toward the leaf nodes of the constructed DT. These rules are inserted in the fuzzy rule base of the designed fuzzy system and employed by the fuzzy inference engine to perform the decision-making process and predict the events occurring in the input video. Experimental results conducted on a large set of broadcast soccer videos demonstrate the effectiveness of the proposed approach.
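
The inference machinery described here can be sketched compactly: triangular membership functions fuzzify crisp feature statistics, and a couple of hand-written rules (standing in for the rules the paper mines from a decision tree) are combined with min/max inference. The features, rules, and event labels below are illustrative.

def tri(x, a, b, c):
    # Triangular membership function peaking at b, zero outside (a, c).
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def detect(crowd_noise, motion):
    # Fuzzify the crisp inputs.
    noise_high = tri(crowd_noise, 0.4, 1.0, 1.6)
    motion_high = tri(motion, 0.3, 1.0, 1.7)
    motion_low = tri(motion, -0.7, 0.0, 0.7)
    # Rules (min = AND); aggregate per event with max (OR).
    goal = min(noise_high, motion_high)
    foul = min(noise_high, motion_low)
    scores = {"goal": goal, "foul": foul, "none": 1 - max(goal, foul)}
    return max(scores, key=scores.get)

print(detect(crowd_noise=0.9, motion=0.95))   # -> "goal"
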
Article
The user experience is an integral component of interactive information retrieval (IIR). However, there is a twofold problem in its measurement. Firstly, while many IIR studies have relied on a single dimension of user feedback, that of satisfaction, experience is a much more complex concept. IIR in general, and exploratory search more specifically, are dynamic, multifaceted experiences that evoke pragmatic and hedonic needs, expectations, and outcomes that are not adequately captured by user satisfaction. Secondly, questionnaires, which are typically the means by which users' attitudes and perceptions are measured, are not typically subjected to rigorous reliability and validity testing. To address these issues, we administered the multidimensional User Engagement Scale (UES) in an exploratory search environment to assess users' perceptions of the Perceived Usability (PUs), Aesthetics (AE), Novelty (NO), Felt Involvement (FI), Focused Attention (FA), and Endurability (EN) aspects of the experience. In a typical laboratory-style study, 381 participants performed three relatively complex search tasks using a novel search interface, and responded to the UES immediately upon completion. We used Principal Axis Factor Analysis and Multiple Regression to examine the factor structure of UES items and the relationships amongst factors. Results showed that three of the six sub-scales (PUs, AE, FA) were stable, while NO, FI and EN merged to form a single factor. We discuss recommendations for revising and validating the UES in light of these findings.
Conference Paper
Due to the advances in display technologies and the commercial success of 3D motion pictures in recent years, there is renewed interest in enabling consumers to create 3D content. While new 3D content can be created using more advanced capture devices (i.e., stereo cameras), most people still use 2D capture devices. Furthermore, enormously large collections of captured media exist only in 2D. We present a system for producing stereo images from captured 2D videos. Our system detects "good" stereo frames from a 2D video, which was captured a priori without any constraints on camera motion or content. We use a trained classifier to detect pairs of video frames that are suitable for constructing stereo images. In particular, for a given frame I_t at time t, we determine whether an offset t̂ exists such that I_(t+t̂) and I_t can form an acceptable stereo image. We verify the performance of our method for producing stereo media from captured 2D videos in a psychovisual evaluation using both professional movie clips and amateur home videos. To the best of our knowledge, detecting good stereo pairs from a captured 2D video has not been adequately addressed in the literature.
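
The frame-pair idea reduces to: for a frame at time t, scan nearby frames and ask a classifier whether the pair would make an acceptable stereo image. The sketch below substitutes a simple heuristic for the trained classifier, using global translation estimated by FFT phase correlation; the real feature set and thresholds would differ.

import numpy as np

def global_shift(f0, f1):
    # Estimate translation between two grayscale frames by phase correlation.
    F0, F1 = np.fft.fft2(f0), np.fft.fft2(f1)
    cross = F0 * np.conj(F1)
    cross /= np.abs(cross) + 1e-9
    corr = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(int(corr.argmax()), corr.shape)
    h, w = f0.shape
    dx = dx - w if dx > w // 2 else dx   # wrap to signed shifts
    dy = dy - h if dy > h // 2 else dy
    return dx, dy

def good_stereo_pair(f0, f1, min_dx=3, max_dx=30, max_dy=1):
    # Heuristic stand-in for the trained classifier: a mostly horizontal,
    # moderate-magnitude camera translation suggests a usable stereo pair.
    dx, dy = global_shift(f0, f1)
    return min_dx <= abs(dx) <= max_dx and abs(dy) <= max_dy
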
Article
Content aware video manipulation (CAVM) is a method for the analysis and recomposition of video footage, by means of content analysis and adaptive video warping. One main motivation of CAVM is "video retargeting", a process that visually alters an existing video while considering the relative importance of its various regions. CAVM video retargeting aims at preserving the viewers' experience by maintaining the information content of important regions in the frame, while altering the video dimensions. Other applications include commercial real-estate allocations, time and space content summary, and content deletion (in both time and spatial domain). In this paper we introduce an efficient algorithm for the implementation of CAVM. It consists of two stages. First, the video is analyzed to detect the importance of each pixel in the frame, based on local saliency, motion detection and object detectors. Then, a transformation manipulates the video content according to the aforementioned analysis and application dependent constraints. The visual performance of the proposed algorithm is demonstrated on a variety of video sequences, and compared to the state-of-the-art in image retargeting.
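
The analysis stage maps naturally onto a per-pixel importance map. The sketch below combines a precomputed saliency map, frame-difference motion, and detector bounding boxes with illustrative weights; the adaptive warping stage that consumes this map is beyond a short sketch.

import numpy as np

def importance(saliency, prev_frame, frame, boxes, w=(0.5, 0.3, 0.2)):
    # saliency and frames: float arrays in [0, 1]; boxes: (y0, y1, x0, x1).
    motion = np.abs(frame - prev_frame)
    motion /= motion.max() or 1.0        # normalize frame-difference motion
    objects = np.zeros_like(frame)
    for y0, y1, x0, x1 in boxes:         # mark detected objects as important
        objects[y0:y1, x0:x1] = 1.0
    return w[0] * saliency + w[1] * motion + w[2] * objects
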
Video classification and retrieval using Arabic closed caption
Anwar, A., Salama, G.I., Abdelhalim, M.B.: Video classification and retrieval using Arabic closed caption. In: ICIT 2013: The 6th International Conference on Information Technology (2013).
Video adaptation for the creation of advanced intelligent content for conferences
Nautiyal, A., Kenny, E., Dawson-Howe, K.: Video adaptation for the creation of advanced intelligent content for conferences. In: Irish Machine Vision and Image Processing Conference (2014).
Examining the generalizability of the User Engagement Scale (UES) in exploratory search
O'Brien, H.L., Toms, E.G.: Examining the generalizability of the User Engagement Scale (UES) in exploratory search. Inf. Process. Manag. 49, 1092–1107 (2013).
Towards a science of user engagement
Attfield, S., Piwowarski, B., Kazai, G.: Towards a science of user engagement. In: WSDM Workshop on User Modelling for Web Applications, Hong Kong (2011).
AMASE: A framework for supporting personalised activity-based learning on the web
Staikopoulos, A., O'Keeffe, I., Rafter, R., Walsh, E., Yousuf, B., Conlan, O., Wade, V.: AMASE: A framework for supporting personalised activity-based learning on the web. Comput. Sci. Inf. Syst. 11, 343–367 (2014).