Conference Paper · PDF Available

The CULTURA Evaluation Model: An Approach Responding to the Evaluation Needs of an Innovative Research Environment


Abstract and Figures

This paper presents the evaluation approach taken for an innovative research environment for digital cultural heritage collections in the CULTURA project. The integration of novel information retrieval services to support exploration and (re)search of digital artefacts in this research environment, as well as the intended corpus agnosticism and diversity of target users, posed additional challenges to evaluation. Starting from a methodology for evaluating digital libraries, an evaluation model was established that captures the qualities specific to the objectives of the CULTURA environment and that builds a common ground for empirical evaluations. A case study illustrates how the model was translated into a concrete evaluation procedure. The obtained outcomes indicate a positive user perception of the CULTURA environment and provide valuable information for further development.
... For the purpose of RAGE the CULTURA evaluation model (Steiner et al., 2013, 2014) is adopted and further elaborated to form the conceptual basis for the Ecosystem evaluation. The evaluation questions presented in Table 2 reflect the topics of this existing evaluation model and can be mapped to the interaction axes (see Figure 4 for an illustration). ...
... The evaluation questions presented in Table 2 reflect the topics of this existing evaluation model and can be mapped to the interaction axes (see Figure 4 for an illustration). Usability (E1) and user acceptance (E2) are located on the axis between system and user (in line with Steiner et al., 2013, 2014). Resource quality (E3) matches the concept of usefulness from the original Triptych model (Tsakonas et al., 2004) on the content-user axis, thus referring to the usefulness of the resources/content for users. ...
Article
Full-text available
This document presents the RAGE evaluation methodology. It provides the framework and accompanying guidelines for the evaluation and validation of the quality and effectiveness of the project outputs. Formative and summative evaluations of the different RAGE technologies and their underlying methodologies – the assets, the Ecosystem, and the applied games – will be carried out on the basis of this common framework. Evaluation in RAGE addresses applied gaming in a holistic manner, taking into account the interests of different stakeholder groups. This means targeting the effectiveness of the developed applied games, as well as evaluating the process of developing them and the benefit that the RAGE technologies afford this process. The evaluation framework therefore constitutes a multi-perspective and holistic approach to systematic and comprehensive evaluation of applied games technology. Based on the specification of the evaluation objectives and questions for each strand of evaluation work and a review of existing methods, the evaluation approaches for RAGE are elaborated. This includes the identification of the planned types of evaluation studies and the description of their proposed research designs and envisaged instruments. An evaluation model integrating the different evaluation tasks and their perspectives on evaluation is presented. The RAGE evaluation framework forms the basis for the analysis of the educational and cost effectiveness of the RAGE technologies, and for ensuring that they reflect the needs of their stakeholders. The assets will be evaluated in respect of their software quality and the added value they bring to the game development process. In addition, the relevance and benefit that assets provide through their pedagogical functionality shall be evaluated from an educational viewpoint. The Ecosystem will be assessed in terms of users' reactions to the system, the quality of resources provided, the capability of attracting relevant stakeholders, user contributions, and the added value for users. The evaluation of the effectiveness of the games applying the assets and developed for the application scenarios will cover the analysis of the games' effect on end-users' perceptions, learning, skills acquisition and knowledge transfer. Unobtrusive data tracking and sensoring through RAGE assets will complement more traditional instruments (such as questionnaires), thus providing a wide spectrum of evaluation methods capable of obtaining in-depth insights into players' experiences. Furthermore, explorative and contextual cost-benefit analyses for game development and educational application of applied games will be undertaken for each of the six use cases. This document also provides evaluation guidelines, which serve as a manual for implementing the evaluation approaches in the empirical work with users. In particular, these guidelines focus on relevant aspects of research data management to be taken into account when conducting evaluation studies. General procedures to be followed, as well as tools and tips, are described on how to implement requirements in terms of ethics, privacy, and open access in collecting, managing and sharing data in evaluation studies. The RAGE evaluation framework and guidelines are the foundation for planning and organising scientifically sound, iterative, and mixed-method evaluations of the project outcomes and thus serve as the common reference point for all evaluation studies carried out in the project.
The evaluation approaches outlined in this document will be translated into concrete evaluation procedures and materials for the individual studies in the two main evaluation phases. The methods and outcomes of these formative and summative evaluation studies will be reported in detail in the RAGE evaluation reports (D8.3 in M33 and D8.4 in M47). The evaluation framework – in addition to its use in RAGE evaluation – aims at contributing to evaluation methodologies in the context of applied and serious games on a wider scope, and may be (re-)used in and adapted for other applied gaming projects. In this way, the evaluation approaches, outcomes and experiences of RAGE may serve as best practice for future applied game evaluations.
... A similar approach has been applied to the digital library system CULTURA, which serves as an adaptive information system for historians. Relevant qualities have been defined and evaluated individually, including usability, user acceptance, adaptation quality, visualization quality, and content usefulness (Steiner et al., 2013). Though applied in digital libraries, these aspects are also relevant in adaptive learning systems. ...
Conference Paper
Full-text available
This paper presents a service that supports the evaluation of adaptive learning systems. This evaluation service has been developed for and tested with an adaptive digital library system created in the CULTURA project. Based on these experiences, an approach is outlined for how it can be used in a similar way to evaluate the features and aspects of adaptive learning systems. Following the layered evaluation approach, qualities are defined that are evaluated individually. The evaluation service supports the whole evaluation process, which includes modelling the qualities to be evaluated, data collection, and automatic reports based on data analysis. Multi-modal data collection facilitates continuous and non-continuous, as well as invasive and non-invasive evaluation.
Article
Full-text available
Digital humanities initiatives play an important role in making cultural heritage collections accessible to the global community of researchers and the general public for the first time. Further work is needed to provide useful and usable tools to support users in working with those digital contents in virtual environments. The CULTURA project has developed a corpus-agnostic research environment integrating innovative services that guide, assist and empower a broad spectrum of users in their interaction with cultural artefacts. This article presents (1) the CULTURA system and services and the two collections that have been used for testing and deploying the digital humanities research environment, and (2) an evaluation methodology and formative evaluation study with apprentice researchers. An evaluation model was developed which has served as a common ground for systematic evaluations of the CULTURA environment with user communities around the two test bed collections. The evaluation method has proven to be suitable for accommodating different evaluation strategies and allows meaningful consolidation of evaluation results. The evaluation outcomes indicate a positive perception of CULTURA. A range of useful suggestions for future improvement has been collected and fed back into the development of the next release of the research environment.
Conference Paper
Full-text available
In recent years there has been a marked uptake in the digitisation of cultural heritage collections. Though this has enabled more sources to be made available to experts and the wider public, curators still struggle to instigate and enhance engagement with cultural archives. This is largely due to the monolithic nature of many digital archives, the challenge of understanding large collections (especially if the language is inconsistent), and the fact that users vary in expertise and have different tasks and goals that they are trying to accomplish. This paper describes CULTURA, an FP7-funded project that is addressing these specific issues. The various technologies and approaches being used by CULTURA are discussed, along with the lessons learnt thus far.
Article
Full-text available
In this paper we describe an entity oriented search and exploration system that we are developing for the EU Cultura project.
Article
Full-text available
This paper suggests an alternative to the traditional 'as a whole' approach of evaluating adaptive learning systems (ALS), and adaptive systems in general. We argue that the commonly recognised models of adaptive systems can be used as a basis for a layered evaluation that offers certain benefits to the developers of ALS. We therefore propose the layered evaluation framework, where the success of adaptation is addressed at two distinct layers: (1) user modelling and (2) adaptation decision making. We outline how layered evaluation can improve the current evaluation practice of ALS. To build a stronger case for layered evaluation, we revisit the evaluation of InterBook, where the layered approach can make a difference, and provide an example of its use in the KOD learning system.
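To make the two layers concrete, the sketch below assesses them independently: one check compares the inferred user model against externally observed user characteristics, the other judges adaptation decisions against expert expectations. This is only a minimal illustration of the layered idea; the data structures, metrics and names are assumptions, not part of the framework itself.

# Minimal sketch of layered evaluation: the two layers are assessed separately,
# so a shortcoming can be attributed either to user modelling or to adaptation
# decision making. All names and metrics are illustrative assumptions.

def evaluate_user_modelling(inferred_model, observed_traits):
    """Layer 1: how well does the inferred user model match observed user traits?"""
    shared = set(inferred_model) & set(observed_traits)
    if not shared:
        return 0.0
    matches = sum(1 for k in shared if inferred_model[k] == observed_traits[k])
    return matches / len(shared)

def evaluate_adaptation_decisions(decisions, expert_decisions):
    """Layer 2: given the user model, do the adaptation decisions match expert judgement?"""
    agreements = sum(1 for item, choice in decisions.items()
                     if expert_decisions.get(item) == choice)
    return agreements / max(len(decisions), 1)

# Example: modelling is accurate (1.0) but decision making agrees with experts on
# only half of the items, so the problem is localised in the decision layer.
model_accuracy = evaluate_user_modelling(
    {"knowledge_topic_a": "high", "prefers_visual": True},
    {"knowledge_topic_a": "high", "prefers_visual": True},
)
decision_quality = evaluate_adaptation_decisions(
    {"page_1": "hide_details", "page_2": "show_example"},
    {"page_1": "hide_details", "page_2": "show_prerequisite"},
)
print(model_accuracy, decision_quality)  # -> 1.0 0.5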
Article
Full-text available
Usability does not exist in any absolute sense; it can only be defined with reference to particular contexts. This, in turn, means that there are no absolute measures of usability, since, if the usability of an artefact is defined by the context in which that artefact is used, measures of usability must of necessity be defined by that context too. Despite this, there is a need for broad general measures which can be used to compare usability across a range of contexts. In addition, there is a need for "quick and dirty" methods to allow low cost assessments of usability in industrial systems evaluation. This chapter describes the System Usability Scale (SUS), a reliable, low-cost usability scale that can be used for global assessments of systems usability.
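The SUS scoring rule itself is well documented: each of the ten items is answered on a five-point scale, odd-numbered (positively worded) items contribute their response minus 1, even-numbered items contribute 5 minus their response, and the summed contributions are multiplied by 2.5 to give a score between 0 and 100. A minimal sketch of that calculation follows; the function name and example responses are illustrative.

def sus_score(responses):
    """Compute the System Usability Scale score from ten item responses.

    `responses` is a list of ten integers between 1 and 5, in item order.
    Odd-numbered items (1, 3, 5, 7, 9) contribute (response - 1); even-numbered
    items contribute (5 - response). The sum is scaled by 2.5 to a 0-100 range.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS requires ten responses on a 1-5 scale")
    contributions = [
        (r - 1) if (i % 2 == 0) else (5 - r)  # i is 0-based, so even index = odd-numbered item
        for i, r in enumerate(responses)
    ]
    return sum(contributions) * 2.5

# Example: a fairly positive response pattern
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # -> 85.0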
Chapter
Adaptive learning systems (ALE) are designed towards the main objective of tailoring learning content, system feedback and appearance to individual users - according to their preferences, goals, knowledge, and other characteristics. The implementation of a successful adaptation process is of course demanding. Adaptation is not good per se, and poor realisations of adaptation may lead to disappointed users who may reject or disable adaptation mechanisms. This is why the evaluation of ALE needs to be a fundamental and integral part of their development. Evaluation should address the questions whether adaptation works on principle, whether it really improves the system, whether it leads to more effective learning, whether users prefer the adaptive features, etc. The main challenge in evaluating ALE lies in their core characteristic - adaptivity, which results in individual experiences and interactions with the system for each individual user. Attempts to deal with this challenge are diverse. This chapter provides an overview of existing and suggested methods for evaluating adaptive e-learning. Strengths and weaknesses of the current evaluation approaches are elaborated and relevant topics in the user-centred evaluation of ALE are discussed. The evaluation methodology developed in GRAPPLE, an EC-funded project aiming at developing a generic responsive adaptive personalised learning environment, is outlined as a case study on evaluating adaptive e-learning. In GRAPPLE the concept of 'adaptation quality' is adopted and conceptualised as covering different aspects of adaptive e-learning experiences that are addressed for a holistic evaluation of adaptive e-learning. In sum, the chapter aims at increasing awareness of the importance of careful and properly designed evaluation of ALE. We believe that the thorough consideration of the quality of adaptation and the use of evaluation approaches on the basis of mathematical-psychological models and expertise are the ingredients for a sound investigation of the benefits of adaptive e-learning and thus, for contributing to the overall recognition, growth, and spread of ALE.
Article
Improved full-text search, named-entity recognition and relationship extraction are all key research topics across many areas of technology, with emerging applications in the intelligence, healthcare and financial fields amongst many others [1]. In Digital Humanities, there is a growing interest in the application of such Natural Language Processing (NLP) approaches to historical texts [2] with a view to improving how a user can explore and analyse these collections [3] [4] [5] [6]. However, the text contained in handwritten historical manuscript collections can often be 'noisy' in nature, with variation in spelling, punctuation, word form, sentence structure and terminology. This is particularly the case with collections written in archaic language forms, such as Early Modern English. Multiple studies have concluded that the applicability of modern NLP tooling to such historical texts has been very limited due to this inherent noisiness in the texts. This historical language barrier hinders the accessibility and thus the potential exploration and analysis of many significant historical text collections. This paper will discuss the normalisation of historical texts as a solution to this problem and examine how normalisation can improve the analysis, interpretation and exploration of these collections. Normalisation is the process of transforming text into a single canonical form, in this case the modern equivalent of the language. Once this has been completed, the texts can be processed using current NLP techniques and technologies. However, the normalisation of historical texts presents a difficult challenge in itself. Much research has been undertaken in an attempt to cope with the correction and normalisation of text produced by Optical Character Recognition (OCR), speech recognition, instant messaging, etc., which show similar characteristics to those of historical texts. One technique which has been applied is the use of a historical lexicon, supplemented by computational tools and linguistic models of variation. However, because of the absence of language standards, multiple orthographic variations of a given word or expression can be found in a collection of material, even in the same document. As a result, the quality of the results achieved, even after normalisation, has not been satisfactory. Researchers have also noted a general lack of tools and resources specialised to this domain. This paper will present the normalisation research conducted as part of the CULTURA project, which has developed techniques for the normalisation of a 17th century manuscript collection written in Early Modern English, The 1641 Depositions [7]. CULTURA analyses the artefacts and, through the application of novel linguistic models of variation, enables normalisation techniques to remove inconsistencies in spelling, grammar and punctuation. The technologies developed and applied have had to solve issues arising from the need to contend with noisy inputs, the impact noise can have on downstream applications, and the demands that noisy information places on document analysis. The normalisation of texts in Early Modern English can be interpreted as a special (restricted) case of translation. Using this intuition, a methodology was developed based upon statistical machine translation models. The key ingredient of this approach is a new translation module that further develops known OCR correction techniques.
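As a rough illustration of what lexicon-based normalisation involves, and of why it falls short on its own, the sketch below maps attested spelling variants to modern forms through a small hand-made lexicon. The lexicon entries and names are invented, and the sketch deliberately does not attempt the statistical machine translation approach that the paper actually develops.

# Toy lexicon-based normaliser for Early Modern English spelling variants.
# The lexicon entries below are invented for illustration; a real historical
# lexicon would be far larger and still could not cover all orthographic variation,
# which is why the paper pursues a statistical machine translation approach instead.

import re

VARIANT_LEXICON = {
    "cominge": "coming",
    "sayd": "said",
    "vpon": "upon",
}

def normalise(text):
    """Replace known variant spellings with their modern equivalents, word by word."""
    def replace(match):
        word = match.group(0)
        modern = VARIANT_LEXICON.get(word.lower(), word)
        # Preserve a leading capital from the original token.
        return modern.capitalize() if word[0].isupper() else modern
    return re.sub(r"[A-Za-z]+", replace, text)

print(normalise("The examinant sayd that vpon his cominge he saw the towne."))
# -> "The examinant said that upon his coming he saw the towne."
# "towne" remains unnormalised: out-of-lexicon variants are exactly the gap
# that motivates the translation-based method described in the paper.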
Conference Paper
Evaluation is an important task for digital libraries, because it reveals relevant information about their quality. This paper presents a conceptual and technical approach that supports the systematic evaluation of digital libraries in three ways, together with a system that assists during the entire evaluation process. First, it allows for formally modelling the evaluation goals and designing the evaluation process. Second, it allows for data collection in a continuous and non-continuous, invasive and non-invasive way. Third, it automatically creates reports based on the defined evaluation models. On the basis of an example evaluation, it is outlined how the evaluation process can be designed and supported with this system.
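The three kinds of support described above can be pictured as a small pipeline: a declarative model of the evaluation goals and qualities, data collection against that model, and report generation from the collected data. The sketch below is an assumed simplification with invented class and method names; it does not reflect the actual system's API.

# Illustrative sketch of the three-step support described above:
# (1) model evaluation goals, (2) collect data against them, (3) generate a report.
# Class and field names are invented and do not reflect the system presented in the paper.

from dataclasses import dataclass, field
from statistics import mean

@dataclass
class Quality:
    name: str                      # e.g. "usability", "adaptation quality"
    instrument: str                # e.g. "SUS questionnaire", "interaction logs"
    observations: list = field(default_factory=list)

@dataclass
class EvaluationModel:
    goal: str
    qualities: dict = field(default_factory=dict)

    def add_quality(self, name, instrument):
        self.qualities[name] = Quality(name, instrument)

    def record(self, name, value):
        """Data collection step: values may arrive continuously (logs) or in batches (surveys)."""
        self.qualities[name].observations.append(value)

    def report(self):
        """Report generation step: aggregate each quality's observations."""
        return {q.name: {"instrument": q.instrument,
                         "n": len(q.observations),
                         "mean": mean(q.observations) if q.observations else None}
                for q in self.qualities.values()}

model = EvaluationModel(goal="Evaluate the digital library release 2")
model.add_quality("usability", "SUS questionnaire")
model.add_quality("content usefulness", "post-task rating")
model.record("usability", 85.0)
model.record("usability", 72.5)
model.record("content usefulness", 4)
print(model.report())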
Article
Aiming at a user-oriented approach in software evaluation on the basis of ISO 9241 Part 10, we present a questionnaire (IsoMetrics) which collects usability data for summative and formative evaluation, and document its construction. The summative version of IsoMetrics shows a high reliability of its subscales and gathers valid information about differences in the usability of different software systems. Moreover, we show that the formative version of IsoMetrics is a powerful tool for supporting the identification of software weaknesses. Finally, we propose a procedure to categorize and prioritize weak points, which subsequently can be used as basic input to usability reviews.
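Subscale reliability of this kind is commonly estimated with Cronbach's alpha; the abstract does not state which coefficient was used, so the sketch below is a general illustration of internal-consistency estimation rather than a reproduction of the IsoMetrics analysis. All data are made up.

# General illustration of estimating a subscale's internal-consistency reliability
# with Cronbach's alpha. This is an assumption for illustration, not the reported analysis.

def cronbach_alpha(item_scores):
    """item_scores: list of lists, one inner list of responses per item (same respondents, same order)."""
    k = len(item_scores)
    n = len(item_scores[0])
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    sum_item_variances = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Three items answered by five respondents on a 1-5 scale (made-up data).
print(round(cronbach_alpha([[4, 5, 3, 4, 2],
                            [4, 4, 3, 5, 2],
                            [5, 5, 2, 4, 3]]), 2))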