Conference Paper

Unlocking Cultural Conceptualisation in Indigenous Language Resources: Collaborative Computing Methodologies

Abstract

The world's indigenous languages and related cultural knowledge are under considerable threat from the expanding use of standard languages, particularly through the pervasiveness of digital media and machine-readable editions of electronic resources. There is thus a pressing need to preserve and breathe life into traditional data resources containing valuable linguistic and cultural knowledge. In this paper we demonstrate, using the example of an Austrian non-standard language resource (DBÖ/dbo@ema), how the combined application of semantic modelling of cultural concepts and visual exploration tools is key to unlocking the indigenous knowledge system, traditional world views and valuable cultural content contained within this rich resource. The original data collection questionnaires serve as a pilot case study and initial access point to the entire collection. Set within a Digital Humanities context, the collaborative methodological approach described here acts as a demonstrator for opening up traditional/non-standard language resources for cultural content exploration through computing, ultimately giving access to, re-circulating and preserving immaterial cultural heritage that would otherwise be lost.
... project (exploring Austria's culture through the language glass; cf. Wandl-Vogt et al., 2015; Dorn et al., 2018) is an example of such a complex DH project that brings together different disciplines and actor groups. It revolves around a large (3.5 million entries) and rich digitized German non-standard language collection, from the time of the former Austro-Hungarian monarchy (Database of Bavarian Dialects in Austria, [DBÖ] (1993-); Wandl-Vogt, 2008). ...
... While previous project-centred papers (e.g. Dorn et al., 2018) have shed light on specific aspects, such as linguistic, collaboration, or computational aspects, the novelty and necessity of this article lies in bringing the complexity of interactions in exploreAT! to the foreground, which addresses the humanities and other disciplines, methods, and results. ...
Article
This article provides insights into dealing with complexities in the Digital Humanities project exploreAT!. By exploring a non-standard language collection for cultural insights, a three-fold approach is presented that looks into concrete realizations and solutions for tackling challenges in terms of Open Innovation infrastructure, technology, and the chosen topic, food. Methods and processes applied and developed in the project are intended to serve as examples for future projects with similar data sets.
... project exploreAT! is a current DH project which aims to unveil cultural information contained in a non-standard language resource (DBÖ) [Database of Bavarian dialects in Austria; [3]] by drawing on and combining digital methods and tools from different disciplines (semantic technologies, visualisation prototyping, crowd science) (cf. [5]). At the heart of the project lies the fundamental research question originating from the Humanities background, which asks how to enable access to a non-standard language resource through a cultural lens, giving insights on the conceptualisation of the world and the local society at the time. ...
Article
Understanding collaboration between researchers of different disciplines requires an ability to embrace multiple views and perspectives, and communicative efforts. This paper thus provides insights on methods, processes and results of a cooperation in Humanities research supported by semantic technologies with the aim of accessing and opening up cultural knowledge contained in a non-standard language resource. The collaborative undertaking is carried out within a Digital Humanities project and an Open Innovation framework. Meta-disciplinary learnings offer insights on factors fostering mutual understanding, knowledge translation and mutual benefits.
... Beyond inherent biases in word embeddings, further study could involve surveying societal perceptions of gender associations with specific occupations. Developing Amharic lexical resources with gender annotations using semantic web techniques, like the Ontolex model [1,8,14], could enhance the overall understanding of cultural biases and strengthen our analysis. ...
Chapter
Bias in natural language processing systems can perpetuate and exacerbate societal inequalities, reflecting and potentially amplifying existing biases in human language and culture. Amharic, as the official language of Ethiopia, holds cultural and linguistic significance, making it imperative to assess potential biases within its computational representations. This research paper investigates the presence and extent of gender bias in Amharic text corpora. The research utilizes gendered word pairs to capture gender representation in the word embeddings and quantifies the degrees of gender bias present in profession words. We found that profession words carried stereotypical implicit biases with most occupations leaning towards male. Profession words like “nurse” and “house-maid” align with societal gender dynamics, displaying significant female associations. Additionally, professions in the arts and athleticism demonstrate a robust female-leaning bias, while physically demanding and educated professional roles tend to exhibit male-leaning biases. The study contributes insights into the gender dynamics encoded within the Amharic language informing strategies to reduce bias and fostering fair and unbiased representations for improved societal and technological outcomes.
... However, the selection and deployment of the repositories are out of the scope of this research paper. Further reading on the topic is available in [40][41][42][43]. ...
Article
Cultural heritage images are among the primary media for communicating and preserving the cultural values of a society. The images represent concrete and abstract content and symbolise the social, economic, political, and cultural values of the society. However, an enormous amount of such values embedded in the images is left unexploited, partly due to the absence of methodological and technical solutions to capture, represent, and exploit the latent information. With the emergence of new technologies and the availability of cultural heritage images in digital formats, the methodology followed to semantically enrich and utilise such resources becomes a vital factor in supporting users' needs. This paper presents a methodology for unearthing the cultural information communicated via cultural digital images by applying Artificial Intelligence (AI) technologies (such as Computer Vision (CV) and semantic web technologies). The methodology enables efficient analysis and enrichment of a large collection of cultural images, covering all the major phases and tasks. The proposed method is applied and tested using a case study on cultural image collections from the Europeana platform. The paper further presents the analysis of the case study, the challenges, the lessons learned, and promising future research areas on the topic.
... In its collaborative setting, the project combines expertise from project partners in semantic technologies (ADAPT Centre, DCU, Ireland) [1], visual prototyping (VisUSAL, Universidad de Salamanca, Spain) [6], and Crowd Science. The project is based around a digitized non-standard language resource of the Bavarian Dialects in Austria (DBÖ/dbo@ema) [46], which captures the language and through it also the culture of the local society in the area of the former Austro-Hungarian empire from the early 20th century until now [22]. Within the rich content of a diversity of cultural topics, exploreAT! ...
Article
This article reports on the experience of co-designing an educational video game aimed at promoting good dietary habits in youngsters and fostering Sustainable Development Goals (SDGs), such as SDG 3 (Good Health and Well-Being), SDG 10 (Reduced Inequalities), and SDG 17 (Partnerships for the Goals). To ensure the quality of the results, we developed a methodology under a social innovation paradigm that enabled the co-creation of the game. The methodology was driven by a series of three workshops, during which we adopted several different gamification strategies to support a Participatory Design (PD) process with the stakeholders, a group of local pre-teen and teen girls at social risk (N = 22). Captured requirements materialized into intermediate prototype evaluations that motivated a progressive refinement of the game.
... was implemented in 2015 as a cross-disciplinary project at the Austrian Centre for Digital Humanities (ACDH-OeAW), the Austrian Academy of Sciences. It brings together expertise from different disciplines and partners in the fields of cultural lexicography and Open Innovation (OI) (ACDH-OeAW, Austria), semantic technologies (ADAPT Centre, DCU, Ireland), and human-machine interaction via visualization (VisUSAL, Universidad de Salamanca, Spain) (see Abgaz, Dorn, Piringer, Wandl-Vogt, & Way, 2018a, 2018b; Benito et al., 2016; Benito, Losada, Therón, Dorn, & Wandl-Vogt, 2018; Dorn, Wandl-Vogt, Abgaz, Benito Santos, & Therón, 2018). ...
... As the volume of data grows, so do the complexity of the problem and the need for suitable tools that address the cognitive load involved. In this project, an interactive visual software prototype (Benito-Santos et al., 2018) has been designed that can help researchers and sports performance analysts study collective group behaviour in football matches and training sessions. ...
Article
Abstract: The research GRoup in InterAction and eLearning (GRIAL) is a Recognized Research Group of the University of Salamanca and, currently, a Consolidated Research Unit of the Regional Council of Castile and León. Its most prominent defining characteristic is that it is a multidisciplinary research group that arose around the creation and application of educational technology; it is therefore composed mainly of computer engineers and educationalists, but it also includes humanists, library scientists, philosophers and philologists, among other professional profiles. This article presents the research lines of the group that are currently most active, supported by their projects and their leading publications. Keywords: GRIAL, Visual analytics, Quality and evaluation in education, Technological ecosystems, Digital humanities, Social responsibility and inclusion, Learning technologies, eLearning, Interactive systems.
... After observing the situation depicted in the graph, it can be seen that a small increase in the accepted variation margin (20%) resulted in a substantial increase (133%) in the number of connections between the two central nodes, which went from four to seven common nodes. In light of the results, and following a common step in Humanities research [6], the user could opt at this point for consulting the original sources in an attempt to find additional information that further supported the initial hypothesis. ...
Conference Paper
As current research shows, the humanities and visualization communities are benefitting greatly from each other thanks to the interdisciplinary nature of the Digital Humanities (DH), a research field that has gained significant attention in recent years from a vast number of scholars with disparate backgrounds. DH is a vibrant field of experimentation for the application of innovative visualization design techniques and other related computational methods. A new wave in visualization research, uncertainty visualization, focuses on the display and conveyance of the uncertainty present in a great variety of statistical and algorithmic methods driving computations in a wide range of scientific domains. In this paper, we present a visualization dashboard concept in which we introduce uncertainty induced by a simple linguistic algorithm that is commonly employed in humanities research. Following this new paradigm, we showcase how accessible pre-existing visualizations, such as 2D histograms, can be easily adapted to communicate inherent algorithmic variability to the user, motivating uncertainty-aware research analyses in a DH context.
Article
Different types of uncertainties occur in almost all datasets and are an inherent property of data across different academic disciplines, including digital humanities (DH). In this paper, we address, demonstrate and analyse spatio-temporal uncertainties in a non-standard German legacy dataset in a DH context. Although the data collection is primarily a linguistic resource, it contains a wealth of additional, comprehensive information, such as location and temporal detail. The addressed uncertainties have manifested because of a variety of reasons, and partly also because of decades of data transformation processes. We here propose our own taxonomy for capturing and classifying the various uncertainties, and show with numerous examples how the remedying but also re-introduction of uncertainties affects DH practices.
Conference Paper
The exploreAT! project aims to give insights into the richness of the German language in the Austrian area through a rich and unique collection of dialect words of the Bavarian dialects recorded during the period of the former Austro-Hungarian Monarchy and beyond. Originally collected by means of questionnaires, words were noted in handwriting on individual paper slips, covering topics from nature and food to religious festivities, etc. Once digitized, the full database contains around 3.5 million single data entries with an estimated 200,000 headwords, so accessing specific information from the data set requires substantial effort from analysts. It should also be noted that the data is highly heterogeneous in nature and origin (questionnaires, collectors, scientists, spoken language, handwritten notes, etc.), which calls for the creation of a homogeneous database containing all of the available information. In this paper we present a tool aimed at improving the comprehension of that massive amount of data through visualization, helping analysts reach meaningful conclusions and gain valuable insights quickly and easily. With it, analysts can discover cultural issues and access them by means of language and visualization. This is possible thanks to a multidimensional approach to data analysis based on the use of maps, projections and other visualization artifacts. To reach our goal, a team of experts with different backgrounds worked together to close the gap between the Humanities and Computer Science through the creation of our prototype and its multiple iterations.
Conference Paper
Collections of linguistic and dialect data often lack a semantic description and the ability to establish relations to external datasets, e.g. from demography, socio-economics or geography. Based on existing projects (the Database of Bavarian Dialects in Austria and exploreAT!), this paper elaborates on a spatio-temporal Linked Data model for representing linguistic/dialect data. Here we focus on utilizing existing data and publishing them using a virtual RDF graph. Additionally, we exploit external data sources like DBpedia and geonames.org to specify the meaning of dialect records and make use of stable geographical place names. In the paper we highlight a spatio-temporal modeling and representation of linguistic records relying on the notion of a discrete lifespan of an object. Based on a real-world example using the lemma "Karotte" (English: carrot), we show how the usage of a specific dialect word ("Karottn") changes from 1916 until 2016 by exploiting the expressive power of GeoSPARQL.
Linguistic Linked Open Data (LLOD)
  • C Chiarcos
  • P Cimiano
  • T Declerck
  • J P McCrae
Chiarcos, C., Cimiano, P., Declerck, T. & McCrae, J. P. (2013). Linguistic Linked Open Data (LLOD). Introduction and Overview. In C. Chiarcos, P. Cimiano, T. Declerck & J. P. McCrae (Eds.), 2nd Workshop on Linked Data in Linguistics. Representing and Linking Lexicons, Terminologies and Other Language Data.
dboe@TEI: remodelling a database of dialects into a rich LOD resource
  • D Schopper
  • J Bowers
  • E Wandl-Vogt
Schopper, D., Bowers, J. & Wandl-Vogt, E. (2015). dboe@TEI: remodelling a database of dialects into a rich LOD resource. Retrieved January 17, 2018 from Text Encoding Initiative. Conference and members' meeting 2015. October 28-31, Lyon, France. Papers: http://tei2015.huma-num.fr/en/papers/#146
Universal Declaration on Cultural Diversity: a vision, a conceptual platform, a pool of ideas for implementation
UNESCO (2002). Universal Declaration on Cultural Diversity: a vision, a conceptual platform, a pool of ideas for implementation, a new paradigm. Cultural Diversity series, Vol. 1. http://unesdoc.unesco.org/images/0012/001271/127162e.pdf [last access: 19.01.2018]
Datenbank der bairischen Mundarten in Österreich electronically mapped [Database of the Bavarian Dialects in Austria electronically mapped] (dbo@ema)
  • E Wandl-Vogt
Wandl-Vogt, E. (2010; Ed.). Datenbank der bairischen Mundarten in Österreich electronically mapped [Database of the Bavarian Dialects in Austria electronically mapped] (dbo@ema). Wien. [Processing status: 2018.01.] https://wboe.oeaw.ac.at/dboe/indices/
exploreAT! Perspektiven einer Transformation am Beispiel eines lexikographischen Jahrhundertprojekts
  • E Wandl-Vogt
  • B Kieslinger
  • A O'Connor
  • R Theron
Wandl-Vogt, E., Kieslinger, B., O'Connor, A. & Theron, R. (2015). exploreAT! Perspektiven einer Transformation am Beispiel eines lexikographischen Jahrhundertprojekts. In DHd2015. Von Daten zu Erkenntnissen. 23. bis 27. Februar 2015, Graz. Book of Abstracts.
Wandl-Vogt, E. (2008): Wie man ein Jahrhundertprojekt zeitgemäß hält: Datenbankgestützte Dialektlexikografie am Institut für Österreichische Dialekt- und Namenlexika (I Dinamlex) (mit 10 Abbildungen). In: Ernst, Peter (Eds.):