Text-Technology Lab (TTLab)


About the lab

The TTLab (Text Technology Lab), headed by Prof. Alexander Mehler, is part of the Department of Computer Science and Mathematics (Fachbereich Informatik und Mathematik) at Goethe University Frankfurt. It investigates formal, algorithmic models to deepen our understanding of information processing in the humanities. We examine both diachronic (time-dependent) and synchronic aspects of processing linguistic and non-linguistic, multimodal signs. The Lab works across several disciplines to bridge computer science on the one hand and corpus-based research in the humanities on the other. To this end, we develop information models and algorithms for the analysis of texts, images, and other objects relevant to research in the humanities.

Featured research (18)

The annotation and exploration of large text corpora, whether automatic or manual, present significant challenges across multiple disciplines, including linguistics, digital humanities, biology, and legal science. These challenges are exacerbated by the heterogeneity of processing methods, which complicates corpus visualization, interaction, and integration. To address these issues, we introduce the Unified Corpus Explorer (UCE), a standardized, dockerized, open-source and dynamic Natural Language Processing (NLP) application designed for flexible and scalable corpus navigation. UCE uses the UIMA format for NLP annotations as a standardized input, constructing interfaces and features around those annotations while dynamically adapting to the corpora and their extracted annotations. We evaluate UCE based on a user study and demonstrate its versatility as a corpus explorer based on generative AI.
Processing large amounts of natural language text with machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible applications that do not scale natively. To overcome this technological bottleneck, we have developed the Docker Unified UIMA Interface (DUUI), a standardized, parallel, platform-independent, distributed, microservices-based system for processing large text corpora with any NLP method. We present DUUI as a framework that enables automated orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant and that adapts to new runtime environments such as Kubernetes. To this end, we introduce a new driver for DUUI that enables lightweight, scalable orchestration of DUUI processes within a Kubernetes environment. In this way, the paper opens up novel text-technological perspectives for existing practices in disciplines that deal with the scientific analysis of large amounts of data based on NLP.
We introduce a retrieval approach leveraging Support Vector Regression (SVR) ensembles, bootstrap aggregation (bagging), and embedding spaces on the German Dataset for Legal Information Retrieval (GerDaLIR). By conceptualizing the retrieval task in terms of multiple binary needle-in-a-haystack subtasks, we show that our voting ensemble improves recall over the baselines (0.849 versus 0.803 and 0.829) without training or fine-tuning any deep learning models, which suggests promising initial results. Our approach holds potential for further enhancement, particularly through refining the encoding models and optimizing hyperparameters.
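
The general idea of a bagged voting ensemble over binary needle-in-a-haystack subtasks can be sketched as follows. This is an illustrative sketch only, not the implementation behind the reported results: it assumes that each ensemble member treats the query embedding as the single positive example and a bootstrap sample of corpus embeddings as negatives, and it swaps in a small least-squares linear regressor for SVR so that the example needs only the standard library. All function names, parameters, and the toy vectors are hypothetical.

```python
import random
from collections import Counter


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_linear_regressor(samples, targets, epochs=100, lr=0.05):
    """Least-squares linear model trained by SGD (a stand-in for SVR)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            err = dot(weights, x) + bias - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def bagged_vote_ranking(query, corpus, n_members=25, n_negatives=5, top_k=3, seed=0):
    """Rank document ids by how many ensemble members place them in their top_k."""
    rng = random.Random(seed)
    doc_ids = sorted(corpus)
    votes = Counter()
    for _ in range(n_members):
        # Bootstrap sample (with replacement) of presumed non-relevant documents.
        negatives = [corpus[rng.choice(doc_ids)] for _ in range(n_negatives)]
        # Binary subtask (assumption): the query is the positive, the sample is negative.
        weights, bias = fit_linear_regressor(
            [query] + negatives, [1.0] + [0.0] * n_negatives
        )
        scored = sorted(
            doc_ids, key=lambda d: dot(weights, corpus[d]) + bias, reverse=True
        )
        votes.update(scored[:top_k])
    # More votes first; ties are broken by document id for determinism.
    return sorted(doc_ids, key=lambda d: (-votes[d], d))


def recall_at_k(ranking, relevant, k):
    """Fraction of relevant ids that appear among the first k ranked ids."""
    hits = sum(1 for d in ranking[:k] if d in relevant)
    return hits / len(relevant)


# Toy embeddings (hypothetical); real use would take vectors from an encoder model.
toy_corpus = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.0, 0.0, 1.0]}
toy_ranking = bagged_vote_ranking([0.9, 0.1, 0.0], toy_corpus, top_k=1)
toy_recall = recall_at_k(toy_ranking, {"a"}, k=2)
```

Because each member sees a different bootstrap sample, a relevant document that is accidentally drawn as a negative in one subtask can still collect votes from the other members, which is the motivation for aggregating by voting in this sketch.
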
We present Viki LibraRy, a dynamically built library in virtual reality (VR) designed to visualise hypertext systems, with an emphasis on collaborative interaction and spatial immersion. Viki LibraRy goes beyond traditional methods of text distribution by providing a platform where users can share, process, and engage with textual information. It operates at the interface of VR, collaborative learning and spatial data processing to make reading tangible and memorable in a spatially mediated way. The article describes the building blocks of Viki LibraRy, its underlying architecture, and several use cases. It evaluates Viki LibraRy in comparison to a conventional web interface for text retrieval and reading. The article shows that Viki LibraRy provides users with spatial references for structuring their recall, so that they can better remember consulted texts and their meta-information (e.g. in terms of subject areas and content categories).

Lab head

Alexander Mehler
Department
  • Institut für Informatik

Members (17)

Andy Lücking
  • Goethe University Frankfurt
Giuseppe Abrami
  • Goethe University Frankfurt
Rüdiger Gleim
  • Goethe University Frankfurt
Wahed Hemati
  • Goethe University Frankfurt
Gerwin Kasperek
  • Goethe University Frankfurt
Daniel Baumartz
  • Goethe University Frankfurt
Alexander Henlein
  • Goethe University Frankfurt
Manuel Stoeckel
  • Goethe University Frankfurt
Andy Lücking
  • Not confirmed yet
Mark Klement
  • Not confirmed yet
Dominik Mattern
  • Not confirmed yet