Manuel Burghardt

University of Leipzig · Computational Humanities

Professor
Computational Humanities (https://ch.uni-leipzig.de/)

About

97
Publications
41,725
Reads
257
Citations
Introduction
I am head of the Computational Humanities Group at Leipzig University (https://ch.uni-leipzig.de/). I am interested in modeling and analyzing all kinds of humanities data, including digitized texts (electronic corpora) as well as born-digital texts (social media), images, music, film, and digital culture (e.g. video games).
Additional affiliations
November 2014 - present
Universität Regensburg
Position
  • Fixed-term academic councillor (Akademischer Rat auf Zeit)
April 2010 - October 2014
Universität Regensburg
Position
  • Research associate (Wissenschaftlicher Mitarbeiter)
April 2009 - March 2010
Universität Regensburg
Position
  • Lecturer with special teaching duties (Lehrkraft für besondere Aufgaben)

Publications (97)
Conference Paper
Full-text available
Data acquisition in dialectology is typically a tedious task, as dialect samples of spoken language have to be collected via questionnaires or interviews. In this article, we suggest using the “web as a corpus” approach for dialectology. We present a case study that demonstrates how authentic language data for the Bavarian dialect (ISO 639-3:bar)...
Article
Full-text available
Nature's non‐material contributions to people are difficult to quantify and one aspect in particular, nature's contributions to communication (NCC), has so far been neglected. Recent advances in automated language processing tools enable us to quantify diversity patterns underlying the distribution of plant and animal taxon labels in creative liter...
Article
Full-text available
We present the Game Walkthrough Corpus (GWTC), which contains 12,295 unique walkthrough documents covering 6,117 games. For each game walkthrough, we provide frequencies of unigrams and bigrams, treating the walkthrough document as a Bag of Words. In addition, we provide word frequencies at the sentence level. Furthermore, the GWTC contains a numbe...
Article
Full-text available
The MuSe (Music Sentiment) dataset contains sentiment information for 90,001 songs. We computed scores for the affective dimensions of valence, dominance, and arousal, based on the user-generated tags that are available for each song via Last.fm. In addition, we provide artist, title and genre metadata, and a MusicBrainz ID and a Spotify ID, which...
Article
Full-text available
Although digital humanities (DH) has received a lot of attention in recent years, its status as “a discipline in its own right” (Schreibman et al., A companion to digital humanities (pp. xxiii–xxvii). Blackwell; 2004) and its position in the overall academic landscape are still being negotiated. While there are countless essays and opinion pieces t...
Article
Books printed before 1800 present major problems for OCR. One of the main obstacles is the lack of diversity of historical fonts in training data. The OCR-D project, consisting of book historians and computer scientists, aims to address this deficiency by focussing on three major issues. Our first target was to create a tool that identifies font gr...
Chapter
In this paper, we discuss the computer-aided processing of handwritten tabular records of historical weather data. The observationes meteorologicae, which are housed by the Regensburg University Library, are one of the oldest collections of weather data in Europe. Starting in 1771, meteorological data was consistently documented in a standardized f...
Preprint
Full-text available
We investigate how to train a high-quality optical character recognition (OCR) model for difficult historical typefaces on degraded paper. Through extensive grid searches, we obtain a neural network architecture and a set of optimal data augmentation settings. We discuss the influence of factors such as binarization, input line height, network widt...
Article
Full-text available
For several years, great efforts have been made to catalogue and digitize the prints of the 16th to 18th centuries that were published in the German-speaking world. Preparing their full-text transformation, both conceptually and technically, is the overarching goal of the DFG project OCR-D, which is concerned with the further development of methods...
Preprint
Full-text available
One important and particularly challenging step in the optical character recognition (OCR) of historical documents with complex layouts, such as newspapers, is the separation of text from non-text content (e.g. page borders or illustrations). This step is commonly referred to as page segmentation. While various rule-based algorithms have been propo...
Conference Paper
Full-text available
We present Katharsis, a tool for "computational drametrics" that implements Solomon Marcus' (1973) theory of mathematical drama analysis. The tool computes and visualizes character configurations and speech statistics for different levels of analysis and allows users to compare different collections of plays. We illustrate the usefulness of the too...
Conference Paper
Full-text available
We present first results of an ongoing research project on sentiment annotation of historical plays by German playwright G. E. Lessing (1729-1781). For a subset of speeches from six of his most famous plays, we gathered sentiment annotations by two independent annotators for each play. The annotators were nine students from a Master's program of Ge...
Conference Paper
Full-text available
We present a case study as part of a work-in-progress project about multimodal sentiment analysis on historic German plays, taking Emilia Galotti by G. E. Lessing as our initial use case. We analyze the textual version and an audio version (audiobook). We focus on ready-to-use sentiment analysis methods: For the textual component, we implement a na...
Conference Paper
Full-text available
We present results from a project on sentiment analysis of drama texts, specifically the plays of Gotthold Ephraim Lessing. We conducted an annotation study to create a gold standard for a systematic evaluation. The gold standard consists of 200 speeches from Lessing's plays and was manually annotated with sentiment information by five annotators....
Conference Paper
Full-text available
We present results of a sentiment annotation study in the context of historical German plays. Our annotation corpus consists of 200 representative speeches from the German playwright Gotthold Ephraim Lessing. Six annotators, five non-experts and one expert in the domain, annotated the speeches according to different sentiment annotation schemes. Th...
Article
Full-text available
This article describes an ongoing project on the computer-assisted indexing and analysis of a large collection of handwritten song sheets with folk songs from the German-speaking area. Using this practical project as an example, we discuss the opportunities and challenges that the use of digital humanities methods for...
Conference Paper
Full-text available
This paper reports on the results of an ongoing digital humanities project on sentiment analysis in literary texts and discusses their implications. In the project, various sentiment analysis methods are implemented for texts of historical 18th-century plays by G. E. Lessing and, against one another, ...
Conference Paper
Full-text available
In this article we present an exploratory interface for the analysis of movies. Movies are segmented into shots, which are in turn displayed as scalable MovieBarcodes, i.e. film scholars can zoom into the MovieBarcode representation to explore single chapters or scenes. The tool also provides a search function that can be used to filter shots accor...
Conference Paper
In this paper, we describe the challenge of transcribing a large corpus of handwritten music scores. We conducted an evaluation study of three existing optical music recognition (OMR) tools. The evaluation results indicate that OMR approaches do not work well for our corpus of highly heterogeneous, handwritten music scores. For this reason, we desi...
Conference Paper
Full-text available
This paper describes an ongoing project on the digital indexing of a large collection of folk songs from the German-speaking area, with the aim of later making them available via a public information system.
Conference Paper
Full-text available
We present a contribution on the use of computer-assisted methods for the quantitative study of a large collection of symbolically represented melodies of German-language folk songs. In the course of this work, a music information retrieval (MIR) tool was designed that allows targeted searches for song sheets based on specific metadata (e.g. year...
Conference Paper
Full-text available
Patterns are a popular means to document design knowledge in different fields of application, including HCI. Accordingly, a large number of different HCI pattern formats have been suggested. However, relatively little is said about how to systematically identify such patterns. In order to make the process of identifying patterns more transparent an...
Poster
Full-text available
With the notion of “distant reading”, Moretti (2000) introduced a central concept into the digital humanities, one that has led to an ongoing discussion about quantitative methods in literary and cultural studies. Against this background, plays are a particularly interesting literary genre since, alongside the actual text, further...
Conference Paper
Full-text available
We present an approach that uses human computation and crowdsourcing principles for encoding large amounts of monophonic, handwritten sheet music. http://nbn-resolving.de/urn:nbn:de:bvb:12-babs2-0000007812
Article
Full-text available
The microblogging service Twitter provides vast amounts of user-generated language data. This article gives an overview of related work that has been conducted on Twitter so far. The anatomy of a Twitter message is described and typical uses of the Twitter platform discussed. The Twitter Application Programming Interface (API) will be introduced in...
Conference Paper
Full-text available
This article describes a recent digital humanities project by a group of researchers from various disciplines, including musicology, cultural studies, and information science. The project aims to digitize a collection of approx. 50,000 handwritten sheets of German folk music. In addition, the songs will be encoded in MusicXML format, which mak...
Conference Paper
Full-text available
In this short paper we introduce graffiti in public restrooms – also known as latrinalia – as a promising object of research. We present an application that uses crowdsourcing techniques to upload and transcribe images of latrinalia on a public website. This article describes the basic design and functions of the application, presents the status q...
Conference Paper
Full-text available
With the increase of digital information practices (e.g. online search, desktop publishing, electronic reference management, etc.) in the academic context, printed books are sometimes cumbersome to integrate into the digital workflow. We present ResearchSherlock, an Android app that allows the user to quickly gather bibliographic information for a...
Conference Paper
Full-text available
Social media services like Twitter churn out user-generated content in vast amounts. The massive availability of this kind of data demands new forms of analysis and visualization, to make it accessible and interpretable. In this article, we introduce Twista, an application that can be used to create tailored tweet collections according to specific...
Conference Paper
Full-text available
Street art is an umbrella term for various art forms in urban space, including graffiti, posters, and installations. A large number of existing publications show that street art is also becoming increasingly relevant as an object of scholarly research. With the StreetartFinder (www.streetartfinder.de), a web tool was...
Conference Paper
Full-text available
As part of a lecture series marking the 350th anniversary of the Imperial Diet (Reichstag) in the city of Regensburg, a virtual 3D reconstruction of the no longer existing Regensburg Ballhaus on Ägidienplatz was created to complement the topic “Das Jahrhundert des Dramas und der Komödien: Blüte des Regensburger Theaterlebens” (“The century of drama and comedies: the heyday of Regensburg theatre life”). The 3D reconstruction...
Conference Paper
Full-text available
We present WebNLP, a web-based tool that combines natural language processing (NLP) functionality from Python NLTK and text visualizations from Voyant in an integrated interface. Language data can be uploaded via the website. The results of the processed data are displayed as plain text, XML markup, or Voyant visualizations on the same website. Web...
Conference Paper
Full-text available
We present Sentilyzer, a web-based tool that can be used to analyze and visualize the sentiment of German user comments on Facebook pages. The tool collects comments via the Facebook API and uses the TreeTagger to perform basic lemmatization. The lemmatized data is then analyzed with regard to sentiment by using the Berlin Affective Word List – Rel...
Conference Paper
Full-text available
This paper describes a comparative usability evaluation of the three most widely used web browsers for mobile devices. The previously identified main functions of the browsers are tested thoroughly with six study participants, using correspondingly designed tasks. The results, in the form of think-aloud protocols and a...
Conference Paper
Full-text available
We present a web-based tool for evaluating the information architecture of a website. The tool allows the use of crowdsourcing platforms like Amazon's MTurk to recruit test participants and to conduct asynchronous remote navigation stress tests (cf. Instone 2000). We also report on an evaluation study which compares our tool-based crowds...
Conference Paper
Full-text available
Walk-up-and-use systems such as vending and self-service machines require special attention to an easy-to-use and self-explanatory user interface. In this paper we present a set of design guidelines for coffee vending machines based on the results of an expert-based usability evaluation of thirteen different models.
Conference Paper
Full-text available
In this article we describe a usability evaluation of eight desktop search engines (DSEs). We used the heuristic walkthrough method to gather usability problems as well as individual strengths and weaknesses of the tested search engines. The results of the evaluation are integrated into a set of 30 design guidelines for user-friendly DSEs.