
Manuel Burghardt
- Professor
- Professor at Leipzig University
Computational Humanities (https://ch.uni-leipzig.de/)
Publications: 106
Reads: 53,417
Citations: 456
Introduction
I am head of the Computational Humanities Group at Leipzig University (https://ch.uni-leipzig.de/).
I am interested in modeling and analyzing all kinds of humanities data, including digitized texts (electronic corpora) as well as born-digital texts (social media), images, music, film and digital culture (e.g. video games).
Current institution
Additional affiliations
November 2014 - present
April 2010 - October 2014
April 2009 - March 2010
Publications (106)
Data acquisition in dialectology is typically a tedious task, as dialect samples of spoken language have to be collected via questionnaires or interviews. In this article, we suggest using the “web as a corpus” approach for dialectology. We present a case study that demonstrates how authentic language data for the Bavarian dialect (ISO 639-3:bar)...
Combining the methods of the natural sciences and the humanities to understand the nature–culture entanglement has rarely been attempted in either field. With a specific combination of methods from both digital humanities and ecology, we aimed at identifying several of people's life circumstances that relate to their individual sensitivity towards b...
The Covid-19 pandemic demonstrates the relevance of virtual museums, which provide access to art and cultural heritage even in times when museums are closed. Besides their undisputable relevance, another important factor is the acceptance of virtual museums, which primarily depends on usability and user experience. So far, there are only generic gu...
We present the interactive Leipzig Corpus Miner (iLCM), which is the result of the development of an integrated research environment for the analysis of text data. The key features of iLCM compared to existing software tools for computer-assisted text analysis are its flexibility and scalability. The tool includes functions to offer commonly needed...
The status of theory in the Digital Humanities (DH) has been the subject of much debate. As a result, we find different theory narratives competing and entangled with each other. If at all, these narratives can only be grasped and examined from a somewhat detached perspective. Here, we attempt to investigate these elusive narratives by means of a c...
This special issue deals with existing theory narratives and conceptions in DH scholarship. Introducing the neologism “theorytellings”, this special issue invites DH scholars to narrate and discuss their own theoretical contributions to the field.
https://doi.org/10.5281/zenodo.6304590
Nature's non‐material contributions to people are difficult to quantify and one aspect in particular, nature's contributions to communication (NCC), has so far been neglected. Recent advances in automated language processing tools enable us to quantify diversity patterns underlying the distribution of plant and animal taxon labels in creative liter...
We present the Game Walkthrough Corpus (GWTC), which contains 12,295 unique walkthrough documents covering 6,117 games. For each game walkthrough, we provide frequencies of unigrams and bigrams, treating the walkthrough document as a Bag of Words. In addition, we provide word frequencies at the sentence level. Furthermore, the GWTC contains a numbe...
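As a rough illustration of the kind of frequency data described for the GWTC, the following minimal sketch counts unigrams and bigrams for one document treated as a bag of words, plus word frequencies per sentence. It is not the corpus's actual pipeline: the tokenizer, the sentence splitter, the function names `ngram_counts` and `sentence_level_counts`, and the sample text are all assumptions made for this example.

```python
import re
from collections import Counter


def ngram_counts(text, n=1):
    """Count word n-grams in a text (hypothetical helper, not from the GWTC)."""
    # Assumed tokenization: lowercase runs of letters, digits and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of size n over the token list.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


def sentence_level_counts(text):
    """Return one unigram Counter per sentence (naive split on . ! ?)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [ngram_counts(sentence, 1) for sentence in sentences]


# Invented sample text standing in for a walkthrough document.
walkthrough = "Open the door. Take the key and open the chest."
unigrams = ngram_counts(walkthrough, 1)
bigrams = ngram_counts(walkthrough, 2)
per_sentence = sentence_level_counts(walkthrough)
```

Note that the document-level bigram counts in this sketch run across sentence boundaries; whether the corpus does the same is not stated in the abstract.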
The MuSe (Music Sentiment) dataset contains sentiment information for 90,001 songs. We computed scores for the affective dimensions of valence, dominance, and arousal, based on the user-generated tags that are available for each song via Last.fm. In addition, we provide artist, title and genre metadata, and a MusicBrainz ID and a Spotify ID, which...
Although digital humanities (DH) has received a lot of attention in recent years, its status as “a discipline in its own right” (Schreibman et al., A companion to digital humanities (pp. xxiii–xxvii). Blackwell; 2004) and its position in the overall academic landscape are still being negotiated. While there are countless essays and opinion pieces t...
Books printed before 1800 present major problems for OCR. One of the main obstacles is the lack of diversity of historical fonts in training data. The OCR-D project, consisting of book historians and computer scientists, aims to address this deficiency by focussing on three major issues. Our first target was to create a tool that identifies font gr...
In this paper, we discuss the computer-aided processing of handwritten tabular records of historical weather data. The observationes meteorologicae, which are housed by the Regensburg University Library, are one of the oldest collections of weather data in Europe. Starting in 1771, meteorological data was consistently documented in a standardized f...
We investigate how to train a high quality optical character recognition (OCR) model for difficult historical typefaces on degraded paper. Through extensive grid searches, we obtain a neural network architecture and a set of optimal data augmentation settings. We discuss the influence of factors such as binarization, input line height, network widt...
For several years now, great efforts have been made to catalogue and digitize the prints of the 16th to 18th centuries that were published in the German-speaking world. Preparing their transformation into full text, both conceptually and technically, is the overarching goal of the DFG project OCR-D, which is concerned with the further development of methods...
One important and particularly challenging step in the optical character recognition (OCR) of historical documents with complex layouts, such as newspapers, is the separation of text from non-text content (e.g. page borders or illustrations). This step is commonly referred to as page segmentation. While various rule-based algorithms have been propo...
We present Katharsis, a tool for "computational drametrics" that implements Solomon Marcus' (1973) theory of mathematical drama analysis. The tool computes and visualizes character configurations and speech statistics for different levels of analysis and allows users to compare different collections of plays. We illustrate the usefulness of the too...
We present a case study as part of a work-in-progress project about multimodal sentiment analysis on historic German plays, taking Emilia Galotti by G. E. Lessing as our initial use case. We analyze the textual version and an audio version (audio book). We focus on ready-to-use sentiment analysis methods: For the textual component, we implement a n...
We present first results of an ongoing research project on sentiment annotation of historical plays by German playwright G. E. Lessing (1729-1781). For a subset of speeches from six of his most famous plays, we gathered sentiment annotations by two independent annotators for each play. The annotators were nine students from a Master's program of Ge...
We present a case study as part of a work-in-progress project about multimodal sentiment analysis on historic German plays, taking Emilia Galotti by G. E. Lessing as our initial use case. We analyze the textual version and an audio version (audiobook). We focus on ready-to-use sentiment analysis methods: For the textual component, we implement a na...
We present results from a project on sentiment analysis of drama texts, more concretely the plays of Gotthold Ephraim Lessing. We conducted an annotation study to create a gold standard for a systematic evaluation. The gold standard consists of 200 speeches of Lessing's plays and was manually annotated with sentiment information by five annotators....
We present results of a sentiment annotation study in the context of historical German plays. Our annotation corpus consists of 200 representative speeches from the German playwright Gotthold Ephraim Lessing. Six annotators, five non-experts and one expert in the domain, annotated the speeches according to different sentiment annotation schemes. Th...
This article describes an ongoing project on the computer-aided indexing and analysis of a large collection of handwritten song sheets with folk songs from the German-speaking world. Using this practical project as an example, we discuss the opportunities and challenges that the use of digital humanities methods entails for...
This article reports on the results of an ongoing digital humanities project on sentiment analysis in literary texts and discusses their implications. In the project, various sentiment analysis methods are implemented on texts of 18th-century historical plays by G. E. Lessing and set against each other...
In this article we present an exploratory interface for the analysis of movies. Movies are segmented into shots, which are in turn displayed as scalable MovieBarcodes, i.e. film scholars can zoom into the MovieBarcode representation to explore single chapters or scenes. The tool also provides a search function that can be used to filter shots accor...
In this paper, we describe the challenge of transcribing a large corpus of handwritten music scores. We conducted an evaluation study of three existing optical music recognition (OMR) tools. The evaluation results indicate that OMR approaches do not work well for our corpus of highly heterogeneous, handwritten music scores. For this reason, we desi...
This article describes an ongoing project on the digital indexing of a large collection of folk songs from the German-speaking world, with the aim of later making them available through a public information system.
We present a contribution on the use of computer-aided methods for the quantitative study of a large collection of symbolically represented melodies of German-language folk songs. In the course of this work, a music information retrieval (MIR) tool was designed that allows targeted searches for song sheets based on specific metadata (e.g. year...
Patterns are a popular means to document design knowledge in different fields of application, including HCI. Accordingly, a large number of different HCI pattern formats have been suggested. However, relatively little is said about how to systematically identify such patterns. In order to make the process of identifying patterns more transparent an...
With the term “distant reading”, Moretti (2000) introduced a central concept to the digital humanities, which led to an ongoing discussion about quantitative methods in literary and cultural studies. Against this background, plays are a particularly interesting literary genre, since, in addition to the actual text, they further...
We present an approach that uses human computation and crowdsourcing principles for encoding large amounts of monophonic, handwritten sheet music.
http://nbn-resolving.de/urn:nbn:de:bvb:12-babs2-0000007812
The microblogging service Twitter provides vast amounts of user-generated language data. This article gives an overview of related work that has been conducted on Twitter so far. The anatomy of a Twitter message is described and typical uses of the Twitter platform discussed. The Twitter Application Programming Interface (API) will be introduced in...
This article describes a recent digital humanities project by a group of researchers from various disciplines, including musicology, cultural studies, and information science. The project is aiming to digitize a collection of approx. 50,000 handwritten sheets of German folk music. In addition, the songs will be encoded in MusicXML format, which mak...
In this short paper we introduce graffiti in public restrooms – also known as latrinalia – as a promising object of research. We present an application that uses crowdsourcing techniques to upload and transcribe images of latrinalia on a public web site. This article describes the basic design and functions of the application, presents the status q...
With the increase of digital information practices (e.g. online search, desktop publishing, electronic reference management, etc.) in the academic context, printed books are sometimes cumbersome to integrate into the digital workflow. We present ResearchSherlock, an Android app that allows the user to quickly gather bibliographic information for a...
Social media services like Twitter churn out user-generated content in vast amounts. The massive availability of this kind of data demands new forms of analysis and visualization, to make it accessible and interpretable. In this article, we introduce Twista, an application that can be used to create tailored tweet collections according to specific...
Street art is a collective term for various art forms in urban space, including graffiti, posters, and installations. A large number of existing publications shows that street art is also becoming increasingly relevant as an object of scholarly research.
With StreetartFinder (www.streetartfinder.de), a web tool was...
As part of a lecture series marking the 350th anniversary of the Imperial Diet in the city of Regensburg, a virtual 3D reconstruction of the Regensburg Ballhaus on Ägidienplatz, which no longer exists today, was created to complement the topic “The century of drama and comedies: the heyday of Regensburg theatre life”. The 3D reconstruction...
We present WebNLP, a web-based tool that combines natural language processing (NLP) functionality from Python NLTK and text visualizations from Voyant in an integrated interface. Language data can be uploaded via the website. The results of the processed data are displayed as plain text, XML markup, or Voyant visualizations in the same website. Web...
We present Sentilyzer, a web-based tool that can be used to analyze and visualize the sentiment of German user comments on Facebook pages. The tool collects comments via the Facebook API and uses the TreeTagger to perform basic lemmatization. The lemmatized data is then analyzed with regard to sentiment by using the Berlin Affective Word List – Rel...
This article describes a comparative usability evaluation of the three most widely used web browsers for mobile devices. The previously identified main functions of the browsers are tested in detail with six study participants, using tasks designed for this purpose. The results are, in the form of think-aloud protocols as well as a...
We present a web-based tool for evaluating the information architecture of a website. The tool allows the use of crowdsourcing platforms like Amazon's MTurk as a means for recruiting test persons, and to conduct asynchronous remote navigation stress tests (cf. Instone 2000). We also report on an evaluation study which compares our tool-based crowds...
Walk-up-and-use-systems such as vending and self-service machines request special attention concerning an easy to use and self-explanatory user interface. In this paper we present a set of design guidelines for coffee vending machines based on the results of an expert-based usability evaluation of thirteen different models.
In this article we describe a usability evaluation of eight desktop search engines (DSEs). We used the heuristic walkthrough method to gather usability problems as well as individual strengths and weaknesses of the tested search engines. The results of the evaluation are integrated into a set of 30 design guidelines for user-friendly DSEs.
We describe a design study for a virtual bookshelf that is intended to carry essential features of interacting with books as physical artifacts over into the digital medium. The focus is on taking up the concrete visual arrangement of books. The virtual bookshelf is intended to improve personal information management (PIM)...
Desktop search engines (DSEs) differ from web search engines (WSEs) in essential respects, which is why their interaction design has to meet different requirements. The article presents an expert-based usability study in which a total of eight DSEs were examined using a heuristic walkthrough. The thereby identified us...
This article describes how current annotation tools for tablet computers implement forms of annotation that are known from handwriting and print. For an exemplary tool, we examine which forms of annotation are used, and how often, when the task is to prepare and work through a scholarly text with annotations...
In this paper we present Tworpus, an easy-to-use tool for the creation of tailored Twitter corpora. Tworpus allows scholars to create corpora without having to know about the Twitter Application Programming Interface (API) and related technical aspects. At the same time our tool complies with Twitter’s “rules of the road” on how to use tweet data....
This article describes a study on the use of Twitter as an interactive extension of the static medium of television. Approx. 3,700 live tweets on an episode of the German crime series Tatort were categorized by content and function and subsequently examined with regard to research interests that come primarily from media studies...
In this article we present a web-based tool for the visualization and analysis of quantitative characteristics of Shakespeare plays. We use resources from the Folger Digital Texts Library as input data for our tool. The Folger Shakespeare texts are annotated with structural markup from the Text Encoding Initiative (TEI). Our tool interactively...
This article outlines the gap that exists between linguistic annotation tools on the one hand and established usability engineering methods on the other. An evaluation study of three widely used annotation tools reveals different categories of usability problems, on the basis of which a collection of 28 design recommendations...
In this paper we present the results of a heuristic usability evaluation of three annotation tools (GATE, MMAX2 and UAM Corpus-Tool). We describe typical usability problems from two categories: (1) general problems, which arise from a disregard of established best practices and guidelines for user interface (UI) design, and (2) more specific proble...
Purpose — This chapter illustrates and explains the ambiguity and vagueness of the term social search and aims at describing and classifying the heterogeneous landscape of social search implementations on the WWW.
Methodology/approach — We have looked at different definitions as well as the context of social search by carrying out an extensive lite...
The distinguishing features of design thinking processes include, on the one hand, the early implementation and evaluation of design ideas in the form of prototypes and, on the other hand, the iteration of design steps in the event of a negative test result. The product thus ultimately emerges from an evolutionary design process. In this article...
In this article we present visions for the lecture of the future in the years 2020 and 2050, which were developed systematically using the design thinking approach. In addition to the insights and ideas for the “Future Lecture” gained in this way, we also address the method itself and its fundamental suitability for developing “visions of the future”...
This paper presents social media marketing strategies and methods for the academic area regarding specific target groups and marketing goals. Current social media marketing activities for promoting the newly established chair of media informatics at the University of Regensburg are discussed by analyzing a corresponding field study.
In this article, the Regensburg media informatics group presents different starting points for tangible teaching that arise within the media informatics curriculum.
This article compares the design thinking method with the design process for user-oriented systems as proposed in ISO standard 13407 (“human-centered design processes for interactive systems”). Differences and similarities between the two approaches are identified and put up for discussion.
Questions (2)
Dear all,
does anybody know about Facebook's policy with regard to sharing content (gathered via the Facebook Graph API) publicly as a corpus?
We created a corpus that contains the raw message text of the posts of an open Facebook group, i.e. the messages are visible to anybody who visits the group. We also do not share any other metainformation such as the author's name or the date of the post.
I am, however, a little reluctant to publish the corpus on the web, as I have not come across many publicly available Facebook corpora so far. In addition, I know that other social media services, Twitter in particular, have rather strict rules of the road that do not permit sharing Tweets outside the Twitter platform (the only workaround is to share a list of Tweet-IDs, as e.g. at http://trec.nist.gov/data/tweets/).
Kind regards
Manuel
Note: So far I have experimented with an untrained TreeTagger, but (unsurprisingly) only with mediocre results :-/ Any hints on existing training data are also appreciated
The results so far can be viewed here: http://dh.wappdesign.net/post/583 (lemmatized version is displayed in the second text column)