
David Lindemann
- PhD
- Lecturer / Researcher at UPV/EHU University of the Basque Country
About
58
Publications
6,297
Reads
65
Citations
Introduction
David Lindemann currently works at the Dept. of Linguistics and Basque Studies, UPV/EHU University of the Basque Country. His research focuses on Computational Lexicography, Linked Data, and Digital Humanities.
Current institution
UPV/EHU University of the Basque Country
Current position
- Lecturer / Researcher
Additional affiliations
January 2012 - April 2017
Publications (58)
This conference contribution summarizes tools and workflow steps for the collaborative creation of a term-indexed collection of scientific articles. Software involved (all free and open source): Zotero, Wikibase, zotwb, spacy, GROBID. Example showcase: https://eneoli.wikibase.cloud
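The indexing idea above can be sketched in miniature. Assuming the spaCy term-extraction step is replaced by a naive n-gram counter (a deliberate simplification, with made-up toy abstracts as input):

```python
import re
from collections import Counter

def candidate_terms(texts, n=2):
    """Collect frequent word n-grams as candidate index terms.
    (Naive stand-in for the spaCy noun-chunk extraction step.)"""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-zA-Z-]+", text.lower())
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Toy abstracts, purely illustrative:
abstracts = [
    "Linked data for lexicography: linked data as a publication model.",
    "Publishing lexicographic data as linked data on Wikibase.",
]
print(candidate_terms(abstracts).most_common(2))
```

Real pipelines would filter candidates against part-of-speech patterns rather than raw frequency, which is exactly the gap spaCy fills in the workflow named above.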
In this presentation, we explore ways of FAIR data publishing on web platforms based on mediaWiki, such as Wikisource, Wikimedia Commons, Wikidata, and Wikibase.
This software demonstration presents approaches to employing Wikibase in a university course on Terminology, and results of terminology projects led by students. Wikibase, an extension of MediaWiki, is the software that underlies Wikidata, a very large crowdsourced queryable knowledge graph. We use our own Wikibase instance for cloud-based collaborati...
This software demonstration presents a data model and a first use case for the representation of text corpus data on a Wikibase instance, including morphosyntactic, semantic and philological annotations as well as links to dictionary entries. Wikibase, an extension of MediaWiki, is the software that underlies Wikidata (Vrandečić & Krötzsch, 2014),...
This poster presents a data model and two first use cases for the representation of contents of text corpus data on Wikibase instances, including morphosyntactic, semantic and philological annotations as well as links to dictionary entries. Wikibase (cf. Diefenbach et al. 2021), an extension of MediaWiki, is the software that underlies Wikidata (Vr...
Zotero is a free and open source reference and bibliography management software. It ingests structured publication metadata from files in various formats and, through a set of translators, also by scraping websites.
Wikibase is a free and open source software solution for storing, editing, and querying Linked Data. It is the software underlyin...
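As an illustration of moving a Zotero record toward Wikibase, a minimal sketch might map CSL-JSON fields to property/value statements. The property IDs (P1, P2, P50) are placeholders, real IDs depend on the target Wikibase instance, and this is not the project's actual zotwb code:

```python
def zotero_to_statements(item):
    """Map a Zotero CSL-JSON record to (property, value) pairs in the
    shape a Wikibase item expects. Property IDs are placeholders."""
    statements = [
        ("P1", item["title"]),                                    # title
        ("P2", item.get("issued", {}).get("date-parts", [[None]])[0][0]),  # year
    ]
    for author in item.get("author", []):
        statements.append(("P50", f'{author["family"]}, {author["given"]}'))
    return statements

# Invented sample record in CSL-JSON shape:
record = {
    "title": "Representing dictionaries as linked data",
    "issued": {"date-parts": [[2023]]},
    "author": [{"family": "Doe", "given": "Jane"}],
}
print(zotero_to_statements(record))
```

In practice such statements would then be written through the Wikibase API; the mapping table (which CSL field goes to which property) is the part each project must configure.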
Research project funding has recently become subject to compliance with the FAIR criteria for the publication of digital objects. One possible path to meeting them is the Linked Open Data paradigm. Starting from the Berners-Lee scale of criteria for describing the quality...
The aim of the paper is to present and analyze workflows for bibliographical data curation and research that were created during the 'Open Bibliodata Workflows' project realised by the Bibliographical Data Working Group from the DARIAH ERIC consortium. These workflows are available via the SSH Open Marketplace. Its role in the SSH infrastructural syste...
Regarding the digitization of historical Basque texts, we have seen numerous efforts in recent years: a range of initiatives with differing goals and methodologies. Among the most recent results of these projects are the digital texts that make up the Historical Corpus managed by the Euskara Institutua of the UPV/EHU, which can be explored through an online interface...
Short presentation about conversion of publication metadata and other bibliodata to Linked Open Data using Wikibase and Wikidata, read at DH2023 ADHO conference in Graz (Austria); part of the panel "Fostering Collaboration to Enable Bibliodata-driven Research in the Humanities" (DARIAH Working Group Bibliodata)
Over the last decade, the Web has increasingly become a space of language and knowledge representation. However, this is only true for widely spoken languages and well-established communities, while minority communities and their resources have received less attention. In this paper, we propose QICHWABASE to support the harmonization process of the Quechua...
This paper presents LexMeta, a metadata model for the description of lexical resources, such as dictionaries, word lists, glossaries, etc., to be used in language data catalogues mainly targeting the lexicographic and broader humanities communities but also users exploiting such resources in their research and applications. A comparative review of...
In this session, we will briefly introduce the historical steps in the development of Wikidata from a machine-readable version of Wikipedia towards a general free Knowledge Graph, which now includes not only ontology concepts, but also lexicographical data. The main part is concerned with the representation of lexemes, senses, and forms in Wikibase...
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model,...
In this paper, we present LexMeta, a metadata model for the description of human-readable and computational lexical resources in catalogues. Our initial motivation is the extension of the LexBib knowledge graph with the addition of metadata for dictionaries, making it a catalogue of and about lexicographical works. The scope of the proposed model,...
This report has been prepared by the “Bibliographical Data” Working Group of the DARIAH-ERIC consortium, which develops public digital research infrastructure for the arts and humanities. The Group consists of more than 30 members from 15 countries, most of whom are researchers and curators in the public sector who are engaged in bibliographical da...
This article describes the OCR process for the digitization of Larramendi's Hiztegi Hirukoitza, using machine learning, a branch of artificial intelligence. To that end, the preprocessing of the scanned images is described, and then, using the Kraken tool and starting from a manually transcribed sample, a model that will recognize the dictionary text...
In this paper, we present ongoing work on Elexifinder (https://finder.elex.is), a lexicographic literature discovery portal developed in the framework of the ELEXIS (European Lexicographic Infrastructure) project. Since the first launch of the tool, the database behind Elexifinder has been enriched with publication metadata and full texts stemming...
In this paper, we present a workflow for historical dictionary digitization, with a 1745 Spanish-Basque-Latin dictionary as use case. We start with scanned facsimile images, and get to represent attestations of modern standard Basque lexemes as Linked Data, in the form they appear in the dictionary. We are also able to produce an index of the dicti...
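A minimal sketch of the final step, representing an attestation as Linked Data, might emit N-Triples linking the historical form to a modern lexeme. Only `ontolex:writtenRep` is a real OntoLex vocabulary term here; the other IRIs, the sample form, and the page number are illustrative, not the paper's actual data model:

```python
def attestation_triples(form, lexeme_iri, page, n=1):
    """Link a historical dictionary form to a modern Basque lexeme as
    N-Triples. example.org predicates are placeholders."""
    node = f"_:att{n}"
    return [
        f'{node} <http://www.w3.org/ns/lemon/ontolex#writtenRep> "{form}"@eu .',
        f"{node} <http://example.org/attestationOf> <{lexeme_iri}> .",
        f'{node} <http://example.org/foundOnPage> "{page}" .',
    ]

# Invented sample attestation:
triples = attestation_triples("iaquinsu", "http://example.org/lexeme/jakintsu", 42)
print("\n".join(triples))
```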
Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover,...
Lexical resources originally meant as human-readable dictionaries, or lexical-semantic databases designed for other purposes, are most often developed in isolation from each other, so that a linking of data across resources, which doubtlessly means an added value to both human readers and knowledge-based computational applications, implies some sort of...
This short paper presents preliminary considerations regarding LexBib, a corpus, bibliography, and domain ontology of Lexicography and Dictionary Research, which is currently being developed at University of Hildesheim. The LexBib project is intended to provide a bibliographic metadata collection made available through an online reference platform....
This poster presents preliminary considerations for a new project: a merged set of Basque (legacy) lexical resources, or unified lexical database. At this preliminary stage, our main attention lies on the catalogue of data sources, on philological problems (e.g. regarding lemmatization), and on the design of the database. We propose a data model and...
In this paper, we present a generic workflow for retro-digitizing and structuring large entry-based documents, using the 33,000 entries of the Internationale Bibliographie der Lexikographie by Herbert Ernst Wiegand, published in four volumes (Wiegand 2006–2014), as an example. The goal is to convert the large bibliography, at present available as col...
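The entry-segmentation step behind such retro-digitization can be sketched as follows, assuming (purely for illustration) that each entry begins with a running number; the sample text is invented, not Wiegand's bibliography:

```python
import re

def split_entries(raw):
    """Split flat OCR text into entries, assuming each entry starts at a
    line beginning with a running number and a period (illustrative rule)."""
    parts = re.split(r"(?m)^(?=\d+\.\s)", raw)
    return [p.strip() for p in parts if p.strip()]

raw = """1. Author, A.: First title. 2006.
2. Author, B.: Second title. 2008.
continued line of entry two.
3. Author, C.: Third title. 2014."""
entries = split_entries(raw)
print(len(entries))
```

Note how the lookahead split keeps continuation lines attached to their entry, which is the main difficulty when the source is a flat column of OCR text.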
The aim of the project is to develop a prototype for a generator of argument structure or valency realisations in terms of syntagmatic and paradigmatic combinations of Spanish, German and French nouns. The two main applications of the tool prototype we are aiming to develop, are (1) the generation of noun phrases as argument structure realizations...
This paper presents preliminary considerations regarding objectives and workflow of LexBib, a project which is currently being developed at the University of Hildesheim. We briefly describe the state of the art in electronic bibliographies in general, and bibliographies of lexicography and dictionary research in particular. The LexBib project is in...
See presentation on videolectures.net
http://videolectures.net/WNLEXworkshop2018_lindemann_wordnets/
'Purism' can characterise attitudes about a wide range of linguistic phenomena, but the most common forms of linguistic purism are those concerned with the lexicon. When standardisation of language is at issue, questions of purism are unavoidable. Are processes of standardisation necessarily motivated by puristic attitudes? Or is purism a consequen...
The aim of the study presented here is to describe the intersection of discourse spaces in lexicography and metalexicography on the one hand and the Digital Humanities (DH) on the other. This involves identifying contributions on lexicographic topics that can be understood, explicitly or implicitly, as part of the DH and, conversely, topics relevant to lexicography...
In this paper, we present a simple method for drafting sense-disambiguated bilingual dictionary content using lexical data extracted from merged wordnets, on the one hand, and from BabelNet, a very large resource built automatically from wordnets and other sources, on the other. Our motivation for using English-Basque as a showcase is the fact that...
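The wordnet-pivot idea behind such drafting can be sketched as pairing lemmas that share a synset identifier. The data below is a toy sample, not actual wordnet or BabelNet content:

```python
def draft_bilingual(src_wordnet, tgt_wordnet):
    """Pair source and target lemmas that share a synset ID -- the basic
    pivot behind wordnet-based bilingual dictionary drafting."""
    draft = {}
    for synset, lemmas in src_wordnet.items():
        if synset in tgt_wordnet:
            for lemma in lemmas:
                draft.setdefault(lemma, set()).update(tgt_wordnet[synset])
    return draft

# Toy English and Basque wordnet fragments keyed by synset ID:
en = {"02084071-n": ["dog"], "02121620-n": ["cat"]}
eu = {"02084071-n": ["txakur"], "02121620-n": ["katu"]}
print(draft_bilingual(en, eu))
```

Because a lemma can occur in several synsets, the resulting draft is sense-disambiguated by construction: each translation set traces back to a shared synset rather than to a bare word pair.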
In this article, we present a set of computational methods based on corpora or on the extraction of data from existing lexical resources for drafting bilingual dictionary content. These methods operate on three structural levels: (1) lemma lists, (2) syntactic entities, and (3) translation equivalents. The described methods are applied to the langu...
This paper presents a simple method for drafting bilingual dictionary content using existing lexical and NLP resources for Basque. The method consists of five steps, three belonging to a semi-automatic drafting, and another two to semi-automatic and manual post-editing: (1), the building of a corpus-based frequency lemma list; (2) the drafting of s...
In this PhD thesis, we present research carried out during the last five years. Bilingual Lexicography with German and Basque is the main issue shared by the whole range of research lines we have been following. To create a new German and Basque bilingual dictionary has been our goal, a German-Basque electronic dictionary, which is directed first a...
This paper presents a simple methodology to create corpus-based frequency lemma lists, applied to the case of the Basque language. Since the first work on the matter in 1982, the amount of text written in Basque and language resources related to this language has grown exponentially. Based on state-of-the-art Basque corpora and current NLP technolo...
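The core of a corpus-based frequency lemma list can be sketched in a few lines, with a toy lemma table standing in for a real Basque lemmatizer (which the actual methodology relies on):

```python
import re
from collections import Counter

# Tiny stand-in lemma table; a real pipeline would use Basque NLP tooling.
LEMMAS = {"etxeak": "etxe", "etxean": "etxe", "liburuak": "liburu"}

def frequency_lemma_list(corpus):
    """Rank lemmas by corpus frequency; unknown tokens fall back to
    their surface form (a simplification of the described method)."""
    counts = Counter()
    for token in re.findall(r"\w+", corpus.lower()):
        counts[LEMMAS.get(token, token)] += 1
    return counts.most_common()

print(frequency_lemma_list("Etxeak etxean liburuak"))
```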
Ibon Sarasola published a Basque frequency dictionary in 1982, based on a 1977 corpus. In the decades since, the amount of text written in Basque and of electronic resources related to the language has grown exponentially. Based on the data available today, the aim of this research is to develop a frequency lemma list for Standard Basque (euskara batua)...
In this paper, we introduce the new electronic dictionary project EuDeLex, which is currently being worked on at UPV-EHU University of the Basque Country. The introduction addresses the need for and functions of a new electronic dictionary for that language pair, as well as general considerations about bilingual lexicography and German as foreign...
This paper presents a set of Bilingual Dictionary Drafting (BDD) methods including manual extraction from existing lexical databases and corpus based NLP tools, as well as their evaluation on the example of German-Basque as language pair. Our aim is twofold: to give support to a German-Basque bilingual dictionary project by providing draft Bilingua...
Lexicography over the last decades has incorporated Corpus Linguistics methods. Lexicographers who start to work on an electronic dictionary, starting from scratch as Computational Linguists, and with little or no previous work done on their language pair, have to evaluate the contributions Corpus Linguistics methods may provide to their project, n...
In this paper, we introduce the new electronic dictionary project EuDeLex, which is currently being worked on at UPV-EHU University of the Basque Country. The introduction addresses the need for and functions of a new electronic dictionary for that language pair, as well as general considerations about bilingual lexicography and German as foreign l...
In the 19th century, in the field of Basque–German bilingual lexicography, three authors left us works published at the time, manuscripts that were later edited, and unedited manuscripts. Among these, the manuscript of CAF Mahn, dated 1840, stands out: it brings together extensive lexicographic material from diverse sources, possibly including the ma...