
Metalexicography as Knowledge Graph

Authors:
David Lindemann (david.lindemann@uni-hildesheim.de)
Christiane Klaes (klaesc@uni-hildesheim.de)
Philipp Zumstein (philipp.zumstein@bib.uni-mannheim.de)
2nd LDK Conference, Leipzig 2019
Overview
Introduction: The LexBib project
Workflow
Compilation of corpus and publication metadata collection
see Lindemann, Kliche & Heid 2018 (Euralex)
Extraction of citation relations
RDF representation and entity linking
Bibliographic metadata
Provenance metadata for text mining activities and results
Conclusion and outlook
Intro: Workflow in the LexBib project
[Workflow diagram. Labels: Full text corpus; Metadata; GROBID; Term-Extr.; LDA; RDF-Translator; Citation network; Term candidates; Topic Models; Domain Ontology; OWL-RDF; RDF/XML; Provenance]
Publication metadata: Zotero RDF/XML translator
https://github.com/zotero/translators
BIBO, dcterms, FOAF, [...]
DNB guidelines for RDF representation of bibliographical metadata
external URI
SD-LLOD2019 Miniproject-8
Metalexicography as Knowledge Graph: Scenario
Text Mining Results
Publication Metadata
Domain Ontology
Creator (Role-Person)
Publication Date
Metadata for citation (title, pages, etc.)
Container-Publication (Journal, Collective Volume)
Publication Place
Publisher
Language
Publication Permanent Identifier
Abstract
Extracted Terms (Term Extraction – Term Weirdness)
Topic Weights (Topic Modeling)
Citations (References)
Subject Headings (Object Languages)
Subject Headings (General Domain Ontology)
Person
ISBN
DOI
Place
xsd:gYear
Language (ISO-639-3)
is-a
link to external resource (manually validated)
link to external resource (automatically set)
Zotero RDF export: fixes / entity linking (current state)
Publication Metadata
dcterms:creator
dcterms:date
dcterms:title
bibo:isbn10 / bibo:isbn13
bibo:issn
bibo:doi
Publication Place
dcterms:publisher
dcterms:language
FOAF
Place
Language (ISO-639-1)
Publisher
internal URI (one FOAF name, one entity)
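The "one FOAF name, one entity" rule for internal URIs can be illustrated with a small registry. This is only a sketch under stated assumptions: the namespace `http://lexbib.org/agents/`, the class `EntityRegistry`, and the normalization (ignoring case and extra whitespace) are hypothetical and not taken from the slides.

```python
class EntityRegistry:
    """Mint one stable internal URI per distinct (normalized) FOAF name."""

    def __init__(self, base="http://lexbib.org/agents/"):  # hypothetical namespace
        self.base = base
        self._uri_by_name = {}

    @staticmethod
    def normalize(name):
        # Assumption: names differing only in case or spacing denote one entity.
        return " ".join(name.split()).casefold()

    def uri_for(self, foaf_name):
        key = self.normalize(foaf_name)
        if key not in self._uri_by_name:
            # Sequential local IDs avoid collisions between similar-looking names.
            local_id = len(self._uri_by_name) + 1
            self._uri_by_name[key] = f"{self.base}{local_id:04d}"
        return self._uri_by_name[key]


registry = EntityRegistry()
print(registry.uri_for("De Gruyter"))
print(registry.uri_for("de  gruyter"))  # resolves to the same entity as above
print(registry.uri_for("Niemeyer"))
```

A registry of this kind does not disambiguate: two different publishers or persons sharing a name would receive the same URI, which is why linking to external resources needs a separate validation step.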
Extraction of Citation Relations
locdb toolchain: https://github.com/locdb
GROBID (CRF) for extraction of bibliographic references (TEI <listBibl>): https://github.com/kermitt2/grobid
Queries to CrossRef, OpenCitations, local datasets for refinement
https://locdb.bib.uni-mannheim.de/blog
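To make the GROBID step concrete, the following sketch reads bibliographic references from a TEI `<listBibl>` using only the Python standard library. The sample document is invented for illustration and much simpler than real GROBID output; the function name `extract_references` and the returned dictionary keys are assumptions, not part of the locdb toolchain.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample in the general shape of a TEI reference list.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct xml:id="b0">
      <analytic>
        <title level="a" type="main">An invented article title</title>
        <author><persName><surname>Example</surname></persName></author>
      </analytic>
      <monogr>
        <title level="j">Invented Journal</title>
        <imprint><date type="published" when="2015"/></imprint>
      </monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""


def extract_references(tei_xml):
    """Return one dict (title, authors, year) per <biblStruct> in <listBibl>."""
    root = ET.fromstring(tei_xml)
    references = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        title = bibl.find(".//tei:title", TEI_NS)  # first title in document order
        date = bibl.find(".//tei:imprint/tei:date", TEI_NS)
        surnames = []
        for author in bibl.iterfind(".//tei:author", TEI_NS):
            for surname in author.iterfind(".//tei:surname", TEI_NS):
                surnames.append(surname.text)
        references.append({
            "title": title.text if title is not None else None,
            "authors": surnames,
            "year": date.get("when") if date is not None else None,
        })
    return references


for ref in extract_references(SAMPLE_TEI):
    print(ref)
```

The extracted title, author, and year fields are the kind of input that could then be sent to CrossRef, OpenCitations, or local datasets for refinement.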
Term extraction
Example: Top 20 term candidates, extracted from a single full text (here, a 2014 Euralex keynote speech)
Term candidates ranked according to weirdness ratio (tf-idf)
Reference corpus BNC: weirdness compared to general language
Reference corpus LexBib (whole collection): weirdness compared to general discourse in domain
Top 20 Terms (Ref. BNC)  | Top 20 Terms (Ref. LexBib)
text reception           | information-on-demand
text production          | on-demand
dictionary function      | data repository
multiword                | user friendliness
word formation           | user orientation
production dictionary    | text reception
user orientation         | production dictionary
information-on-demand    | text production
data repository          | dictionary function
dictionary entry         | repository
user friendliness        | valency
internet                 | guidance
markup                   | orientation
on-demand                | concord
valency                  | word formation
concord                  | scenario
dictionary               | markup
corpus                   | production
collocation              | classification
language processing      | advance
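The ranking idea can be sketched as follows. The slides do not give the exact formula, so this sketch assumes one commonly used definition of weirdness: the relative frequency of a term in the analysed text divided by its relative frequency in the reference corpus. The add-one smoothing for terms missing from the reference corpus, the function names, and the toy counts are all illustrative assumptions.

```python
from collections import Counter


def weirdness(term, domain_counts, reference_counts, smoothing=1.0):
    """Relative frequency in the domain text over relative frequency in the reference."""
    domain_total = sum(domain_counts.values())
    reference_total = sum(reference_counts.values())
    domain_rel = domain_counts.get(term, 0) / domain_total
    # Crude smoothing (an assumption) so unseen reference terms do not divide by zero.
    reference_rel = (reference_counts.get(term, 0) + smoothing) / reference_total
    return domain_rel / reference_rel


def rank_terms(domain_counts, reference_counts):
    """Return the domain terms sorted from most to least 'weird'."""
    return sorted(
        domain_counts,
        key=lambda term: weirdness(term, domain_counts, reference_counts),
        reverse=True,
    )


# Toy counts, invented for illustration only.
full_text = Counter({"text reception": 5, "dictionary": 20, "the": 75})
reference = Counter({"dictionary": 10, "the": 990})

for term in rank_terms(full_text, reference):
    print(term, round(weirdness(term, full_text, reference), 2))
```

Swapping the reference counts (general-language corpus versus the whole domain collection) changes which terms rise to the top, which is the contrast the two columns above illustrate.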
Representation of statistical term extraction results and provenance
Weirdness ratio (numeric value)
Term status (manual annotations)
Mapping to domain ontology concept
Term extraction: Relevant metadata
Tool and version
Source corpus description
Reference corpus description
Retrieved POS patterns
Weirdness rank thresholds
Timestamp
Provenance representation based on DNB model
SD-LLOD2019 Miniproject-8
[RDF graph diagram, term extraction and mapping]
Nodes:
lexbibtermcand:12345#text_reception a doco:TextChunk ; a lexbibvoc:TermCand
lexdo:TextReception a lexbibvoc:DomainOntologyTerm
lexbibactivity:TermexActivity#0001 a prov:Activity
lexbibactivity:MappingActivity#0001 a prov:Activity
(bNode) a prov:Association
<http://lexbib.org/data/id/TermexPlan/0001> a prov:Plan
<https://www.ims.uni-stuttgart.de/forschung/ressourcen/werkzeuge/trex> a prov:SoftwareAgent
<http://lexbib.org/team/member0001> a foaf:Person
<http://lexbib.org/items/12345> a bibo:Document
<http://lexbib.org/bib/items/12345/fulltextbody> a doco:BodyMatter
wikidata:Q236935 a dc:collection ; rdfs:label "BNC"
Edge labels:
prov:wasGeneratedBy (twice), dc:source (twice), lexbibvoc:hasMapping, lexbibvoc:mappedConcept, prov:qualifiedAssociation, prov:agent (three times), prov:hadPlan, lexbibvoc:usedTextSource, lexbibvoc:usedReferenceCorpus
Literal values:
lexbibvoc:usedPosPattern "NN - NN NN - NN NN NN"
lexbibvoc:usedWeirdnessThreshold "0.1"^^xsd:decimal
prov:endedAtTime ""^^xsd:dateTime (twice)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix doco: <http://purl.org/spar/doco/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wikidata: <http://www.wikidata.org/entity/> .
@prefix lexdo: <http://lexbib.org/ontology/> .
@prefix lexbibterms: <http://lexbib.org/data/TermCand/> .
@prefix lexbibitem: <http://lexbib.org/bib/items/> .
@prefix lexbibvoc: <http://lexbib.org/rdf/voc/> .
@prefix lexbibactivity: <http://lexbib.org/activities/> .
@prefix lexbibplan: <http://lexbib.org/plans/> .
[RDF graph diagram, PDF-to-TEI conversion]
Nodes:
lexbibactivity:Pdf2TeiActivity#0001 a prov:Activity
(bNode)
<http://lexbib.org/team/member0001> a foaf:Person
<http://cloud.science-miner.com/grobid/> a prov:SoftwareAgent
Edge labels:
prov:wasGeneratedBy, prov:agent (twice)
Literal values:
prov:endedAtTime ""^^xsd:dateTime
RDF Representation example: Term Candidate "text reception" extracted from item #12345 fulltext body, with the BNC as reference corpus
(bNode) a prov:Association, attached via prov:qualifiedAssociation (twice)
lexbibvoc:hasWeirdness "3.68"^^xsd:decimal
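As an illustration of how such a term candidate might be serialized, the sketch below writes a small Turtle document with plain string formatting. The node URIs, the `lexbibvoc:` properties, and the Wikidata item for the BNC follow the slides; which node each edge connects is an assumption, since the flattened diagram does not preserve the arrows, and the function names are hypothetical.

```python
PREFIXES = """@prefix dc: <http://purl.org/dc/terms/> .
@prefix doco: <http://purl.org/spar/doco/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix wikidata: <http://www.wikidata.org/entity/> .
@prefix lexbibvoc: <http://lexbib.org/rdf/voc/> .
"""


def typed_literal(value, datatype):
    """Format a typed Turtle literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^{datatype}'


def term_candidate_turtle(item_id, term, weirdness, activity_id="0001"):
    """Build Turtle for one term candidate and the activity that produced it."""
    candidate = f"<http://lexbib.org/data/TermCand/{item_id}#{term.replace(' ', '_')}>"
    body = f"<http://lexbib.org/bib/items/{item_id}/fulltextbody>"
    activity = f"<http://lexbib.org/activities/TermexActivity#{activity_id}>"
    # Assumed edges: candidate -> source text, candidate -> generating activity,
    # activity -> text source and reference corpus (BNC, wikidata:Q236935).
    return (
        PREFIXES
        + "\n"
        + f"{candidate} a doco:TextChunk, lexbibvoc:TermCand ;\n"
        + f"    dc:source {body} ;\n"
        + f"    lexbibvoc:hasWeirdness {typed_literal(weirdness, 'xsd:decimal')} ;\n"
        + f"    prov:wasGeneratedBy {activity} .\n"
        + "\n"
        + f"{activity} a prov:Activity ;\n"
        + f"    lexbibvoc:usedTextSource {body} ;\n"
        + "    lexbibvoc:usedReferenceCorpus wikidata:Q236935 .\n"
    )


print(term_candidate_turtle("12345", "text reception", "3.68"))
```

Passing the weirdness value as a string keeps the decimal exactly as recorded, rather than relying on floating-point formatting.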
https://www.researchgate.net/project/LexBib-Corpus-and-Bibliography
Conclusion
Our Data
Publication metadata: "formal metadata"
Additional bibliographic metadata: "content description metadata"
Topics, Term Candidates incl. mapping, citation relations
Our Metadata
Provenance of our datasets
Workflow
ML-based recommenders for manual validation
Process metadata: Manual workload predictions for application to broader domains
Participation in ongoing discussions
DNB, UB Mannheim, VZG, etc.
DARIAH-WG "bibliographical data"
Outlook
Main project
5000-6000 LexBib items in 2 years
EN, DE, 2000-2019; then also ES
Later: More languages, older items
Proposal submitted
Preliminary steps in coop. with Elexis
Domain Ontology
Dictionaries
Metalexicography
(Lindemann, Giacomini & Klaes in prep.)
Thank you for your attention
david.lindemann@uni-hildesheim.de
klaesc@uni-hildesheim.de
philipp.zumstein@bib.uni-mannheim.de
http://www.zotero.org/groups/lexbib/items
See detailed bibliography in the proceedings
https://www.researchgate.net/project/LexBib-Corpus-and-Bibliography