Anna Kasprzik

ZBW Leibniz Information Centre for Economics · Metadata and Scientific Services

Dr. rer. nat.
Automating subject indexing in libraries with methods from machine learning and semantic technologies

About

33
Publications
2,210
Reads
212
Citations
Introduction
Anna Kasprzik currently works at the ZBW Leibniz Information Centre for Economics in Hamburg. Anna does research and development in Automated Subject Indexing, including topics and aspects from Semantic Technologies, Vocabulary Engineering and Structural Quality Control.
Homepage: https://www.zbw.eu/en/about-us/key-activities/automated-subject-indexing/anna-kasprzik/
ORCID: https://orcid.org/0000-0002-1019-3606

Publications (33)
Article
Is the search box a vintage look and feel from the last century? Or, if the question may be asked, under what conditions could libraries get users to run their initial search for information through them? These are examples of questions one can ask in light of the experience gained from previous innovations...
Article
Is searching the library catalogue an obsolescent model? Or, if the question may be asked, under what conditions could libraries get users to run their initial search for information through them? These are examples of questions one can ask in light of the experience gained from previous innovations, but also...
Article
Full-text available
This paper describes our efforts to implement the Research Core Dataset (“Kerndatensatz Forschung”; KDSF) as an ontology in VIVO. KDSF is used in VIVO to record the required metadata on incoming data and to produce reports as an output. While both processes need an elaborate adaptation of the KDSF specification, this paper focusses on the adaptatio...
Conference Paper
The development of Industrial Cyber-Physical Systems (ICPS) requires new ways of collaboration between ICPS vendors and companies which provide them with parts and components. As ICPS extend existing systems, e.g. machines or healthcare equipment, by electronic components, new supply chains will evolve between companies which were not connected bef...
Article
Considerable attention is currently focused on the potential of automated methods in subject indexing and on the ways they can interact with intellectual methods. In this context, the present article addresses the following questions: What are the requirements for library metadata from the perspective of...
Conference Paper
The document-centric workflows in science have reached (or already exceeded) the limits of adequacy. This is emphasized by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. This presents an opportunity to rethink the dominant paradigm of document-centric scholarly information communication a...
Presentation
Full-text available
The priorities of the strategic realignment of the GND published by the GND committee include, on the one hand, a significant increase in semantic interlinking and, on the other hand, to achieve this goal, an opening of the data maintenance processes at various levels to the participating institutions. In particular, with the involvement...
Presentation
Presentation for the working group (AK) "Supply Chain Management" (ZVEI)
Presentation
Presentation of a project proposal: A domain ontology alignment platform for the Smart Factory
Presentation
Full-text available
DMV/ÖMG conference 2017 in Salzburg
Presentation
Guest lecture at the Hochschule Darmstadt, Germany (in the context of the lecture series "Semantics II")
Article
A MAT learning algorithm is presented that infers the universal automaton for a regular target language, using a polynomial number of queries with respect to that automaton. The universal automaton is one of several canonical characterizations for regular languages. Our learner is based on the concept of an observation table, which seems to be part...
Article
Full-text available
Three levels of guidelines can be defined for correct and effective descriptive cataloguing: objectives, principles, and rules of cataloguing. This article deals mainly with the (possible) objectives and principles of cataloguing. In the introduction to the 2009 publication "Erklärung zu d...
Article
Full-text available
The rapid growth in the volume of digitally available documents, combined with the shortage of time and staff at academic libraries, suggests the use of semi-automatic or fully automatic methods for verbal and classificatory subject indexing. After a brief general introduction to the common methodology, this...
Article
We survey two existing algorithms for the inference of finite-state tree automata from membership queries and a finite positive sample or equivalence queries, and we suggest a reformulation of one of them which we deem necessary to ensure its termination. We present two algorithms for the same two settings when the underlying description is not a d...
Conference Paper
A MAT learning algorithm is presented that infers the universal automaton (UA) for a regular target language, using a polynomial number of queries with respect to that automaton. The UA is one of several canonical characterizations for regular languages. Our learner is based on the concept of an observation table, which seems to be particularly fit...
Article
We define a collection of language classes which are TxtEx-learnable (learnable in the limit from positive data). The learners map any data input to an element of a fixed lattice, and keep the least upper bound of all lattice elements thus obtained as the current hypothesis. Each element of the lattice is a grammar for a language, and the learner c...
Conference Paper
This paper demonstrates how existing distributional learning techniques for context-free grammars can be adapted to simple context-free tree grammars in a straightforward manner once the necessary notions and properties for string languages have been redefined for trees. Distributional learning is based on the decomposition of an object into a subs...
Conference Paper
Full-text available
In [1], Denis et al. introduce a special case of NFA, so-called residual finite-state automata (RFSA), where each state represents a residual language of the language recognized. RFSA have the advantageous property that there is a unique state-minimal RFSA for every regular language, which makes them an attractive concept in the design of learning...
Thesis
Full-text available
This thesis centers on formal tree languages and on their learnability by algorithmic methods in abstractions of several learning settings. After a general introduction, we present a survey of relevant definitions for the formal tree concept as well as special cases (strings) and refinements (multi-dimensional trees) thereof. In Chapter 3 we discus...
Conference Paper
We recapitulate inference from membership and equivalence queries, positive and negative samples. Regular languages cannot be learned from one of those information sources only [1,2,3]. Combinations of two sources allowing regular (polynomial) inference are MQs and EQs [4], MQs and positive data [5,6], positive and negative data [7,8]. We sketch a...
Conference Paper
Full-text available
We define a two-step learner for RFSAs based on an observation table by using an algorithm for minimal DFAs to build a table for the reversal of the language in question and showing that we can derive the minimal RFSA from it after some simple modifications. We compare the algorithm to two other table-based ones of which one (by Bollig et al. 2009)...
Conference Paper
The class of regular languages is not identifiable from positive data in Gold’s language learning model. Many attempts have been made to define interesting classes that are learnable in this model, preferably with the associated learner having certain advantageous properties. Heinz ’09 presents a set of language classes called String Extension (Lea...
Article
Full-text available
We define a two-step learner for RFSAs based on an observation table by using an algorithm for minimal DFAs to build a table for the reversal of the language in question and showing that we can derive the minimal RFSA from it after some simple modifications. We compare the algorithm to two other table-based ones of which one (by Bollig et al. 2009)...
Conference Paper
Full-text available
We present and compare two methods of how to make derivation in a Tree Adjoining Grammar (TAG) a regular process without loss of expressive power. In a TAG, derivation is based upon the replacing of a node in a tree by another tree. One regularization method resorts to the algebraic operation of lifting, while the other exploits an additional dim...
Article
Full-text available
We present a learning algorithm for regular languages that unifies three existing ones for the settings of minimally adequate teacher learning, learning from membership queries and positive data, and learning from positive and negative data, respectively. We choose these three algorithms as an example to back up the conjecture that the learning pr...
Conference Paper
Full-text available
We generalize a learning algorithm by Drewes and Högberg (1) for regular tree languages based on a learning model proposed by Angluin (2) to recognizable tree languages of arbitrarily many dimensions, so-called multi-dimensional trees. Trees over multi-dimensional tree domains have been defined by Rogers (3,4). However, since the algorithm by D...
Conference Paper
We provide a new term-like representation for multi-dimensional trees as defined by Rogers (1,2) which establishes them as a direct generalization of classical trees. As a consequence these structures can be used as input for finite-state applications based on classical term-based tree language theory. Via the correspondence between string and...
Article
We recapitulate regular inference from membership and equivalence queries, positive and negative finite samples. We present a meta-algorithm which generalizes over as many settings involving one or more of those information sources as possible and covers the whole range of combinations allowing inference with polynomial complexity. We extend the s...
Article
Full-text available
The special interest group AFS (formerly special interest group 0.1.5) of the Gesellschaft für Informatik has held an annual group meeting since 1991 as part of a Theorietag, which traditionally lasts one and a half days. Since 1996, the Theorietag proper has additionally been preceded by a one-day workshop on special topics in theoretical computer science...

Projects (5)
Archived project
The junior research group "Scientific Knowledge Engineering" at the department of research and development of TIB Hannover aims to develop semantic solutions (thesauri, ontologies, corresponding platforms) in cooperation with the main target groups and stakeholders of TIB (i.e., the scientific communities associated with the six core subjects of TIB: Technology, Architecture, Chemistry, Computer Science, Mathematics, Physics).
Archived project
The junior research group "Scientific Knowledge Engineering" at the department of research and development of TIB Hannover aims to develop semantic architectures and solutions for the industrial context in cooperation with SMEs, especially domain ontologies tailored to the specific needs of an industrial partner.