
Andreas Thor
Professor, Leipzig University of Applied Sciences (HTWK)
About
82 Publications
27,923 Reads
2,288 Citations
Additional affiliations
April 2013 - present
April 2012 - March 2013
January 2010 - April 2011
Publications (82)
In this article, we explore the transformative impact of advanced, parameter-rich Large Language Models (LLMs) on the production of instructional materials in higher education, with a focus on the automated generation of both formative and summative assessments for learners in the field of mathematics. We introduce a novel LLM-driven process and ap...
NoSQL document stores are becoming increasingly popular as backends in web development. Not only do they scale out to large volumes of data, many systems are even custom-tailored for this domain: NoSQL document stores like Google Cloud Datastore have been designed to support massively parallel reads, and even guarantee strong consistency in updatin...
EAs.LiT is an e-assessment management and analysis software for which contextual requirements and usage scenarios changed over time. Based on these factors and further development activities, the decision was made to adopt a microservice architecture for EAs.LiT version 2 in order to increase its flexibility to adapt to new and changed circumstance...
Abstract
Working through exercises is an important element of database teaching. Students' solutions can often be recorded in structured result formats, such as SQL queries or specifications of schemas and relations. This paper presents the e-assessment tool DMT (Data Management Tester)...
The classification of e-assessment items with levels of Bloom’s taxonomy is an important aspect of effective e-assessment. Such annotations enable the automatic generation of parallel tests with the same competence profile as well as a competence-oriented analysis of the students’ exam results. Unfortunately, manual annotation by item creators is r...
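The automatic classification alluded to above can be pictured as a standard text-classification task. A minimal sketch, assuming a small set of manually annotated items and an off-the-shelf TF-IDF/logistic-regression pipeline (an illustration, not the paper's actual model):

```python
# Illustrative only: assigning Bloom levels to e-assessment items by text
# classification; training data and model choice are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical items with manually annotated Bloom levels
items = [
    "Define the term 'foreign key'.",
    "Explain why two-phase locking guarantees serializability.",
    "Write an SQL query that joins three tables.",
]
levels = ["remember", "understand", "apply"]

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(items, levels)
print(model.predict(["Describe the purpose of an index."]))
```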
Electronic examinations (e-assessments) with closed item formats are a particularly efficient testing method for exams with large numbers of participants (Michel et al., 2015; Pengel et al., 2019). At the same time, their use creates considerable effort for managing the required item collections (item pools). The systems most commonly used for this...
Digitisation has established itself as the change maker par excellence in business, science and society. Infrastructures, working methods and skills are at the forefront of many debates and increasingly determine the future viability of entire industries. We have obviously embraced the permanent change with increasing acceleration. But: Where is th...
What are the landmark papers in scientific disciplines? Which papers are indispensable for scientific progress? These are typical questions which are of interest not only for researchers (who frequently know the answers – or believe they know them) but also for the interested general public. Citation counts can be used to identify very useful papers si...
The temporal analysis of evolving graphs is an important requirement in many domains. We are therefore extending the distributed graph analysis framework Gradoop and its graph data model to support temporal graph analysis. This paper contains an overview of our work in progress and an example use case from the financial domain demonstrating the fle...
In the discussion about the digitization of research, the question of optimal IT support for researchers plays an important role. Today, researchers can draw on a broad range of internal IT services at their universities and research institutions, including cooperative IT services that are provided by several...
The temporal analysis of evolving graphs is an important requirement in many domains but hardly supported in current graph database and graph processing systems. We therefore have started with extending the distributed graph analysis framework Gradoop for temporal graph analysis by adding time properties to vertices, edges and graphs and using them...
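The time-property idea can be sketched independently of Gradoop's actual Java/Flink API: every vertex and edge carries a validity interval, and a snapshot operator restricts the graph to a single point in time. A minimal sketch with assumed names:

```python
# Minimal sketch, not Gradoop's API: temporal graph elements with validity
# intervals and a snapshot operator keeping only elements valid at time t.
from dataclasses import dataclass, field

@dataclass
class Element:
    eid: str
    valid_from: int              # e.g. Unix timestamps
    valid_to: int
    props: dict = field(default_factory=dict)

def snapshot(vertices, edges, t):
    """Subgraph of all elements whose validity interval contains t."""
    v = {x.eid: x for x in vertices if x.valid_from <= t < x.valid_to}
    # keep only edges whose endpoints survive the filter as well
    e = [x for x in edges
         if x.valid_from <= t < x.valid_to
         and x.props["src"] in v and x.props["dst"] in v]
    return v, e
```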
Since the introduction of the reference publication year spectroscopy (RPYS) method and the corresponding programme CRExplorer, many studies have been published revealing the historical roots of topics, fields and researchers. The application of the method was restricted up to now by the available memory of the computer used for running the CRExplo...
What are the landmark papers in scientific disciplines? On whose shoulders does research in these fields stand? Which papers are indispensable for scientific progress? These are typical questions which are not only of interest for researchers (who frequently know the answers - or believe they know them), but also for the interested general public. Cita...
Since the introduction of the reference publication year spectroscopy (RPYS) method and the corresponding program CRExplorer, many studies have been published revealing the historical roots of topics, fields, and researchers. The application of the method was restricted up to now by the available memory of the computer used for running the CRExplor...
Reference Publication Year Spectroscopy (RPYS) has been developed for identifying the cited references (CRs) with the greatest influence in a given paper set (mostly sets of papers on certain topics or fields). The program CRExplorer (see www.crexplorer.net) was specifically developed by Thor, Marx, Leydesdorff, and Bornmann (2016a, 2016b) for appl...
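At its core, RPYS counts the cited references per reference publication year and highlights years that deviate from their local median; peaks point to candidate seminal works. A minimal sketch of that computation, assuming the input is a flat list of reference publication years:

```python
# Sketch of the RPYS spectrogram: per-year counts of cited references and
# their deviation from the median of the surrounding five years.
from collections import Counter
from statistics import median

def rpys_spectrogram(reference_years):
    counts = Counter(reference_years)
    spectrum = {}
    for y in range(min(counts), max(counts) + 1):
        window = [counts.get(y + d, 0) for d in range(-2, 3)]  # 5-year window
        spectrum[y] = counts.get(y, 0) - median(window)
    return spectrum
```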
This paper presents DIAL (Distributed InterActive Lecture), a BigBlueButton-based system for interactive live streaming of lectures. DIAL extends the BigBlueButton conferencing system so that students can access the chat and polling system directly from the standby mode of their devices, i.e. without the need...
This paper presents two platform-independent tools, the e-assessment literacy tool EAs.LiT and the peer-assessment tool PAssT!, which implement methods already established in higher-education didactics as workflows and are intended to help establish cross-institutional quality standards for e-assessment. EAs.LiT supports...
Formulating learning outcomes and making them transparent to students is the basis for self-directed learning processes and competence-oriented examinations (constructive alignment). The e-assessment literacy tool (EAs.LiT) presented in this paper provides didactically well-founded support for formulating learning outcomes...
The program HistCite™ enables an analyst to identify significant works on a given topic using the citation links between them diachronically. However, using Scopus data for drawing historiograms with HistCite™ has hitherto been a problem. In the new version of the program CRExplorer, one can translate citation data from Scopus to WoS formats (or vi...
This bibliometric analysis focuses on the general history of climate change research and, more specifically, on the discovery of the greenhouse effect. First, the Reference Publication Year Spectroscopy (RPYS) is applied to a large publication set on climate change of 222,060 papers published between 1980 and 2014. The references cited therein were...
CRExplorer version 1.6.7 was released on July 5, 2016. This version includes the following new features and improvements: Scopus: Using "File" - "Import" - "Scopus", CRExplorer reads files from Scopus. The file format "CSV" (including citations, abstracts and references) should be chosen in Scopus for downloading records. Export facilities: Using "...
Referenced Publication Year Spectroscopy (RPYS) was recently introduced as a method to analyze the historical roots of research fields and groups or institutions. RPYS maps the distribution of the publication years of the cited references in a document set. In this study, we apply this methodology to the œuvre of an individual researcher on the...
We introduce a new tool - the CitedReferencesExplorer (CRExplorer, www.crexplorer.net) - which can be used to disambiguate and analyze the cited references (CRs) of a publication set downloaded from the Web of Science (WoS). The tool is especially suitable to identify those publications which have been frequently cited by the researchers in a field...
Reference Publication Year Spectroscopy (RPYS) was proposed by Marx, Bornmann, Barth, and Leydesdorff (2014, [18]) to identify seminal publications in a research field which are most important in a historical context. We refined our RPYS toolbox by adding some features to the existing programs and we developed two new routines. First, a direct comp...
In the humanities and social sciences, bibliometric methods for the assessment of research performance are (so far) less common. The current study takes a concrete example in an attempt to evaluate a research institute from the area of social sciences and humanities with the help of data from Google Scholar (GS). In order to use GS for a bibliometr...
We demonstrate AutoShard, a ready-to-use object mapper for Java applications running against NoSQL data stores. AutoShard's unique feature is its capability to gracefully shard hot spot data objects that are suffering under concurrent writes. By sharding data on the level of the logical schema, scalability bottlenecks due to write contention can be...
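The sharding idea can be illustrated with the classic sharded-counter pattern: a write-hot value is split into N shard records, each write touches one random shard, and reads aggregate all shards. A sketch using a plain dictionary as a stand-in for the data store (AutoShard itself is a Java object mapper; names here are assumptions):

```python
# Sketch of logical-schema sharding for a write-hot counter; the dict is a
# stand-in for a NoSQL store, not AutoShard's actual interface.
import random

NUM_SHARDS = 16

def increment(store, name):
    shard = f"{name}:{random.randrange(NUM_SHARDS)}"
    store[shard] = store.get(shard, 0) + 1      # each write touches one shard

def read(store, name):
    return sum(store.get(f"{name}:{i}", 0) for i in range(NUM_SHARDS))

db = {}
for _ in range(1000):
    increment(db, "page_views")
print(read(db, "page_views"))                    # 1000
```

Concurrent writers rarely pick the same shard, so write contention drops roughly by a factor of NUM_SHARDS at the cost of a slightly more expensive read.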
Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building blo...
A dedicated special issue on data management in the cloud prompted us to survey the presence of cloud topics in academic database teaching. In this article, we report the results of a survey within the special interest group on database systems, conducted by its working group on data management in the cloud. Lecturers from more than...
Special topic: Data Management in the Cloud
Linked Open Data initiatives have made available a diversity of collections that domain experts have annotated with controlled vocabulary terms from ontologies. We identify annotation signatures of linked data that associate semantically similar concepts, where similarity is measured in terms of shared annotations and ontological relatedness. Forma...
NoSQL document stores are becoming increasingly popular backends in Web development. Not only do they scale out to large volumes of data, many systems are even custom-tailored for this domain: NoSQL document stores like Google Cloud Datastore have been designed to support massively parallel reads, and even guarantee strong consistency in updating s...
Linked Open Data initiatives have made available a diversity of collections that domain experts have annotated with controlled vocabulary terms from ontologies. The challenge is to explore these rich and complex annotated datasets, together with the domain semantics captured within ontologies, to discover patterns of annotations across multiple con...
Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and pat...
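A simple instance of annotation-based relatedness, assuming each entity is described by a set of controlled-vocabulary terms, is Jaccard similarity over shared annotations (the papers additionally factor in ontological relatedness between the terms, which this sketch omits):

```python
# Relatedness as overlap of annotation sets; the term IDs are hypothetical.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

entity_1 = {"NCIT:C4872", "NCIT:C1647"}
entity_2 = {"NCIT:C4872", "NCIT:C2926"}
print(jaccard(entity_1, entity_2))   # 0.333...
```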
To improve the effectiveness of pair-wise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an approach that eliminates such redundant comparisons and that can be easily integrated into existing...
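One way to realize this, sketched under the assumption that blocks are plain lists of object IDs: emit each pair only in the least-index block the two objects have in common, so pairs that share several overlapping blocks are compared exactly once.

```python
# Redundancy-free pair generation over overlapping blocks: a pair is
# emitted only in the first (least-index) block containing both objects.
def pairs_without_redundancy(blocks):
    membership = {}
    for i, block in enumerate(blocks):
        for obj in block:
            membership.setdefault(obj, set()).add(i)
    for i, block in enumerate(blocks):
        for j, a in enumerate(block):
            for b in block[j + 1:]:
                if min(membership[a] & membership[b]) == i:
                    yield (a, b) if a < b else (b, a)

print(sorted(pairs_without_redundancy([["x", "y"], ["y", "z", "x"]])))
# [('x', 'y'), ('x', 'z'), ('y', 'z')] -- ('x', 'y') compared only once
```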
Annotations of clinical trials with controlled vocabularies of drugs and diseases, encode scientific knowledge that can be mined to discover relationships between scientific concepts. We present PAnG (Patterns in Annotation Graphs), a tool that relies on dense subgraphs, graph summarization and taxonomic distance metrics, computed using the NCI The...
Mappings between related ontologies are increasingly used to support data integration and analysis tasks. Changes in the ontologies also require the adaptation of ontology mappings. So far, the evolution of ontology mappings has received little attention, although ontologies change continuously, especially in the life sciences. We therefore analyze ho...
We demonstrate a powerful and easy-to-use tool called Dedoop ( De duplication with Ha doop ) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a browser-based specification of complex ER workflows including blocking and matching steps as well as the optional use of machine learning for the automatic generation of match c...
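The blocking-plus-matching workflow translates naturally into a map and a reduce step. A plain-Python stand-in for the data flow (Dedoop itself generates Hadoop jobs; the blocking key and match function below are assumptions for illustration):

```python
# Schematic MapReduce-style entity resolution: map emits (blocking key,
# record); reduce compares all record pairs that share a key.
from itertools import combinations

def map_phase(records, blocking_key):
    for r in records:
        yield blocking_key(r), r

def reduce_phase(groups, match):
    for _, group in groups.items():
        for a, b in combinations(group, 2):   # pairwise matching per block
            if match(a, b):
                yield a["id"], b["id"]

records = [{"id": 1, "name": "iphone 13"},
           {"id": 2, "name": "iphone 13 128gb"}]
groups = {}
for k, r in map_phase(records, lambda r: r["name"][:6]):
    groups.setdefault(k, []).append(r)
print(list(reduce_phase(groups, lambda a, b: a["name"] in b["name"])))
```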
We demonstrate a new powerful mashup tool called WETSUIT (Web EnTity Search and fUsIon Tool) to search and integrate web data from diverse sources and domain-specific entity search engines. WETSUIT supports adaptive search strategies to query sets of relevant entities with a minimum of communication overhead. Mashups can be composed using a set of...
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences where concepts such as genes and proteins are annotated with controlled vocabulary terms from ontologies. Scientists are interested in analyzing or mining these annotations, in synergy with the literature, to discover patterns. Furth...
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences and health sciences, where concepts such as genes, proteins or clinical trials are annotated with controlled vocabulary terms from ontologies. We present a tool, PAnG (Patterns in Annotation Graphs), that is based on a complementary...
Product matching is a challenging variation of entity resolution to identify representations and offers referring to the same product. Product matching is highly difficult due to the broad spectrum of products, many similar but different products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We prop...
Programmatic data integration approaches such as mashups have become a viable approach to dynamically integrate web data at runtime. Key data sources for mashups include entity search engines and hidden databases that need to be queried via source-specific search interfaces or web forms. Current mashups are typically restricted to simple query appr...
Entity resolution is a crucial step for data quality and data integration. Learning-based approaches show high effectiveness at the expense of poor efficiency. To reduce the typically high execution times, we investigate how learning-based entity resolution can be realized in a cloud infrastructure using MapReduce. We propose and evaluate two ef...
The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an even redistribution of data between map and reduce tasks. In the presence of skewed data, sophisticated redistribution approaches thus become necessary to achieve load balancing among all reduce tasks to be executed in parallel. For the...
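The underlying balancing problem can be pictured with a greedy sketch: blocks whose comparison workload exceeds the average reducer load are split into chunks, and each chunk goes to the currently least-loaded reduce task. This only illustrates the goal; the strategies proposed in this line of work are more elaborate.

```python
# Greedy illustration of skew handling: split big blocks and spread their
# comparison workload over reduce tasks; a block of size n costs n*(n-1)/2
# pair comparisons.
def assign(block_sizes, num_reducers):
    loads = [0.0] * num_reducers
    total = sum(n * (n - 1) // 2 for n in block_sizes)
    avg = total / num_reducers
    plan = []
    for b, n in enumerate(block_sizes):
        pairs = n * (n - 1) // 2
        if pairs == 0:
            continue
        chunks = max(1, round(pairs / avg)) if avg else 1
        for c in range(chunks):
            target = min(range(num_reducers), key=loads.__getitem__)
            loads[target] += pairs / chunks
            plan.append((b, c, target))       # (block, chunk, reduce task)
    return plan, loads

print(assign([100, 5, 5, 5], num_reducers=3))
```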
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely ava...
The advent of cloud computing technologies shows great promise for web engineering and facilitates the development of flexible, distributed, and scalable web applications. Data integration can notably benefit from cloud computing because integrating web data is usually an expensive task. This paper introduces CloudFuice, a data integration system t...
Purpose – The single publication h index has been introduced by Schubert as the h index calculated from the list of citing publications of one single publication. This paper aims to look at the calculation of the single publication h index and related performance measures. Design/methodology/approach – In this paper a web application is presented...
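The measure is easy to state concretely: a paper has single publication h index h if h of its citing publications have each received at least h citations themselves. A worked sketch:

```python
# Single publication h index from the citation counts of the citing papers.
def single_publication_h_index(citing_paper_citations):
    counts = sorted(citing_paper_citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:          # rank-th citing paper has >= rank citations
            h = rank
        else:
            break
    return h

print(single_publication_h_index([10, 8, 5, 4, 3, 0]))  # -> 4
```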
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity resolution using Sorted Neighborhood blocking (SN). We propose and evaluate two efficient MapReduce-based im...
Product matching aims at identifying different product offers referring to the same real-world product. Product offers are provided by different merchants and describe products using textual attributes such as offer title and description. String similarity measures therefore play an important role for matching corresponding product offers. In this...
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations for Sorted Neighborhoo...
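For reference, the sequential Sorted Neighborhood procedure these implementations parallelize: sort the records by a key and compare only records whose positions differ by less than the window size w. A single-process sketch (the MapReduce variants in the papers additionally handle windows spanning partition boundaries):

```python
# Sorted Neighborhood blocking: sort by key, compare within a sliding window.
def sorted_neighborhood_pairs(records, key, w):
    ordered = sorted(records, key=key)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1 : i + w]:   # at most w-1 neighbors per record
            yield a, b
```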
Despite the huge amount of recent research efforts on entity resolution (matching) there has not yet been a comparative evaluation on the relative effectiveness and efficiency of alternate approaches. We therefore present such an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and witho...
Entity matching is a key task for data integration and especially challenging for Web data. Effective entity matching typically requires combining several match techniques and finding suitable configuration parameters, such as similarity thresholds. The authors investigate to what degree machine learning helps semi-automatically determine suitable...
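The machine-learning angle can be pictured as follows: each candidate pair becomes a vector of similarity scores, and a classifier trained on labeled pairs replaces hand-tuned thresholds. A hedged sketch with assumed features and toy training data:

```python
# Learning a match decision from similarity features; feature choices and
# training pairs here are illustrative assumptions, not the paper's setup.
from difflib import SequenceMatcher
from sklearn.tree import DecisionTreeClassifier

def features(a, b):
    title_sim = SequenceMatcher(None, a["title"], b["title"]).ratio()
    year_match = 1.0 if a["year"] == b["year"] else 0.0
    return [title_sim, year_match]

pairs = [
    ({"title": "Entity Matching", "year": 2010},
     {"title": "Entity Matching.", "year": 2010}, 1),   # match
    ({"title": "Entity Matching", "year": 2010},
     {"title": "Graph Analysis", "year": 2012}, 0),     # non-match
]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
clf = DecisionTreeClassifier().fit(X, y)
```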
More than 4,500 open access (OA) journals have now become established in science. But doubts exist about the quality of the manuscript selection process for publication in these journals. In this study we investigate the quality of the selection process of an OA journal, taking as an example the journal Atmospheric Chemistry and Physics (ACP). ACP...
Dynamic web applications such as mashups need efficient access to web data that is only accessible via entity search engines (e.g. product or publication search engines). However, most current mashup systems and applications only support simple keyword searches for retrieving data from search engines. We propose the use of more powerful search stra...
We present FEVER, a new evaluation platform for entity resolution approaches. The modular structure of the FEVER framework supports the incorporation or reconstruction of many previously proposed approaches for entity resolution. A distinctive feature of FEVER is that it not only evaluates traditional measures such as precision and recall but als...
Examining a comprehensive set of papers (n = 1837) that were accepted for publication by the journal Angewandte Chemie International Edition (one of the prime chemistry journals in the world) or rejected by the journal but then published elsewhere, this study tested the extent to which the use of the freely available database Google Scholar (GS) ca...
Ontology matching has been widely studied. However, the resulting ontology mappings can be rather unstable when the participating ontologies or utilized secondary sources (e.g., instance sources, thesauri) evolve. We propose an evolution-based approach for assessing ontology mappings by annotating their correspondences with information about simil...
Mashups exemplify a workflow-like approach to dynamically integrate data and services from multiple web sources. Such integration workflows can build on existing services for web search, entity search, database querying, and information extraction and thus complement other data integration approaches. A key challenge is the efficient execution of i...