Andreas Thor

Leipzig University of Applied Sciences | HTWK

Professor

About

Publications: 82
Reads: 27,923
Citations: 2,288
Additional affiliations
April 2013 - present
Hochschule für Telekommunikation Leipzig
Position
  • Professor
April 2012 - March 2013
University of Passau
Position
  • Professor
January 2010 - April 2011
University of Maryland, College Park
Position
  • Visiting Researcher


Publications (82)
Article
Full-text available
In this article, we explore the transformative impact of advanced, parameter-rich Large Language Models (LLMs) on the production of instructional materials in higher education, with a focus on the automated generation of both formative and summative assessments for learners in the field of mathematics. We introduce a novel LLM-driven process and ap...
Preprint
NoSQL document stores are becoming increasingly popular as backends in web development. Not only do they scale out to large volumes of data, many systems are even custom-tailored for this domain: NoSQL document stores like Google Cloud Datastore have been designed to support massively parallel reads, and even guarantee strong consistency in updatin...
Preprint
Full-text available
EAs.LiT is an e-assessment management and analysis software for which contextual requirements and usage scenarios changed over time. Based on these factors and further development activities, the decision was made to adopt a microservice architecture for EAs.LiT version 2 in order to increase its flexibility to adapt to new and changed circumstance...
Article
Full-text available
Abstract: Working on exercises is an important element of database teaching. Student solutions can often be captured in structured result formats, such as SQL queries or the specification of schemas and relations. This article presents the e-assessment tool DMT (Data Management Tester)...
Chapter
The classification of e-assessment items with levels of Bloom’s taxonomy is an important aspect of effective e-assessment. Such annotations enable the automatic generation of parallel tests with the same competence profile as well as a competence-oriented analysis of the students’ exam results. Unfortunately, manual annotation by item creators is r...
Conference Paper
Full-text available
Electronic examinations (e-assessments) with closed question formats are a particularly efficient testing method for exams with large numbers of participants (Michel et al., 2015; Pengel et al., 2019). At the same time, their use entails considerable effort in managing the required collections of items (item pools)...
Book
Full-text available
Digitisation has established itself as the change maker par excellence in business, science and society. Infrastructures, working methods and skills are at the forefront of many debates and increasingly determine the future viability of entire industries. We have obviously embraced the permanent change with increasing acceleration. But: Where is th...
Article
What are the landmark papers in scientific disciplines? Which papers are indispensable for scientific progress? These are typical questions which are of interest not only to researchers (who frequently know the answers, or believe they do) but also to the interested general public. Citation counts can be used to identify very useful papers si...
Chapter
The temporal analysis of evolving graphs is an important requirement in many domains. We are therefore extending the distributed graph analysis framework Gradoop and its graph data model to support temporal graph analysis. This paper contains an overview of our work in progress and an example use case from the financial domain demonstrating the fle...
Chapter
In the discussion about the digitisation of research, the question of optimal IT support for researchers plays an important role. Today, researchers can draw on a broad range of internal IT services at their universities and research institutions, which also includes cooperative IT services that several...
Article
The temporal analysis of evolving graphs is an important requirement in many domains but hardly supported in current graph database and graph processing systems. We therefore have started with extending the distributed graph analysis framework Gradoop for temporal graph analysis by adding time properties to vertices, edges and graphs and using them...
Article
Full-text available
Since the introduction of the reference publication year spectroscopy (RPYS) method and the corresponding programme CRExplorer, many studies have been published revealing the historical roots of topics, fields and researchers. The application of the method was restricted up to now by the available memory of the computer used for running the CRExplo...
Preprint
Full-text available
What are the landmark papers in scientific disciplines? On whose shoulders does research in these fields stand? Which papers are indispensable for scientific progress? These are typical questions which are of interest not only to researchers (who frequently know the answers, or believe they do), but also to the interested general public. Cita...
Preprint
Full-text available
Since the introduction of the reference publication year spectroscopy (RPYS) method and the corresponding program CRExplorer, many studies have been published revealing the historical roots of topics, fields, and researchers. The application of the method was restricted up to now by the available memory of the computer used for running the CRExplor...
Article
Full-text available
Reference Publication Year Spectroscopy (RPYS) has been developed for identifying the cited references (CRs) with the greatest influence in a given paper set (mostly sets of papers on certain topics or fields). The program CRExplorer (see www.crexplorer.net) was specifically developed by Thor, Marx, Leydesdorff, and Bornmann (2016a, 2016b) for appl...
Preprint
Reference Publication Year Spectroscopy (RPYS) has been developed for identifying the cited references (CRs) with the greatest influence in a given paper set (mostly sets of papers on certain topics or fields). The program CRExplorer (see www.crexplorer.net) was specifically developed by Thor, Marx, Leydesdorff, and Bornmann (2016a, 2016b) for appl...
Article
Full-text available
This article presents DIAL (Distributed InterActive Lecture), a BigBlueButton-based system for interactive live streaming of lectures. DIAL extends the BigBlueButton conferencing system so that students can access the chat and polling system from the standby mode of their device, i.e. without the need...
Article
Full-text available
This article presents two platform-independent tools, the e-assessment literacy tool EAs.LiT and the peer assessment tool PAssT!, which implement established methods from higher-education didactics as workflows and are intended to help establish cross-university quality standards in e-assessment. EAs.LiT supports...
Article
Full-text available
Formulating learning outcomes and making them transparent to students is the basis for self-directed learning processes and competence-oriented examinations (constructive alignment). The e-assessment literacy tool (EAs.LiT) presented in this article provides didactically grounded support for formulating learning outcomes...
Article
Full-text available
The program HistCite™ enables an analyst to identify significant works on a given topic using the citation links between them diachronically. However, using Scopus data for drawing historiograms with HistCite™ has hitherto been a problem. In the new version of the program CRExplorer, one can translate citation data from Scopus to WoS formats (or vi...
Article
Full-text available
This bibliometric analysis focuses on the general history of climate change research and, more specifically, on the discovery of the greenhouse effect. First, the Reference Publication Year Spectroscopy (RPYS) is applied to a large publication set on climate change of 222,060 papers published between 1980 and 2014. The references cited therein were...
Preprint
This bibliometric analysis focuses on the general history of climate change research and, more specifically, on the discovery of the greenhouse effect. First, the Reference Publication Year Spectroscopy (RPYS) is applied to a large publication set on climate change of 222,060 papers published between 1980 and 2014. The references cited therein were...
Article
CRExplorer version 1.6.7 was released on July 5, 2016. This version includes the following new features and improvements: Scopus: Using "File" - "Import" - "Scopus", CRExplorer reads files from Scopus. The file format "CSV" (including citations, abstracts and references) should be chosen in Scopus for downloading records. Export facilities: Using "...
Preprint
CRExplorer version 1.6.7 was released on July 5, 2016. This version includes the following new features and improvements: Scopus: Using "File" - "Import" - "Scopus", CRExplorer reads files from Scopus. The file format "CSV" (including citations, abstracts and references) should be chosen in Scopus for downloading records. Export facilities: Using "...
Article
Full-text available
Referenced Publication Year Spectroscopy (RPYS) was recently introduced as a method to analyze the historical roots of research fields and groups or institutions. RPYS maps the distribution of the publication years of the cited references in a document set. In this study, we apply this methodology to the œuvre of an individual researcher on the...
Article
We introduce a new tool - the CitedReferencesExplorer (CRExplorer, www.crexplorer.net) - which can be used to disambiguate and analyze the cited references (CRs) of a publication set downloaded from the Web of Science (WoS). The tool is especially suitable to identify those publications which have been frequently cited by the researchers in a field...
Article
Reference Publication Year Spectroscopy (RPYS) was proposed by Marx, Bornmann, Barth, and Leydesdorff (2014, [18]) to identify seminal publications in a research field which are most important in a historical context. We refined our RPYS toolbox by adding some features to the existing programs and we developed two new routines. First, a direct comp...
Article
In the humanities and social sciences, bibliometric methods for the assessment of research performance are (so far) less common. The current study takes a concrete example in an attempt to evaluate a research institute from the area of social sciences and humanities with the help of data from Google Scholar (GS). In order to use GS for a bibliometr...
Conference Paper
Full-text available
We demonstrate AutoShard, a ready-to-use object mapper for Java applications running against NoSQL data stores. AutoShard's unique feature is its capability to gracefully shard hot spot data objects that are suffering under concurrent writes. By sharding data on the level of the logical schema, scalability bottlenecks due to write contention can be...
Article
Full-text available
Linked Open Data initiatives have made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms from ontologies. Annotations encode scientific knowledge, which is captured in annotation datasets. Determining relatedness between annotated entities becomes a building blo...
Article
Full-text available
A dedicated special issue on data management in the cloud gives us occasion to survey the presence of cloud topics in academic database teaching. In this article, we report the results of a survey conducted within the Fachgruppe Datenbanksysteme (database systems special interest group) by the Arbeitskreis Datenmanagement in der Cloud (working group on data management in the cloud). Lecturers from more than...
Article
Special focus: Data management in the cloud
Conference Paper
Linked Open Data initiatives have made available a diversity of collections that domain experts have annotated with controlled vocabulary terms from ontologies. We identify annotation signatures of linked data that associate semantically similar concepts, where similarity is measured in terms of shared annotations and ontological relatedness. Forma...
Conference Paper
Full-text available
NoSQL document stores are becoming increasingly popular backends in Web development. Not only do they scale out to large volumes of data, many systems are even custom-tailored for this domain: NoSQL document stores like Google Cloud Datastore have been designed to support massively parallel reads, and even guarantee strong consistency in updating s...
Conference Paper
Linked Open Data initiatives have made available a diversity of collections that domain experts have annotated with controlled vocabulary terms from ontologies. The challenge is to explore these rich and complex annotated datasets, together with the domain semantics captured within ontologies, to discover patterns of annotations across multiple con...
Conference Paper
Full-text available
Linked Open Data has made available a diversity of scientific collections where scientists have annotated entities in the datasets with controlled vocabulary terms (CV terms) from ontologies. These semantic annotations encode scientific knowledge which is captured in annotation datasets. One can mine these datasets to discover relationships and pat...
Conference Paper
Full-text available
To improve the effectiveness of pair-wise similarity computation, state-of-the-art approaches assign objects to multiple overlapping clusters. This introduces redundant pair comparisons when similar objects share more than one cluster. We propose an approach that eliminates such redundant comparisons and that can be easily integrated into existing...
Article
Linked Open Data initiatives have made available a diversity of collections that domain experts have annotated with controlled vocabulary terms from ontologies. The challenge is to explore these rich and complex annotated datasets, together with the domain semantics captured within ontologies, to discover patterns of annotations across multiple con...
Conference Paper
Full-text available
Annotations of clinical trials with controlled vocabularies of drugs and diseases encode scientific knowledge that can be mined to discover relationships between scientific concepts. We present PAnG (Patterns in Annotation Graphs), a tool that relies on dense subgraphs, graph summarization and taxonomic distance metrics, computed using the NCI The...
Conference Paper
Full-text available
Mappings between related ontologies are increasingly used to support data integration and analysis tasks. Changes in the ontologies also require the adaptation of ontology mappings. So far the evolution of ontology mappings has received little attention albeit ontologies change continuously especially in the life sciences. We therefore analyze ho...
Article
Full-text available
We demonstrate a powerful and easy-to-use tool called Dedoop (Deduplication with Hadoop) for MapReduce-based entity resolution (ER) of large datasets. Dedoop supports a browser-based specification of complex ER workflows including blocking and matching steps as well as the optional use of machine learning for the automatic generation of match c...
Article
Full-text available
We demonstrate a new powerful mashup tool called WETSUIT (Web EnTity Search and fUsIon Tool) to search and integrate web data from diverse sources and domain-specific entity search engines. WETSUIT supports adaptive search strategies to query sets of relevant entities with a minimum of communication overhead. Mashups can be composed using a set of...
Conference Paper
Full-text available
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences where concepts such as genes and proteins are annotated with controlled vocabulary terms from ontologies. Scientists are interested in analyzing or mining these annotations, in synergy with the literature, to discover patterns. Furth...
Article
Full-text available
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences and health sciences, where concepts such as genes, proteins or clinical trials are annotated with controlled vocabulary terms from ontologies. We present a tool, PAnG (Patterns in Annotation Graphs), that is based on a complementary...
Article
Full-text available
Product matching is a challenging variation of entity resolution to identify representations and offers referring to the same product. Product matching is highly difficult due to the broad spectrum of products, many similar but different products, frequently missing or wrong values, and the textual nature of product titles and descriptions. We prop...
Article
Full-text available
Mappings between related ontologies are increasingly used to support data integration and analysis tasks. Changes in the ontologies also require the adaptation of ontology mappings. So far the evolution of ontology mappings has received little attention albeit ontologies change continuously especially in the life sciences. We therefore analyze how...
Article
Full-text available
Programmatic data integration approaches such as mashups have become a viable approach to dynamically integrate web data at runtime. Key data sources for mashups include entity search engines and hidden databases that need to be queried via source-specific search interfaces or web forms. Current mashups are typically restricted to simple query appr...
Article
Full-text available
Entity resolution is a crucial step for data quality and data integration. Learning-based approaches show high effectiveness at the expense of poor efficiency. To reduce the typically high execution times, we investigate how learning-based entity resolution can be realized in a cloud infrastructure using MapReduce. We propose and evaluate two ef...
Conference Paper
Full-text available
The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an even redistribution of data between map and reduce tasks. In the presence of skewed data, sophisticated redistribution approaches thus become necessary to achieve load balancing among all reduce tasks to be executed in parallel. For the...
Conference Paper
Full-text available
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely ava...
Article
Full-text available
The effectiveness and scalability of MapReduce-based implementations of complex data-intensive tasks depend on an even redistribution of data between map and reduce tasks. In the presence of skewed data, sophisticated redistribution approaches thus become necessary to achieve load balancing among all reduce tasks to be executed in parallel. For the...
Conference Paper
Full-text available
The advent of cloud computing technologies shows great promise for web engineering and facilitates the development of flexible, distributed, and scalable web applications. Data integration can notably benefit from cloud computing because integrating web data is usually an expensive task. This paper introduces CloudFuice, a data integration system t...
Article
Full-text available
Purpose – The single publication h index has been introduced by Schubert as the h index calculated from the list of citing publications of one single publication. This paper aims to look at the calculation of the single publication h index and related performance measures. Design/methodology/approach – In this paper a web application is presented...
Article
Full-text available
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity resolution using Sorted Neighborhood blocking (SN). We propose and evaluate two efficient MapReduce-based im...
Conference Paper
Full-text available
Product matching aims at identifying different product offers referring to the same real-world product. Product offers are provided by different merchants and describe products using textual attributes such as offer title and description. String similarity measures therefore play an important role for matching corresponding product offers. In this...
Article
Full-text available
Cloud infrastructures enable the efficient parallel execution of data-intensive tasks such as entity resolution on large datasets. We investigate challenges and possible solutions of using the MapReduce programming model for parallel entity resolution. In particular, we propose and evaluate two MapReduce-based implementations for Sorted Neighborhoo...
Article
Full-text available
Despite the huge amount of recent research efforts on entity resolution (matching) there has not yet been a comparative evaluation on the relative effectiveness and efficiency of alternate approaches. We therefore present such an evaluation of existing implementations on challenging real-world match tasks. We consider approaches both with and witho...
Article
Full-text available
Entity matching is a key task for data integration and especially challenging for Web data. Effective entity matching typically requires combining several match techniques and finding suitable configuration parameters, such as similarity thresholds. The authors investigate to what degree machine learning helps semi-automatically determine suitable...
Article
Full-text available
More than 4,500 open access (OA) journals have now become established in science. But doubts exist about the quality of the manuscript selection process for publication in these journals. In this study we investigate the quality of the selection process of an OA journal, taking as an example the journal Atmospheric Chemistry and Physics (ACP). ACP...
Article
Full-text available
Dynamic web applications such as mashups need efficient access to web data that is only accessible via entity search engines (e.g. product or publication search engines). However, most current mashup systems and applications only support simple keyword searches for retrieving data from search engines. We propose the use of more powerful search stra...
Article
Full-text available
We present FEVER, a new evaluation platform for entity resolution approaches. The modular structure of the FEVER framework supports the incorporation or reconstruction of many previously proposed approaches for entity resolution. A distinctive feature of FEVER is that it not only evaluates traditional measures such as precision and recall but als...
Article
Full-text available
Examining a comprehensive set of papers (n = 1837) that were accepted for publication by the journal Angewandte Chemie International Edition (one of the prime chemistry journals in the world) or rejected by the journal but then published elsewhere, this study tested the extent to which the use of the freely available database Google Scholar (GS) ca...
Conference Paper
Full-text available
Ontology matching has been widely studied. However, the resulting ontology mappings can be rather unstable when the participating ontologies or utilized secondary sources (e.g., instance sources, thesauri) evolve. We propose an evolution-based approach for assessing ontology mappings by annotating their correspondences by information about simil...
Conference Paper
Full-text available
Mashups exemplify a workflow-like approach to dynamically integrate data and services from multiple web sources. Such integration workflows can build on existing services for web search, entity search, database querying, and information extraction and thus complement other data integration approaches. A key challenge is the efficient execution of i...