About
Publications: 127
Reads: 16,273
Citations: 2,510
Introduction
Dr. Thomas Risse is head of Electronic Services at the University Library Johann Christian Senckenberg of the Goethe University Frankfurt. He is responsible for the strategic development and operation of the electronic services offered by the library. He has been involved in several national and European projects in the area of digital libraries and Web archives. His research interests are Semantic Evolution, Digital Libraries, Web Archiving, Data Management in Distributed Systems and Self-organizing Systems. He serves regularly as a program committee member and project reviewer, and has published several papers at relevant international conferences.
Additional affiliations
January 2007 - present
Publications (127)
This volume presents a special issue on selected papers from the 2019 & 2020 editions of the International Conference on Theory and Practice of Digital Libraries (TPDL). They cover different research areas within Digital Libraries, from Ontology and Linked Data to quality in Web Archives and Topic Detection. We first provide a brief overview of bot...
Web archives are an essential information source for research on historical events. However, the large scale and heterogeneity of web archives make it difficult for researchers to access relevant event-specific materials. In this chapter, we discuss methods for creating event-centric collections from large-scale web archives. These methods are mani...
This book provides practical information about web archives, offers inspiring examples for web archivists, raises new challenges, and shares recent research results about access methods to explore information from the past preserved by web archives.
The book is structured in six parts. Part 1 advocates for the importance of web archives to preserve...
This book constitutes thoroughly reviewed and selected papers presented at Workshops and Doctoral Consortium of the 24th East-European Conference on Advances in Databases and Information Systems, ADBIS 2020, the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, and the 16th Workshop on Business Intelligence and B...
Web archives constitute an increasingly important source of information for computer scientists, humanities researchers and journalists interested in studying past events. However, currently there are no access methods that help Web archive users to efficiently access event-centric information in large-scale archives that go beyond the retrieval of...
This book constitutes the proceedings of the 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, held in Lyon, France, in August 2020.
The 14 full papers and 4 short papers presented were carefully reviewed and selected from 53 submissions. TPDL 2020 attempts to facilitate establishing connections and convergences...
Due to technical constraints of the past, metadata in languages written with non-Latin scripts have frequently been entered using various systems of transcription. While this transcription is essential for data curators who may not be familiar with the source script, it is often an encumbrance for researchers in discovery and retrieval. Until 2011 t...
BIOfid is a specialized information service currently being developed to mobilize biodiversity data dormant in printed historical and modern literature and to offer a platform for open access journals on the science of biodiversity. Our team of librarians, computer scientists and biologists produce high-quality text digitizations, develop new text-...
Web archives are typically very broad in scope and extremely large in scale. This makes data analysis appear daunting, especially for non-computer scientists. These collections constitute an increasingly important source for researchers in the social sciences, the historical sciences and journalists interested in studying past events. However, ther...
With advances in technology and culture, our language changes. We invent new words, add or change meanings of existing words and change names of existing things. Unfortunately, our language does not carry a memory; words, expressions and meanings used in the past are forgotten over time. When searching and interpreting content from archives, langua...
The evolution of named entities affects exploration and retrieval tasks in digital libraries. An information retrieval system that is aware of name changes can actively support users in finding former occurrences of evolved entities. However, current structured knowledge bases, such as DBpedia or Freebase, do not provide enough information about ev...
Advancements in technology and culture lead to changes in our language. These changes create a gap between the language known by users and the language stored in digital archives. This gap affects users' ability first to find content and second to interpret that content. In previous work we introduced our approach for Named Entity Evolution Recognit...
Accessing Web archives raises a number of issues caused by their temporal characteristics. Additional knowledge is needed to find and understand older texts. Especially entities mentioned in texts are subject to change. Most severe in terms of information retrieval are name changes. In order to find entities that have changed their name over time,...
Working with Web archives raises a number of issues caused by their temporal characteristics. Depending on the age of the content, additional knowledge might be needed to find and understand older texts. Especially facts about entities are subject to change. Most severe in terms of information retrieval are name changes. In order to find entities t...
Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to...
Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection. The choi...
Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working w...
The World Wide Web is well established as a global information and communication medium. New technologies regularly come along which expand the forms of use and permit even inexperienced users to publish content or take part in discussions. For this reason the Web can also be seen as a good documenter of present-day society. The dynamism of the Web...
Researchers in the Digital Humanities and journalists need to monitor, collect and analyze fresh online content regarding current events such as the Ebola outbreak or the Ukraine crisis on demand. However, existing focused crawling approaches only consider topical aspects while ignoring temporal aspects and therefore cannot achieve thematically coh...
The concept of culturomics was born out of the availability of massive amounts of textual data and the interest to make sense of cultural and language phenomena over time. Thus far however, culturomics has only made use of, and shown the great potential of, statistical methods. In this paper, we present a vision for a knowledge-based culturomics th...
Preservation and curation of digital materials is a significant contemporary cultural, economic and social issue, yet it is often neglected. For decades, the amount of content created digitally has grown dramatically and more recently exponentially as digital processes have become core aspects of everyday business and life. Indeed, the complete life...
The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions and other events....
The web and the social web play an increasingly important role as an information source for Members of Parliament and their assistants, journalists, political analysts and researchers. It provides important and crucial background information, like reactions to political events and comments made by the general public. The case study presented in thi...
Many complex maneuvers involving aircraft, vehicles and persons are carried out at airport aprons. Manual video surveillance used for safety and security purposes is inefficient and privacy protection must be guaranteed. In this paper, we propose a system named ASEV that automatically assesses situations for airport surveillance. It combines four m...
The World Wide Web is the largest information repository available today. However, this information is very volatile and Web archiving is essential to preserve it for the future. Existing approaches to Web archiving are based on simple definitions of the scope of Web pages to crawl and are limited to basic interactions with Web servers. The aim of...
Advancements in technology and culture lead to changes in our language. These changes create a gap between the language known by users and the language stored in digital archives. This gap affects users’ ability first to find content and second to interpret that content. In a previous work, we introduced our approach for named entity evolution recog...
Today an increasing interest in collecting, analyzing and preserving Web content can be observed in many digital humanities. Especially the Social Web is attractive for many humanities disciplines as it provides a direct access to statements of many people about politics, popular topics or events. In this paper we present an exemplary study that we...
The microblogging service Twitter has become one of the most popular sources of real time information. Every second, hundreds of URLs are posted on Twitter. Due to the maximum tweet length of 140 characters, these URLs are in most cases a shortened version of the original URLs. In contrast to the original URLs, which usually provide some hints on t...
The SMS workshop 2013 on Social Media Semantics was held this year in the context of the OTM ("OnTheMove") federated conferences, covering different aspects of distributed information systems, in September 2013 in Graz.
The topic of the workshop is semantics in Social Media. The Social Web has become the first and main medium to get and spread...
Semantic ambient media are the novel trend in the world of media reaching from the pioneering subareas such as ambient advertising to the new and emerging subareas such as ambient assisted living. They will likely shape the upcoming years in terms of modeling smart environments and also media consumption and interaction. This work analyzes semantic...
As language evolves over time, documents stored in long-term archives become inaccessible to users. Automatically detecting and handling language evolution will become a necessity to meet users’ information needs. In this paper, we investigate the performance of modern tools and algorithms applied on modern English to find word senses that will l...
Knowing about the evolution of a term can significantly help when searching for relevant information, especially in case of sudden evolutions (e.g. due to dramatic changes in political situations). Here, some terms get a completely new meaning or are used in new or different ways. In mobile situations it is important to be able to effectively retr...
Ambient (aka pervasive, ubiquitous) media environments offer a plethora of context data as well as opportunities for context-related content production and consumption. They are the perfect environments for providing users with highly contextualized data-driven services and data-driven visual and additive content. To build such ambient media enviro...
With advancements in technology and culture, our language changes. We invent new words, add or change meanings of existing words and change names of existing things. Left untackled, these changes in language create a gap between the language known by users and the language stored in our digital archives. In particular, they affect our possibility t...
High impact events, political changes and new technologies are reflected in our language and lead to constant evolution of terms, expressions and names. Not knowing about names used in the past for referring to a named entity can severely decrease the performance of many computational linguistic algorithms. We propose NEER, an unsupervised method f...
The constantly growing amount of Web content and the success of the Social Web lead to increasing needs for Web archiving. These needs go beyond the pure preservation of Web pages. Web archives are turning into “community memories” that aim at building a better understanding of the public view on, e.g., celebrities, court decisions, and other event...
Knowing the behavior of terms in written texts can help us tailor fit models, algorithms and resources to improve access to digital libraries and help us answer information needs in longer spanning archives. In this paper we investigate the behavior of English written text in blogs in comparison to traditional texts from the New York Times, The Tim...
With the rapidly increasing pace at which Web content is evolving, particularly social media, preserving the Web and its evolution over time becomes an important challenge. Meaningful analysis of Web content lends itself to an entity-centric view to organise Web resources according to the information objects related to them. Therefore, the crucia...
Setting the public as the driving force for formulating the future: Social Web is exploited as the main democratic channel to express the public's opinion on crucial social events. Leveraging the wisdom of the crowds, socially aware digital preservation is the key to the future of archiving. Future internet services will be offering social web anal...
Dynamic selection of Web services at runtime is important for building flexible and loosely-coupled service-oriented applications. An abstract description of the required services is provided at design-time, and matching service offers are located at runtime. With the growing number of Web services that provide the same functionality but differ in...
The term ambient media was in its beginning used only for ambient advertising. Nowadays it denotes the media environment and the communication of information in ubiquitous and pervasive environments. With the addition of intelligence, the new field of semantic ambient media was established. In recent years, the field of semantic ambient media has s...
Events are a central aspect of many semantic ambient media applications such as surveillance, smart homes, automobiles, and others. Existing models for events typically do not follow a systematic development approach, are conceptually narrow with respect ...
The ARCOMEM project is about memory institutions like archives, museums and libraries in the age of the Social Web. Social media are becoming more and more pervasive in all areas of life. ARCOMEM's aim is to help to transform archives into collective memories that are more tightly integrated with their community of users and to exploit Web 2.0 and...
Proxy Credentials serve as a principal for authentication and authorization in the Grid. Despite their limited lifetime, they can be intercepted and abused by an attacker. We counter this threat by enabling Grid users to track their credentials' use in Grid infrastructures, reporting all authentication and delegation operations to an auditing servi...
We present, compare and contrast new directions in long term digital preservation as covered by the four large European Community funded research projects that started in 2011. The new projects widen the domain of digital preservation from the traditional purview of memory institutions preserving documents to include scenarios such as health-care...
With the rapidly growing volume of resources on the Web, Web archiving becomes an important challenge. In addition, the notion of community memories extends traditional Web archives with related data from a variety of sources on the Social Web. Community memories take an entity-centric view to organise Web content according to the events and the en...
The SAME workshop took place for the third time in 2010, and its theme this year was business value creation, vision, media theories and technology for ambient media. SAME differs from other workshops through its interactive and creative touch, going beyond simple PowerPoint presentations.
Several results will be published by AMEA...
Word sense discrimination is the first, important step towards automatic detection of language evolution within large, historic document collections. By comparing the found word senses over time, we can reveal and use important information that will improve understanding and accessibility of a digital archive. Algorithms for word sense discriminati...
Web service composition enables seamless and dynamic integration of business applications on the web. The performance of the composed application is determined by the performance of the involved web services. Therefore, non-functional, quality of service aspects are crucial for selecting the web services to take part in the composition. Identifying...
In this paper, we introduce a method for categorizing digital items according to their topic, only relying on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning cla...
The medium is the message! And the message was literacy, media democracy and music charts. Mostly one single distinguishable medium such as TV, the Web, the radio, or books transmitted the message. Now in the age of ubiquitous and pervasive computing, where information flows through a plethora of distributed interlinked media, what is the message am...
An essential issue in peer-to-peer data management is to keep data highly available all the time. The paper presents a replication protocol that adjusts autonomously the number of replicas to deliver a configured data availability guarantee. The protocol is based on a Distributed Hash Table (DHT), measurement of peer online probability in the syste...
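As a rough illustration of how a replica count can follow from a measured peer online probability and a configured availability target, the sketch below uses a simplified textbook model, not the protocol from the paper. It assumes every peer is online independently with the same probability, so data is unavailable only when all replica holders are offline at once. The function name `replicas_needed` and its parameters are hypothetical.

```python
def replicas_needed(peer_online_prob: float, target_availability: float) -> int:
    """Smallest replica count r with 1 - (1 - p)^r >= target.

    Assumption (not from the paper): peers are online independently,
    each with the same probability p.
    """
    if not 0.0 < peer_online_prob < 1.0:
        raise ValueError("peer_online_prob must be strictly between 0 and 1")
    if not 0.0 < target_availability < 1.0:
        raise ValueError("target_availability must be strictly between 0 and 1")

    replicas = 1
    # Probability that every replica holder is offline at the same time.
    all_offline = 1.0 - peer_online_prob
    while 1.0 - all_offline < target_availability:
        replicas += 1
        all_offline *= 1.0 - peer_online_prob
    return replicas


if __name__ == "__main__":
    # Peers online 30% of the time, 99% availability requested.
    print(replicas_needed(0.3, 0.99))
```

Under this model a lower measured online probability raises the required replica count, which is the kind of adjustment an autonomous protocol would have to make as peer behaviour changes.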
The run-time binding of web services has been recently put forward in order to support rapid and dynamic web service compositions. With the growing number of alternative web services that provide the same functionality but differ in quality parameters, the service composition becomes a decision problem on which component services should be selected...
The archival of content like publications or web pages is just the first step toward "full" content preservation. It also has to be guaranteed that content can be found and interpreted in the long run. The correspondence between the terminology used for querying and the one used in content objects to be retrieved is a crucial prerequisite for eff...