Harald Kosch
University of Passau · Chair of Distributed Information Systems
Dr. (ENS Lyon)
284 Publications
53,887 Reads
2,079 Citations
Publications (284)
High voter turnout in elections and referendums is desirable to ensure a robust democracy. Secure electronic voting is a vision for the future of elections and referendums. Such a system can counteract factors hindering strong voter turnout such as the requirement of physical presence during limited hours at polling stations. However, this vision b...
The automotive industry is experiencing a transformation with the rapid integration of software-based systems inside vehicles, which are complex systems with multiple sensors. The use of vehicle sensor data has enabled vehicles to communicate with other entities in the connected vehicle ecosystem, such as the cloud, road infrastructure, other vehic...
With the advent of sensors, more and more services are developed in order to provide customers with insights about their health and their appliances’ energy consumption at home. To do so, these services use new mining algorithms that create new inference channels. However, the collected sensor data can be diverted to infer personal data that custom...
To date, the number of studies that address the generalization of argument models is still relatively small. In this study, we extend our stacking model from argument identification to an argument unit classification task. Using this model, and for each of the learned tasks, we address three real-world scenarios concerning the model robustness over...
In the current world, individuals are faced with decision making problems and opinion formation processes on a daily basis. Nevertheless, answering a comparative question by retrieving documents based only on traditional measures (such as TF-IDF and BM25) does not always satisfy the need. In this paper, we propose a multi-layer architecture to answ...
Nowadays, applications produce and manage data about individuals, some of which may be sensitive and must be protected. Moreover, with the advent of smart applications, sensor data are produced by IoT devices in huge quantities and sent to servers in the vicinity to be stored and processed. Meanwhile, newly discovered inference channels involving sens...
Stock market prediction is a difficult problem that has always attracted researchers from different domains. Recently, different studies using text mining and machine learning methods were proposed. However, the efficiency of these methods is still highly dependent on the retrieval of relevant information. In this paper, we investigate novel data s...
Argument identification is the cornerstone of a complete argument mining pipeline. Furthermore, it is the essential key for a wide spectrum of applications such as decision making, assisted writing, and legal counselling. Nevertheless, most existing argument mining approaches are limited to a single, specific domain. The problem of building a robus...
High voter turnout in elections and referendums is very desirable in order to ensure a robust democracy. Secure electronic voting is a vision for the future of elections and referendums. Such a system can counteract factors that hinder strong voter turnout such as the requirement of physical presence during limited hours at polling stations. Howeve...
Crowdsourcing is a time- and cost-efficient web-based technique for labeling large datasets like those used in Machine Learning. Controlling the output quality in crowdsourcing is an active research domain which has yielded a fair number of methods and approaches. Due to the quantitative and qualitative limitations of the existing evaluation datase...
We present a method to incrementally generate complete 2D or 3D scenes with the following properties: (a) it is globally consistent at each step according to a learned scene prior, (b) real observations of a scene can be incorporated while observing global consistency, (c) unobserved regions can be hallucinated locally in consistence with previous...
The W3C Web of Things (WoT) is introduced as a larger context of the Internet of Things (IoT). It provides standards for communication and interaction with Things in the IoT in order to address IoT cross-domain and cross-platform interoperability problems and reduce its fragmentation. WoT uses a formal interface description called Thing Description...
This paper presents a method for the integration of data originating from sensors and actuators that follow different formalisms, although they semantically overlap. We tested our approach on three Web of Things standards published respectively by the Open Mobile Alliance (OMA), the Open Connectivity Foundation (OCF) and the oneM2M foundation.
Ou...
While convolutional neural networks are dominating the field of computer vision, one usually does not have access to the large amount of domain-relevant data needed for their training. It thus became common to use available synthetic samples along domain adaptation schemes to prepare algorithms for the target domain. Tackling this problem from a di...
The upcoming General Data Protection Regulation (GDPR) imposes several new legal requirements for privacy management in information systems. In this paper, we introduce LPL, an extensible Layered Privacy Language that makes it possible to express and enforce these new privacy properties such as personal privacy, user consent, data provenance, and retention ma...
With the increasing availability of large databases of 3D CAD models, depth-based recognition methods can be trained on an uncountable number of synthetically rendered images. However, discrepancies with the real data acquired from various depth sensors still noticeably impede progress. Previous works adopted unsupervised approaches to generate mor...
The recent JSON-LD standard, that specifies an object notation for RDF, has been adopted by a number of data providers on the Web. In this paper, we present a novel usage of JSON-LD, as a compact format to exchange and query RDF data in constrained environments, in the context of the Web of Things. A typical exchange between Web of Things agents in...
This paper presents the µRDF store, a triple store designed for micro-controllers with limited memory, typically 8 to 64 kB. The µRDF store exposes a query interface inspired by SPARQL that supports basic graph pattern queries. Data is sent over CoAP and serialized in EXI, a binary format for XML. The performances of its processing en...
Recent progress in computer vision has been dominated by deep neural networks trained with large amounts of labeled data. Collecting and annotating such datasets is, however, a tedious and in some contexts impossible task; hence a recent surge in approaches that rely solely on synthetically generated data from 3D models for their training. For depth...
Web Search engines have become an indispensable online service to retrieve content on the Internet. However, using search engines raises serious privacy issues as the latter gather large amounts of data about individuals through their search queries. Two main techniques have been proposed to privately query search engines. A first category of appro...
Currently, Reverse Geo-tagging relies on the keywords describing an image and uses probabilistic algorithms to guess the location of the depicted scene. However, such algorithms still perform poorly and show clear limitations. Notably, the location estimation only occurs at the landmark level; regions or countries are only processed through their c...
W3C Web annotations are a powerful way to support meta-data information about digital resources. The Web Annotation Data Model proposes standardised RDF structures that express this by implementing a hierarchical annotation structure. Those annotations are designed to be shared, linked, tracked back as well as searched and discovered across differe...
The quality of content-based recommendation depends to a very high degree on the quality of the metadata available. We propose a workflow that combines novel cross media analysis platforms with linked data analysis to generate recommendations. The focus is set on an “editor user story” that combines live analysis of currently created content with a...
The Web Annotation Data Model proposes standardised RDF structures to form “Web Annotations”. These annotations are used to express metadata information about digital resources and are designed to be shared, linked, tracked back, as well as searched and discovered across different peers. Although this is an expressive and rich way to create metadat...
The Media Fragment URI specification was released in 2012 and has been taken up by research and industry to some extent. Nevertheless, the impact is weak in comparison to other W3C recommendations. Missing features, under-specified parts and a weak integration into common standards could be key reasons for that. In this paper we describe possible...
Whereas the former Web mostly consisted of information represented in textual documents, nowadays the Web includes a huge number of multimedia documents like videos, photos, and audio. This enormous increase in volume in the private, and above all in the industry sector, makes it more and more difficult to find relevant information. Besides the pur...
The advances of the Linked Open Data (LOD) initiative are giving rise to a more structured Web of data. Indeed, a few datasets act as hubs (e.g., DBpedia) connecting many other datasets. They also made possible new Web services for entity detection inside plain text (e.g., DBpedia Spotlight), thus allowing for new applications that can benefit from...
The growing number of elderly people combined with financial cuts in the health care sector leads to an increased demand for computer supported medical services. New standards like HTML5 allow the creation of hypervideo training applications that run on a variety of end user devices. In this paper, we evaluate an HTML5 player running an e-health hyp...
The amount of audio, video and image data on the web is immensely growing, which leads to data management problems based on the hidden character of multimedia. Therefore the interlinking of semantic concepts and media data with the aim to bridge the gap between the document web and the Web of Data has become a common practice and is known as Linked...
Advances in the Linked Open Data (LOD) initiative have made it possible to better structure the Web of data. Indeed, a few datasets act as hubs (for example, DBpedia) and thereby keep the various LOD data sources linked to one another. These datasets have also enabled the development of service...
Querying Web search engines is by far the most frequent activity performed by online users and consequently the one in which they are likely to reveal a significant amount of personal information. Protecting the privacy of Web requesters is thus becoming increasingly important. This is often done by using systems that guarantee unlinkability betwee...
The Semantic Web grows constantly and promises a huge amount of machine-interpretable information. Unfortunately, the integration and usage of semantic information is not feasible by everyone. Hence, a large number of Semantic Web applications are lacking and the potential of the semantic knowledge remains unexploited. We propose balloon Synopsis,...
In this paper we outline instructional, legal, and software requirements as well as a prototypical software implementation for a multimedia help system in a rehabilitation scenario. The help system will be used by patients in a rehabilitation clinic to support their pelvic floor exercises. After describing the use case we will outline the requireme...
Folksonomies, networks of users, resources, and tags, allow users to easily retrieve, organize and browse web contents. However, their advantages are still limited mainly due to the noisiness of user-provided tags. To overcome this issue, we propose an approach for characterizing related tags in folksonomies: we use tag co-occurrence statistics and...
We enhance an existing search engine's snippet (i.e. excerpt from a web page determined at query-time in order to efficiently express how the web page may be relevant to the query) with linked data (LD) in order to highlight non trivial relationships between the information need of the user and LD resources related to the result page. Given a query...
A great research challenge in multimedia systems is the user-centric design and adaptation of interfaces to multimedia systems. User-centric does involve methods of HCI (Human-Centric Interface) research, but should also better incorporate the interest of the users, the usage context and the technical properties of the system.
This work focuses on...
A great research challenge in multimedia systems is the user-centric design and adaptation of interfaces to multimedia systems. User-centric does involve methods from HCI (Human-Centric Interface) research, but should also better incorporate the interest of the users, the usage context and the technical properties of the system.
This work focuses...
With recent technologies, it is possible to create appealing multimedia presentations or extended videos with a high level of interactivity. Standards like SMIL provide extensive structures to describe metadata for timing and spacing of single media elements which then form a presentation. While multimedia presentations are viewed mainly in a linea...
In this demo paper we present how the SIVA Suite can be used as a multimedia help system for technical applications in SMEs. After describing our use case, a mechanics scenario, we show how our software was extended to fit all requirements of this scenario. We present short overviews over each component of the SIVA Suite: the authoring tool, the pl...
Nowadays, the RDF data model is a crucial part of the Semantic Web. Especially web developers favour RDF serialization formats like RDFa and JSON-LD. However, the visualization of large portions of RDF data in an appealing way is still a cumbersome task. RDF visualizers in general are not targeting the Web as usage scenario or simply display the co...
Interconnecting machine readable data with multimedia assets and fragments has recently become a common practice. But specific retrieval techniques for the so called Semantic Multimedia data are still lacking. On our poster we present SPARQL-MM, a function set that extends SPARQL to Media Fragment facilities by introducing spatio-temporal filter an...
We enhance an existing search engine’s snippet (i.e. excerpt from a web page determined at query-time in order to efficiently express how the web page may be relevant to the query) with linked data (LD) in order to highlight non trivial relationships between the information need of the user and LD resources related to the result page. To do this, w...
While Linked Open Data has shown an enormous increase in volume, there is still no single point of access for querying the over 200 SPARQL repositories. In this paper we present Balloon Fusion, a SPARQL 1.1 rewriting and query federation service built on crawling and consolidating co-reference relationships in over 100 reachable Linked Data SPARQL Endpoint...
In this paper, a dataset of geotagged photos on a world-wide scale is presented. The dataset contains a sample of more than 14 million geotagged photos crawled from Flickr with the corresponding metadata. To guarantee the spatial representativeness of the dataset, a crawling approach based on the small-world phenomena and the Flickr friendship's gr...
Folksonomies, networks of users, resources, and tags, allow users to easily retrieve, organize and browse web contents. However, their advantages are still limited due to the noisiness of user-provided tags. To overcome this problem, we propose an approach for identifying related tags in folksonomies. The approach uses tag co-occurrence stati...
In this paper, we introduce Folkioneer, a novel approach for browsing and exploring community-contributed geotagged images. Initially, images are clustered based on the embedded geographical information by applying an enhanced version of the CURE algorithm, and characteristic geodesic shapes are derived using Delaunay triangulation. Next, images of...
The creation process of interactive non-linear videos requires the definition of scenes which are connected in a scene graph. These might be available in the form of raw material (shots) or need to be extracted from existing films. In the latter case, the scenes have to be defined by the author in a time-consuming process. A semi-automated scene extract...
Research depends to a large degree on the availability and quality of primary research data, i.e., data generated through experiments and evaluations. While the Web in general and Linked Data in particular provide a platform and the necessary technologies for sharing, managing and utilizing research data, an ecosystem supporting those tasks is stil...
Personalization is the process of adapting the output of a system to a user's context and profile. User information such as geographical location, academic and professional background, membership in groups, interests, preferences, opinions, etc. may be used in the process. Big data analysis techniques enable collecting accurate and rich information...
With HTML5 becoming more and more standardized, it is possible to implement platform independent applications running on smartphones, tablets, laptops, and desktop PCs with different operating systems and screen resolutions. A combination of HTML5 with CSS and JavaScript allows the implementation of appealing user interfaces with advanced functiona...
Digital ecosystems rely on reputation systems in order to build trust and to foster collaborations among users. Reputation systems are commonplace in the Customer-to-Customer and Business-to-Customer contexts; however, they have not yet found mainstream acceptance in Business-to-Business (B2B) environments. Our first contribution in this paper is t...
Opportunistic networks are generally characterized by a low performance due to user mobility and non end-to-end communications, which makes it challenging to use them in scenarios involving the transmission of large data, such as video transmission. This article presents simulated experiments of transmitting different video sequences in opportunist...
In communication architectures, nodes are expected to spend their own resources so as to relay other nodes' messages or perform other services for the common good. However any selfish node, if given the opportunity, would typically prefer - to spare its own resources - to avoid serving the other nodes. This creates a potential problem to any collab...
The goal of the W3C’s Media Annotation Working Group is to promote interoperability between multimedia meta-data formats on the Web. As experienced by everybody, audio-visual data is omnipresent on today’s Web. However, different interaction interfaces and especially diverse metadata formats prevent unified access and navigation. Related to this, t...