Giovanni Tummarello

About

87 Publications
9,344 Reads
2,156 Citations

Publications (87)
Conference Paper
In many applications, it is convenient to substitute a large data graph with a smaller homomorphic graph. This paper investigates approaches for summarising massive data graphs. In general, massive data graphs are processed using a shared-nothing infrastructure such as MapReduce. However, accurate graph summarisation algorithms are suboptimal for t...
Conference Paper
Full-text available
The task of entity retrieval becomes increasingly prevalent as more and more structured information about entities is available on the Web in various forms such as documents embedding metadata (RDF, RDFa, Microdata, Microformats). International benchmarking campaigns, e.g., the Text REtrieval Conference or the Semantic Search Challenge, propose ent...
Conference Paper
Full-text available
One of the reasons for the slow adoption of SPARQL is the complexity in query formulation due to data diversity. The principal barrier a user faces when trying to formulate a query is that he generally has no information about the underlying structure and vocabulary of the data. In this paper, we address this problem at the maximum scale we can thi...
Article
More and more (semi) structured information is becoming available on the Web in the form of documents embedding metadata (e.g., RDF, RDFa, Microformats and others). There are already hundreds of millions of such documents accessible and their number is growing rapidly. This calls for large scale systems providing effective means of searching and r...
Conference Paper
Full-text available
The Sindice Semantic Web index provides search capabilities over 260 million documents. Reasoning over web data enables to make explicit what would otherwise be implicit knowledge: it adds value to the information and enables Sindice to ultimately be more competitive in terms of precision and recall. However, due to the scale and heterogeneity of w...
Conference Paper
Full-text available
The task of entity retrieval becomes increasingly prevalent as more and more (semi-) structured information about objects is available on the Web in the form of documents embedding metadata (RDF, RDFa, Microformats, and others). However, research and development in that direction is dependent on (1) the availability of a representative corpus of en...
Conference Paper
Full-text available
In large web search engines the performance of Information Retrieval systems is a key issue. Block-based compression methods are often used to improve the search performance, but current self-indexing techniques are not adapted to such data structure and provide sub-optimal performance. In this paper, we present SkipBlock, a self-indexing model fo...
Conference Paper
We present Sig.ma, both a service and an end user application to access the Web of Data as an integrated information space. Sig.ma uses an holistic approach in which large scale semantic Web indexing, logic reasoning, data aggregation heuristics, ad-hoc ontology consolidation, external services and responsive user interaction all play together to c...
Conference Paper
Full-text available
Now motivated also by the partial support of major search engines, hundreds of millions of documents are being published on the web embedding semi-structured data in RDF, RDFa and Microformats. This scenario calls for novel information search systems which provide effective means of retrieving relevant semi-structured information. In this paper, we...
Conference Paper
Full-text available
On the Web of Data, entities are often interconnected in a way similar to web documents. Previous works have shown how PageRank can be adapted to achieve entity ranking. In this paper, we propose to exploit locality on the Web of Data by taking a layered approach, similar to hierarchical PageRank approaches. We provide justifications for a two-laye...
Article
A large amount of semi structured data is now made available on the Web in form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience in Sindice.com, we discuss how this is reflected in the architecture and components of a large scale infrastructure. Aspects such as data collect...
Conference Paper
Full-text available
The use of RDF data published on the Web for applications is still a cumbersome and resource-intensive task due to the limited software support and the lack of standard programming paradigms to deal with everyday problems such as combination of RDF data from different sources, object identifier consolidation, ontology alignment and mediation, or...
Conference Paper
Full-text available
In this paper we introduce Linked Data Driven Software Development (LD2SD), a light-weight Semantic Web methodology to turn software artefacts such as data from version control systems, bug tracking tools and source code into linked data. Once available as linked data, the related information from different sources is made explicit, allowing fo...
Conference Paper
Full-text available
Considering that thousands if not millions of linked datasets will be published soon, we motivate in this paper the need for an efficient and effective way to rank interlinked datasets based on formal descriptions of their characteristics. We propose DING (from Dataset RankING) as a new approach to rank linked datasets using information provided by the...
Article
Full-text available
DBin is a Semantic Web application that enables groups of users with a common interest to cooperatively create semantically structured knowledge bases. Such Semantic Web Communities are made possible by creating customized user environments called Brainlets. Brainlets provide user interfaces and domain specific tools (e.g., querying, viewing and ed...
Article
Cloud Computing refers to the use of large-scale computer clusters often built from low-cost hardware and network equipment, where resources are allocated dynamically among users of the cluster. While the paradigm is not entirely novel, recent developments in software frameworks for Cloud Computing are making it increasingly easy for programmers to...
Article
Full-text available
Purpose – The purpose of this paper is to investigate and prove the feasibility of a semantic web (SW) based approach to textual encoding. It aims to discuss benefits and novel possibilities with respect to traditional XML-based approaches. Design/methodology/approach – The markup process can be seen as a task of knowledge representation where elem...
Conference Paper
Full-text available
In this demo we present a first implementation of Semantic Web Pipes, a powerful tool to build RDF-based mashups. Semantic Web pipes are defined in XML and when executed they fetch RDF graphs on the Web, operate on them, and produce an RDF output which is itself accessible via a stable URL. Humans can also use pipes directly thanks to HTML wrapping...
Conference Paper
Full-text available
Increasing amounts of RDF data are available on the Web for consumption by Semantic Web browsers and indexing by Semantic Web search engines. Current Semantic Web publishing practices, however, do not directly support efficient discovery and high-performance retrieval by clients and search engines. We propose an extension to the Sitemaps protocol whic...
Article
Full-text available
The development of Social Websites that help users in creating and gathering knowledge, are discussed. These Social Websites help users in creating and gathering knowledge, by simplifying user contributions through blogs, tagging, folksonomies, wikis, podcasts, and online social networks. These Social Websites have also enabled community-based know...
Article
Full-text available
Data discovery on the Semantic Web requires crawling and indexing of statements, in addition to the 'linked-data' approach of de-referencing resource URIs. Existing Semantic Web search engines are focused on database-like functionality, compromising on index size, query performance and live updates. We present Sindice, a lookup index over Semantic...
Conference Paper
Full-text available
Sindice [1] is a backend service that operates on semantically structured data harvested from the Web. Sindice uses both crawlers and Semantic Web Sitemaps [2] to find RDF sources as well as microformats such as XFN, hcards, hvote and others. Sindice targets developers by offering a set of APIs to find, reuse and publish structured
Article
Full-text available
Making effective use of RDF data published online (such as RDF DBLP, DBpedia, FOAF profiles) is, in practice, all but straightforward. Data might be fragmented or incomplete so that multiple sources need to be joined, different identifiers (URIs) are usually employed for the same entities, ontologies need alignment, certain information might need t...
Conference Paper
Full-text available
In this paper we present a methodology by which it is possible to enhance existing web applications and directly deliver to the end users aggregated "views" of information. These views are accessed by clicking on buttons which are injected into the HTML of the existing application by lightweight plugins.
Article
Full-text available
This report presents Semantic Web pipes, a powerful paradigm to build RDF-based mashups. Semantic Web pipes work by fetching RDF models on the Web, operating on them, and producing an output which is itself accessible via a stable URL. We illustrate how Semantic Web pipes can solve use cases ranging from simple aggregation to complex collaborative...
Chapter
Metadata extracted from Multimedia or live sensoring is set to play a major role in any intelligent and multimodal interactions between humans and computers. Furthermore, it is generally required that such metadata is structured and encoded according to well agreed standards. This is fundamental to enable interoperability and create complex applica...
Conference Paper
Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: where to find statements about encountered resources. The "linked data" approach, which mandates that resource URIs should be de-referenced and yield metadata about the resource, helps but is only a partial solution and not followed wid...
Conference Paper
Full-text available
DBin is a Semantic Web application that enables groups of users with a common interest to cooperatively create semantically structured knowledge bases. These user groups, which we call "Semantic Web Communities", are made possible by creating customized user environments called "Brainlets". Brainlets provide user interfaces and domain specific tool...
Conference Paper
Full-text available
In this paper we take a view from the bottom to RDF(S) reasoning. We discuss some issues and requirements on reasoning towards effectively building Semantic Web Pipes, aggregating RDF data from various distributed sources. If we leave out complex description logics reasoning and restrict ourselves to the RDF world, it turns out that some problems...
Conference Paper
Full-text available
In this paper we describe RDFSync, a methodology for efficient synchronization and merging of RDF models. RDFSync is based on decomposing a model into Minimum Self-Contained graphs (MSGs). After illustrating theory and deriving properties of MSGs, we show how a RDF model can be represented by a list of hashes of such information fragments. The sync...
Conference Paper
Full-text available
Developers of Semantic Web applications face a challenge with respect to the decentralised publication model: where to find statements about encountered resources. The "linked data" approach, which mandates that resource URIs should be de-referenced and yield metadata about the resource, helps but is only a partial solution. We present Sindice, a l...
Conference Paper
Full-text available
In this paper we take a view from the bottom to RDF(S) reasoning. We discuss some issues and requirements on reasoning towards effectively building Semantic Web Pipes, aggregating and patching RDF data from various distributed sources. Even if we leave out complex description logics reasoning and restrict ourselves to the RDF world, it turns out th...
Chapter
Full-text available
The aim of this chapter is to show how the need for advanced cooperative annotation and information exchange can be addressed using a paradigm called “Interconnected Geo-Semantic Web Communities”. The use cases and its associated needs are highlighted, and then the base tool for this work, the DBin Semantic Web information manager, is focused on. D...
Article
This paper describes an extension for the Sitemap protocol targeted at the efficient discovery and use of RDF data. Data publishers can state where RDF is located and alternative means to access it. Semantic Web clients and Semantic Web crawlers can use this information to access required RDF data in the most efficient way for the task they have...
Conference Paper
Full-text available
Over the years, a number of studies and practical efforts have proposed technologies to retrieve, publish, cooperatively edit, lookup and query RDF models. Also, other studies and practices have shown that standard communication channels, e.g. e-mail, RSS and IM, can be successfully used in Semantic Web scenarios. In this paper we collectively refe...
Conference Paper
Full-text available
In this paper we give an overview of the DBin Semantic Web information manager. Then we describe how it enables users to create and experience the Semantic Web by exchanging RDF knowledge in P2P "topic" channels. Once sufficient information has been collected locally, rich and fast browsing of the Semantic Web becomes possible without generating...
Conference Paper
Full-text available
In this paper we present a novel kind of personal application: Rich Personal Semantic Web Clients. The idea is to enable the user to create and experience the Semantic Web as a collection of local databases which interoperate in semantic P2P "topic" channels by exchanging patches of RDF information. Once sufficient information has been collected loc...
Conference Paper
In this paper we introduce a novel kind of scenario where users use Rich Personal Semantic Web Clients to cooperatively create knowledge within “Semantic Web Communities”. Such communities are formed around P2P channels which work by exchanging patches of RDF information among clients. Once sufficient information has been collected locally at each cli...
Conference Paper
Full-text available
Over the years, a number of studies and practical efforts have proposed technologies to retrieve, publish, cooperatively edit, lookup and query RDF models. Also, other studies and practices have shown that standard communication channels, e.g. e-mail, RSS and IM, can be successfully used in Semantic Web scenarios. In this paper we collectively refe...
Article
Full-text available
In this paper we present the "Brainlet" paradigm, a way to create rich Semantic Web user interfaces and interaction environments. Brainlets are half way between configuration files and light scripts and are "executed" by the DBin rich Semantic Web Platform. The main motivation behind Brainlets is enabling domain experts, rather than programmers, t...
Article
Full-text available
In this paper we perform a preliminary evaluation on how Semantic Web technologies such as RDF and OWL can be used to perform textual encoding. Among the potential advantages, we notice how RDF, given its conceptual graph structure, appears naturally suited to deal with overlapping hierarchies of annotations, something notoriously problematic using...
Article
Full-text available
In this paper we investigate the use of Semantic Web languages such as RDF and OWL in textual encoding and we discuss several advantages these tools could provide. Among these, we show how the overlapping markup encoding problem can be naturally solved as the graph structure of the RDF language is naturally suited for this purpose. Further advantag...
Conference Paper
Full-text available
Being able to determine the provenience of statements is a fundamental step in any SW trust modeling. We propose a methodology that allows signing of small groups of RDF statements. Groups of statements signed with this methodology can be safely inserted into any existing triple store without the loss of provenance information since only standard R...
Conference Paper
Full-text available
DBin is a novel kind of personal application which enables users to experience the Semantic Web by participating in P2P "discussion groups" and exchanging metadata and annotations about common topics of interest. The P2P transport layer is provided by the RDFGrowth algorithm which has characteristics of scalability and sustainability even in...
Conference Paper
Full-text available
For machines and individuals to successfully exchange annotations about audio resources, a technique is needed to provide these with stable identifiers. In this paper we propose ontological definitions of some identification problems and discuss the preliminary conceptual structure of the Music URI Infrastructure (MUI) project. The idea behind M...
Article
Full-text available
DBin is a tool to build "Semantic Web P2P communities". It provides a general platform on top of which it is possible to create and run P2P Semantic Web applications where users contribute by annotating topics of common interest. DBin introduces a Semantic Computing scenario where users "slowly", but in a sustainable way, build a local DB and can t...
Article
Full-text available
In this article we present an high level overview of the HyperJournal project, an effort to provide novel possibilities both in Scientific Publishing and in access to Scientific Contributions, according to the Open Access movement guidelines. All the work has been implemented using the PHP Web Script language and interfacing with Java modules such...
Conference Paper
This paper describes a research activity carried on in the frame of a collaboration among ISTI-CNR, leader of the 'Representation and Communication of Data and Metadata' Workpackage in the European Project MUSCLE-NoE ('Multimedia Understanding through Semantics, Computation and Learning Network of Excellence') and the Dept. of Electronics, A.I. & T...
Conference Paper
Full-text available
In this paper we illustrate the preliminary exploration of a semantic Web scenario where acoustic relationship among audio resources are extracted in a distributed way and made available using the tools of the semantic Web. In this scenario, peers use MPEG-7 to exchange description of audio they are interested in and use these to locally extract di...
Conference Paper
Full-text available
We present an architecture that provides semantic Web annotations of sound clips described by MPEG-7 audio descriptions. The great flexibility of the MPEG-7 standard makes especially difficult to compare descriptions coming from heterogeneous sources. To cope with this, the architecture would first obtain "normalized" versions of the audio descript...
Conference Paper
Full-text available
We present RDFGrowth, an algorithm that addresses a specific yet important scenario: large scale, end user targeted, metadata exchange P2P applications. In this scenario, peers perform browsing and querying of semantic web statements on a local database without directly generating network traffic or remote query execution. The database grows by...
Conference Paper
Full-text available
Web based musical discussions are sometimes limited by the fact that most of the applications used (forums, guestbooks etc.) only support textual messages. Although systems with advanced functionalities like file attachments or images and sounds inline inserts are technically possible, very few if any of the existing "off the shelf", freely availab...
Conference Paper
In this paper we present the results of applying adaptive nonlinearities and maximum entropy principle to identify an inverting filter for the post nonlinear blind source deconvolution problem. The filter is a cascade of a linear FIR matrix and a nonlinear memoryless componentwise system. Cubic splines and polynomials have been selected as adaptive...
Conference Paper
Correctly estimating the best filter adaptation rate in acoustic echo cancellation (AEC) is known to be a complex problem justifying advanced algorithms. In this paper we illustrate a new neural network estimator based on a set of classic statistical estimators and a proper, mostly generalized, offline training. The output step size estimated is sh...
Conference Paper
Full-text available
The unwanted soil collected during the harvesting of the sugar beets represents one of the main by-products of the sugar industry in Europe. In this study, we adopt different neural network predictors to be used as a base for an online harvesting optimization scheme. Our aim is to demonstrate the feasibility of a combinatory optimization of the har...