Martin Rajman

École Polytechnique Fédérale de Lausanne | EPFL · Vice Presidency for Academic Affairs

PhD

About

179
Publications
22,482
Reads
1,901
Citations
Citations since 2016
9 Research Items
271 Citations
Introduction
I am the EPFL Ambassador to digitalswitzerland (https://digitalswitzerland.com/), a Swiss-wide, multi-stakeholder initiative coordinating the contributions of its more than 150 members to strengthen Switzerland’s position as a leading digital player. In parallel with my ambassador activity, I am a senior researcher at EPF Lausanne, Switzerland (EPFL). My research interests include Artificial Intelligence, Computational Linguistics and Data-driven Probabilistic Machine Learning.

Publications (179)
Chapter
In this chapter, we specifically address the impact of the Nano-Tera program as a whole. After its almost 10-year run, this provides an overall analysis of main achievements of the program. We describe the impact of the program within the five following dimensions:
Chapter
Energy is a central concern that affects system design, society and the economy. The need for new sources of power, renewable and sustainable energy supplies, and smart cities is creating immense challenges along with significant opportunities. Nano-Tera has addressed various high relevance application areas such as low power reliable electronics,...
Chapter
Within the first phase of Nano-Tera, the objectives of the research on environmental monitoring included checking the quality of air and water, by measuring pollution in terms of biological and/or inorganic compounds; and instrumenting the environment to detect movements that can lead to catastrophes, such as rockslides, avalanches, floods or to the instab...
Chapter
In this chapter, we present the full list, including all partners and key references, of three main types of projects:
Chapter
Health management and monitoring has been one of the focal areas in the Nano-Tera program. Nano-Tera funded healthcare projects have targeted many issues that can be thematically categorized in four distinct clusters: smart prosthetics, advanced diagnosis tools, medical care support and biosensing. We will present a digest of these projects in this...
Book
This book presents the overall vision and research outcomes of Nano-Tera.ch, which is a landmark Swiss federal program to advance engineering system and device technologies with applications to Health and the Environment, including smart Energy generation and consumption. The authors discuss this unprecedented nation-wide program, with a lifetime o...
Article
Bettering human health care and creating a sustainable environment through monitoring and smart energy usage are primary goals of advanced societies. To this purpose, the Swiss Government, through the State Secretariat for Education, Research, and Innovation has been funding the Nano-Tera.ch program for about nine years with an overall budget of ap...
Patent
The invention provides a method for estimating probabilities of a user clicking on items appearing in a results list obtained by a web search engine to predict the revenue for the results list, the web searching engine being used to search for the items on the web, the results list comprising a plurality of items. Each of the items has one or more...
Article
Search engines essentially rely on the structure of the graph of hyperlinks. Although accurate for the main trend, this is not effective when a query is ambiguous. Leveraging semantic information by means of interest matching allows proposing complementary results that are tailored to the user's expectations. This paper proposes a collaborati...
Conference Paper
Full-text available
We present the result of an experimental system aimed at performing a robust semantic analysis of analyzed speech input in the area of information system access. The goal of this experiment was to investigate the effectiveness of such a system in a pipelined architecture, where no control is possible over the morpho-syntactic analysis which precedes the...
Data
Full-text available
This paper gives an overview of the assessment and evaluation methods which have been used to determine the quality of the INSPIRE smart home system. The system allows different home appliances to be controlled via speech, and consists of speech and speaker recognition, speech understanding, dialogue management, and speech output components. The pe...
Article
The aggregation of distributions, composed of the number of occurrences of each element in a set, is an operation that lies at the heart of several large-scale distributed applications. Examples include popularity tracking, recommendation systems, trust management, or popularity measurement mechanisms. These applications typically span multiple adm...
Conference Paper
Full-text available
In this paper we present an approach to score aggregation for specialized search systems. In our work we focus on document ranking in scientific publication databases. We work with the collection of scientific publications of the CERN Document Server. This paper reports on work in progress and describes a rank aggregation framework with score normali...
Article
Full-text available
In this paper we present an approach to score aggregation for specialized search systems. In our work we focus on document ranking in scientific publication databases. We work with the collection of scientific publications of the CERN Document Server. This paper reports on work in progress and describes a rank aggregation framework with score normali...
Conference Paper
Full-text available
Invenio is the web-based integrated digital library system developed at CERN. Within this framework, we present four types of ranking models based on the citation graph that complement the simple approach based on citation counts: time-dependent citation counts, a relevancy ranking which extends the PageRank model, a time-dependent ranking which co...
Conference Paper
Many distributed applications, such as collaborative Web mapping, collaborative feedback and ranking, or bug reporting systems, rely on the aggregation of privacy-sensitive information gathered from human users. This information is typically aggregated at servers and later used as the basis for some collaborative service. Expecting that clients tru...
Article
Full-text available
Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low mainten...
Conference Paper
Popular search engines essentially rely on information about the structure of the graph of linked elements to find the most relevant results for a given query. While this approach is satisfactory for popular interest domains or when the user expectations follow the main trend, it is very sensitive to the case of ambiguous queries, where queries can...
Article
The size of digital libraries is increasing, making navigation and access to information more challenging. Improving the system by observing the users’ activities can help provide better services to users of very large digital libraries. In this paper we explain how the Invenio open-source software, used by the CERN Document Server (CDS) allow...
Article
Full-text available
For most users, Web-based centralized search engines are the access point to distributed resources such as Web pages, items shared in file-sharing systems, etc. Unfortunately, existing search engines compute their results on the basis of structural information only, e.g., the Web graph structure or query-document similarity estimations. Users expe...
Article
The present contribution focuses on the integration of word senses in a vector representation of texts, using a probabilistic model. The vector representation under consideration is the DSIR model, that extends the standard Vector Space (VS) model by taking both occurrences and co-occurrences of words into account. Integration of word senses into t...
Article
Finding the most probable parse tree in the framework of Data-Oriented Parsing (DOP), a Stochastic Tree Substitution Parsing scheme developed by R. Bod (Bod 1992), has proven to be NP-hard in the most general case (Sima'an 1996). However, introducing some a priori restrictions on the choice of the elementary trees (i.e. grammar rules) leads to...
Technical Report
Full-text available
GRACE is the first large-scale evaluation program of taggers for French. This experiment made it possible to compare the assignments of Parts-of-Speech tags by various different taggers, on a common corpus of literary and journalistic texts. The evaluation relied on the acceptance by the participants of a reference formalism for morpho-syntactic descripti...
Article
Full-text available
CDS Invenio is the web-based integrated digital library system developed at CERN. It is a suite of applications which provides the framework and tools for building and managing an autonomous digital library server. Within this framework, the goal of this project is to implement new ranking methods based on the bibliographic citation graph extracted...
Article
Full-text available
The goal of the d-Rank project is to study rank aggregation in scientific publication databases. In our work we focus in particular on document ranking in the domain of particle physics and we work with the collection of CERN publications called the CERN Document Server. In this report we present the main advances achieved within the second phase o...
Conference Paper
Full-text available
Numerous retrieval models have been defined within the field of information retrieval (IR) to produce a ranked and ordered list of documents relevant to a given query. Existing models are in general well-explored and thoroughly evaluated using traditionally centralized IR engines. However, the problem of producing global relevance scores to enable...
Article
Full-text available
In this paper we present the AlvisP2P IR engine, which enables efficient retrieval with multi-keyword queries from a global document collection available in a P2P network. In such a network, each peer publishes its local index and invests a part of its local computing resources (storage, CPU, bandwidth) to maintain a fraction of a global P2P index....
Conference Paper
Document ranking for scientific publications involves a variety of specialized resources (e.g. author or citation indexes) that are usually difficult to use within standard general purpose search engines that usually operate on large-scale heterogeneous document collections for which the required specialized resources are not always available for a...
Conference Paper
Full-text available
Despite the many research efforts invested recently in peer-to-peer search engines, none of the proposed systems has reached the level of quality and efficiency of its centralized counterparts. One of the main reasons for this inferior performance is the difficulty to attract a critical mass of users that would make the peer-to-peer system truly co...
Article
Full-text available
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations, and...
Conference Paper
Full-text available
The suitability of peer-to-peer (P2P) approaches for full-text Web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we formalize a novel indexing/retrieval model that achieves high performance, cost-efficient retrieval by...
Conference Paper
Full-text available
In this paper we discuss the problems faced when trying to design an evaluation protocol for a multimodal system using novel input modalities and in a new domain. In particular, we focus on the problem of trying to minimize bias towards certain modalities and interaction patterns. Such bias might be introduced by experimenters in the instructions g...
Conference Paper
Full-text available
We describe a query-driven indexing framework for scalable text retrieval over structured P2P networks. To cope with the bandwidth consumption problem that has been identified as the major obstacle for full-text retrieval in P2P networks, we truncate posting lists associated with indexing features to a constant size storing only top-k ranked docume...
Article
In this paper, we present a query-driven indexing/retrieval strategy for efficient full text retrieval from large document collections distributed within a structured P2P network. Our indexing strategy is based on two important properties: (1) the generated distributed index stores posting lists for carefully chosen indexing term combinations that...
Article
Full-text available
We present Alvis peers, a full-text P2P retrieval engine designed to offer retrieval performance comparable to centralized solutions while scaling to a very large number of peers. It is the result of our research efforts within the project Alvis (European FP6 STREP project ALVIS, http://www.alvis.info/) that aims at building a truly-distributed sem...
Conference Paper
Full-text available
Peer-to-peer networks have been identified as a promising architectural concept for developing search scenarios across digital library collections. Digital libraries typically offer sophisticated search over their local content, however, search methods involving a network of such stand-alone components are currently quite limited. We present a...
Conference Paper
Full-text available
TFIDF was widely used in IR systems based on the vector space model (VSM). PageRank was used in systems based on hyperlink structure such as Google. It was necessary to develop a technique combining the advantages of the two systems. In this paper, we drew up a framework by using the content of web pages and the out-link information synchronously. We se...
Chapter
In the framework of the JuriSent case study, carried out within the European NEMIS thematic network, we analyze the contribution of text mining techniques to improve the consultation of jurisprudence textual databases. We mainly focus on correspondence analysis (CA) techniques, but also provide some insights on similar visualization techniques, suc...
Chapter
In this contribution, we present the StatSearch prototype, a search engine that enables an enhanced access to domain specific data available on the Web. The StatSearch engine proposes a hybrid search interface combining query-based search with automated navigation through a tree-like hierarchical structure. The goal of such an interface is to allow...
Article
Full-text available
Web search over peer-to-peer (P2P) networks shows promise to become an alternative to the state-of-the-art search engines since P2P overlays offer means for decentralized search across widely-distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to po...
Conference Paper
Full-text available
This paper presents Archivus, a multimodal language-enabled meeting browsing and retrieval system. The prototype is in an early stage of development, and we are currently exploring the role of natural language for interacting in this relatively unfamiliar and complex domain. We briefly describe the design and implementation status of the...
Article
Full-text available
In this paper we report an experiment of an automated metric used to analyze the grammaticality of machine translation output. The approach (Rajman, Hartley, 2001) is based on the distribution of the linguistic information within a translated text, which is assumed to be similar between a learning corpus and the translation. This method is quite inexpen...
Article
The suitability of Peer-to-Peer (P2P) approaches for full-text web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we present a novel indexing/retrieval model that achieves high performance, cost-efficient retrieval by in...
Article
Full-text available
We consider example-critiquing systems that help people search for their most preferred item in a large electronic catalog. We analyze how such systems can help users in the framework of four existing example-critiquing approaches (RABBIT, FindMe, Incremental Critiquing, ATA and AptDecision). In a second part we consider the use of several types of...
Article
Full-text available
We consider example-critiquing systems that help people search for their most preferred item in a large catalog. We compare 6 existing approaches in terms of user- or system-centric, implicit or explicit use of preferences, assumptions used and their behavior in underconstrained and overconstrained situations. We consider several types of explicit...
Article
Full-text available
In this paper we present a proposal for extending the standard Wizard of Oz experimental methodology to language-enabled multimodal systems. We first discuss how Wizard of Oz experiments involving multimodal systems differ from those involving voice-only systems. We then go on to discuss the Extended Wizard of Oz methodology and the Wizard of Oz te...
Conference Paper
Full-text available
Internet search has a strong business model that permits a free service to users, so it is difficult to see why, if at all, there should be open source offerings as well. This paper first discusses open source search, and a rationale for the computer science community at large to get involved. Because there is no shortage of core open source compon...
Conference Paper
Full-text available
The aim of the work described in this paper is to extend the EPFL dialogue platform with multimodal capabilities. Based on our experience with the EPFL Rapid Dialogue Prototyping Methodology (RDPM), we formulate precise design principles that provide the necessary frame to use the RDPM to rapidly create an efficient multimodal interface for a given...
Conference Paper
Full-text available
This paper presents the Rapid Dialogue Prototyping Methodology [4, 3, 1], a methodology allowing the easy and automatic derivation of an ad hoc dialogue management system from a specific task description. The goal of the produced manager is to provide the user with a dialogue-based interface to easily perform the target task. In addition, reset patt...
Article
Full-text available
The aim of this report is to describe the browsers that have been developed by various groups within the IM2 project, highlighting goals, design methodologies, key functionalities and evaluation methods used by each. The paper concludes with a tabular overview of the media, input and output modalities and special functionalities handled by each b...
Article
Full-text available
Several Networks of Excellence have been set up in the framework of the European FP5 research program. Among these Networks of Excellence, the NEMIS project focuses on the field of Text Mining. Within this field, document processing and visualization was identified as one of the key topics and the WG1 working group was created in the NEMIS project,...
Article
Full-text available
Contrary to standard approaches to topic annotation, the technique used in this work does not centrally rely on some sort of -- possibly statistical -- keyword extraction. In fact, the proposed annotation algorithm uses a large-scale semantic database -- the EDR Electronic Dictionary -- that provides a concept hierarchy based on hyponym and hyper...
Article
Full-text available
Automatic indexing is one of the important technologies used for Textual Data Analysis applications. Standard document indexing techniques usually identify the most relevant keywords in the documents. This paper presents an alternative approach that aims at performing document indexing by associating concepts with the document to index instead of...
Article
Web search engines designed on top of peer-to-peer (P2P) overlay networks show promise to enable attractive search scenarios operating at a large scale. However the design of effective indexing techniques for extremely large document collections still raises a number of open technical challenges. Resource sharing, self-organization, and low mainten...
Article
Full-text available
This report presents a robust syntactic parser that is able to return a "correct" derivation tree even if the grammar cannot generate the input sentence. The following two-step solution is proposed: the finest corresponding most probable optimal maximum coverage is generated first, then the trees from this coverage are glued into one resulting tre...
Article
Full-text available
This report describes a model-driven approach to natural language understanding (NLU) in which the “meaning” of natural language queries is extracted based on a domain model composed of a set of concepts and relations specified in the system’s domain. The extracted meaning is represented as a set of semantic constraints that describe concept instan...
Article
Full-text available
In this paper we present an extension of the EPFL's Rapid Dialogue Prototyping Methodology to include multimodality, and show how it can be applied in the design of a multimodal application, the Archivus system. We begin with an overview of the standard speech-only rapid dialogue prototyping methodology, followed by a discussion of the extensions i...
Article
Full-text available
People are increasingly using provider services through the Internet. While a web site provides information about the contract terms and conditions that the clients have to assent to in order to use its services, in web services there is no such way for taking legal issues into account. There are some attempts to build machine readable eContract la...
Article
Full-text available
In this paper we present the results of the StatSearch case study that aimed at providing enhanced access to statistical data available on the Web. In the scope of this case study we developed a prototype of an information access tool combining a query-based search engine with semi-automated navigation techniques exploiting hierarchical structuring of...
Article
Full-text available
Lexical resources such as WordNet and the EDR electronic dictionary have been used in several NLP tasks. Probably, partly due to the fact that the EDR is not freely available, WordNet has been used far more often than the EDR. We have used both resources on the same task in order to make a comparison possible. The task is automatic assignment of ke...
Article
Full-text available
The integration of a dialogue management system with a rapid dialogue prototyping methodology (RDPM) in the framework of the InfoVox project was analyzed. The RDPM methodology was decomposed into five consecutive steps namely, producing a task model, automatically deriving an initial dialogue model from the produced task model, using the generated...
Article
Full-text available
This paper gives an overview of the assessment and evaluation methods which have been used to determine the quality of the INSPIRE smart home system. The system allows different home appliances to be controlled via speech, and consists of speech and speaker recognition, speech understanding, dialogue management, and speech output components. The pe...
Conference Paper
Full-text available
This paper is about the automated production of dialogue models. The goal is to propose and validate a methodology that allows the production of finalized dialogue models (i.e. dialogue models specific for given applications) in a few hours. The solution we propose for such a methodology, called the Rapid Dialogue Prototyping Methodology (RDPM), is...
Conference Paper
Standard stochastic grammars use generative probabilistic models, focusing on rewriting probabilities conditioned by the symbol to be rewritten. Among several other undesired behaviors, such grammars tend to give penalty to longer derivations of the same input, which is a drawback when they are used for analysis (rather than generation). In this co...
Conference Paper
Full-text available
Unreliable speech recognition, especially in noisy environments, and the need for more natural interaction between man and machine have motivated the development of multimodal systems using speech, pointing, gaze, and facial expressions. In this paper we present a new approach to fuse multimodal information streams using agents. A general framework...
Article
Full-text available
CESTA, the first European Campaign dedicated to MT Evaluation, is a project labelled by the French Technolangue action. CESTA provides an evaluation of six commercial and academic MT systems using a protocol set by an international panel of experts. CESTA aims at producing reusable resources and information about reliability of the metrics. Two run...
Article
Full-text available
This paper presents the Rapid Dialogue Prototyping Methodology [1, 2, 3], a methodology allowing the easy and automatic derivation of an ad hoc dialogue management system from a specific task description. The goal of the produced manager is to provide the user with a dialogue-based interface to easily perform the target task. In addition, reset patt...
Conference Paper
Full-text available
This paper describes a multimodal dialogue driven system, ARCHIVUS, that allows users to access and retrieve the content of recorded and annotated multimodal meetings. We describe (1) a novel approach taken in designing the system given the relative inapplicability of standard user requirements elicitation methodologies, (2) the components of ARCHI...
Conference Paper
CESTA, the first European Campaign dedicated to MT Evaluation, is a project labelled by the French Technolangue action. CESTA provides an evaluation of six commercial and academic MT systems using a protocol set by an international panel of experts. CESTA aims at producing reusable resources and information about reliability of the metrics. Two run...
Chapter
Document processing and visualization was identified as one of the key topics in the domain of Text Mining. For this reason, the WG1 working group was created in the NEMIS project. In the areas it covers, this working group contributes to the production of a roadmap for follow-up research and technological development in text mining, which is the o...
Article
Full-text available
Peer-to-Peer (P2P) systems are very large computer networks, where peers collaborate to provide a common service. Providing large-scale Information Retrieval (IR), e.g. for searching the World Wide Web, is an appealing application for P2P systems. The research community has presented several proposals for P2P-IR. However, so far the concepts of P2P a...