Yoelle Maarek
  • PhD, CS Technion
  • VP of Research at Yahoo

About

106
Publications
17,831
Reads
3,422
Citations
Current institution
Yahoo
Current position
  • VP of Research
Additional affiliations
June 2009 - January 2017
Yahoo
Position
  • VP of Research
June 1989 - present
Yahoo! Labs, Israel
Position
  • Yahoo! Answers Research, Yahoo! Mail Research
Education
September 1985 - January 1989
Technion – Israel Institute of Technology
Field of study
  • Computer Science
September 1984 - August 1985
Sorbonne University
Field of study
  • Computer Science
September 1982 - June 1985
Ecole Nationale des Ponts et Chaussees (ENPC)
Field of study
  • CS and Applied Maths

Publications (106)
Conference Paper
Full-text available
Alexa is an intelligent personal assistant developed by Amazon, that can provide many services through voice interaction such as music playback, news, question-answering, and on-line shopping. The Alexa shopping research team in Amazon is a new emerging group of scientists who investigate revolutionary shopping experience through Alexa, while devis...
Article
This paper presents a generic Bayesian framework that enables any deep learning model to actively learn from targeted crowds. Our framework inherits from recent advances in Bayesian deep learning, and extends existing work by considering the targeted crowdsourcing approach, where multiple annotators with unknown expertise contribute an uncontrolled...
Preprint
Full-text available
This paper presents a generic Bayesian framework that enables any deep learning model to actively learn from targeted crowds. Our framework inherits from recent advances in Bayesian deep learning, and extends existing work by considering the targeted crowdsourcing approach, where multiple annotators with unknown expertise contribute an uncontrolled...
Conference Paper
Recent research studies on mail search have shown that the longer the query, the better the quality of results, yet a majority of mail queries remain very short and searchers struggle with formulating queries. A known mechanism to assist users in this task is query auto-completion, which has been highly successful in Web search, where it leverages...
Conference Paper
Web Mail has significantly changed in the last decade. It keeps growing with 90% of its traffic being generated by automated scripts or "machines" [1]. At the same time, major mail services offer more and more free storage, ranging from 15GB for Gmail and Outlook.com to 1TB for Yahoo Mail. As a result, we keep accumulating messages in our inbox, r...
Conference Paper
Mail search has traditionally served time-ranked results, even if it has been shown that relevance ranking provides higher retrieval quality on average. Some Web mail services have recently started to provide relevance ranking options such as the relevance toggle in the search results page of Yahoo Mail, or the "top results" section in Inbox by Gm...
Conference Paper
Web mail search is an emerging topic, which has not been the object of as many studies as traditional Web search. In particular, little is known about the characteristics of mail searchers and of the queries they issue. We study here the characteristics of Web mail searchers, and explore how demographic signals such as location, age, gender, and in...
Conference Paper
Many have noticed that personal communications have slowly moved from mail to social media and instant messaging platforms, especially with the younger generation [6]. Yet Web Mail traffic continues to steadily grow. A paradox? Not really. We have observed at Yahoo Research that the nature of email traffic has significantly changed in the last two deca...
Conference Paper
Several recent studies have presented different approaches for clustering and classifying machine-generated mail based on email headers. We propose to expand these approaches by considering email message bodies. We argue that our approach can help increase coverage and precision in several tasks, and is especially critical for mail extraction. We r...
Conference Paper
In this paper, we study multi-click queries - queries for which more than one click is performed by the same user within the same query session. Such queries may reflect a more complex information need, which leads the user to examine a variety of results. We present a comprehensive analysis that reveals unique characteristics of multi-click querie...
Article
Email classification is still a mostly manual task. Consequently, most Web mail users never define a single folder. Recently however, automatic classification offering the same categories to all users has started to appear in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather than previous (unsuccessful) personalized approa...
Conference Paper
With email traffic increasing, leading Web mail services have started to offer features that assist users in reading and processing their inboxes. One approach is to identify "important" messages, while a complementary one is to bundle messages, especially machine-generated ones, in pre-defined categories. We rather propose here to go back to the t...
Conference Paper
We study the problem of k-anonymization of mail messages in the realistic scenario of auditing mail traffic in a major commercial Web mail service. Mail auditing is necessary in various Web mail debugging and quality assurance activities, such as anti-spam or the qualitative evaluation of novel mail features. It is conducted by trained professional...
Conference Paper
The nature of Web mail traffic has significantly evolved in the last two decades, and consequently the behavior of Web mail users has also changed. For instance a recent study conducted by Yahoo Labs showed that today 90% of Web mail traffic is machine-generated. This partly explains why email traffic continues to grow even if a significant amount...
Conference Paper
Full-text available
With Web mail services offering larger and larger storage capacity, most users do not feel the need to systematically delete messages anymore and inboxes keep growing. It is quite surprising that in spite of the huge progress of relevance ranking in Web Search, mail search results are still typically ranked by date. This can probably be explained b...
Conference Paper
The majority of Web email is known to be generated by machines even when one excludes spam. Many machine-generated email messages such as invoices or travel itineraries are critical to users. Recent research studies establish that causality relations between certain types of machine-generated email messages exist and can be mined. These relations e...
Conference Paper
Email classification is still a mostly manual task. Consequently, most Web mail users never define a single folder. Recently however, automatic classification offering the same categories to all users has started to appear in some Web mail clients, such as AOL or Gmail. We adopt this approach, rather than previous (unsuccessful) personalized approa...
Patent
Full-text available
A system and method for identifying causal email threading. In one aspect, a computing device identifies a plurality of email templates, each email template corresponding to characteristics of a received machine-generated email, the characteristics of the received machine-generated email relating to static data of the machine-generated email. The c...
Patent
A system and method for assigning one or more tags to an image file. In one aspect, a server computer receives an image file captured by a client device. In one embodiment, the image file includes an audio component embedded therein by the client device, where the audio component was spoken by a user of the client device as a tag of the image file....
Patent
Full-text available
A system and method is described herein that automatically determines if a user of a search engine is conducting a research mission and then provides one or more research tools, one or more specialized searches, one or more directed ads, and/or one or more marketplace events responsive to determining that the research mission is being conducted. Th...
Conference Paper
In spite of personal communications moving more and more towards social and mobile, especially with younger generations, email traffic continues to grow. This growth is mostly attributed to (non-spam) machine-generated email, which, against common perception, is often extremely valuable. Indeed, together with monthly newsletters that can easily be...
Conference Paper
All askers who post questions in Community-based Question Answering (CQA) sites such as Yahoo! Answers, Quora or Baidu’s Zhidao, expect to receive an answer, and are frustrated when their questions remain unanswered. We propose to provide a type of “heads up” to askers by predicting how many answers, if at all, they will get. Giving a preemptive wa...
Patent
Full-text available
Embodiments are directed towards identifying auto-folder tags for messages by using a combinational optimization approach of bi-clustering folder names and features of messages based on relationship strengths. The combinational optimization approach of bi-clustering, generally, groups a plurality of folder names and a plurality of features into one...
Conference Paper
In Web search, users may remain unsatisfied for several reasons: the search engine may not be effective enough or the query might not reflect their intent. Years of research focused on providing the best user experience for the data available to the search engine. However, little has been done to address the cases in which relevant content for the...
Conference Paper
What makes a good question recommendation system for community question-answering sites? First, to maintain the health of the ecosystem, it needs to be designed around answerers, rather than exclusively for askers. Next, it needs to scale to many questions and users, and be fast enough to route a newly-posted question to potential answerers within...
Article
Web Search, which takes its root in the mature field of information retrieval, evolved tremendously over the last 15 years. The field encountered its first revolution when it started to deal with huge amounts of Web pages. Then, a major step was accomplished when engines started to consider the structure of the Web graph and leveraged link analysis...
Conference Paper
Viewing email messages as parts of a sequence or a thread is a convenient way to quickly understand their context. Current threading techniques rely on purely syntactic methods, matching sender information, subject line, and reply/forward prefixes. As such, they are mostly limited to personal conversations. In contrast, machine-generated email, whi...
Conference Paper
Full-text available
Internet users notoriously take an assumed identity or masquerade as someone else, for reasons such as financial profit or social benefit. But often the converse is also observed, where people choose to reveal true features of their identity, including deeply intimate details. This work attempts to explore several of the conditions that allow this...
Article
While Web search has become increasingly effective over the last decade, for many users' needs the required answers may be spread across many documents, or may not exist on the Web at all. Yet, many of these needs could be addressed by asking people via popular Community Question Answering (CQA) services, such as Baidu Knows, Quora, or Yahoo! Answe...
Conference Paper
Web Search, which takes its root in the mature field of information retrieval, evolved tremendously over the last 20 years. The field encountered its first revolution when it started to deal with huge amounts of Web pages. Then, a major step was accomplished when engines started to consider the structure of the Web graph and link analysis became a...
Conference Paper
Community-based Question Answering sites, such as Yahoo! Answers or Baidu Zhidao, allow users to get answers to complex, detailed and personal questions from other users. However, since answering a question depends on the ability and willingness of users to address the asker's needs, a significant fraction of the questions remain unanswered. We mea...
Conference Paper
Full-text available
Most email applications devote a significant part of their real estate to organization mechanisms such as folders. Yet, we verified on the Yahoo! Mail service that 70% of email users have never defined a single folder. This implies that one of the most well known email features is underexploited. We propose here to revive the feature by providing a...
Conference Paper
Full-text available
Yahoo! Answers is currently one of the most popular question answering systems. We claim however that its user experience could be significantly improved if it could route the "right question" to the "right user." Indeed, while some users would rush answering a question such as "what should I wear at the prom?," others would be upset simply being e...
Conference Paper
Full-text available
Community-based Question Answering (CQA) sites, such as Yahoo! Answers, Baidu Knows, Naver, and Quora, have been rapidly growing in popularity. The resulting archives of posted answers to questions, in Yahoo! Answers alone, already exceed in size 1 billion, and are aggressively indexed by web search engines. In fact, a large number of search engine...
Conference Paper
Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques in both crawling and ranking challenges. A more recent, no less important but maybe m...
Conference Paper
Full-text available
The ability to aggregate huge volumes of queries over a large population of users allows search engines to build precise models for a variety of query-assistance features such as query recommendation, correction, etc. Yet, no matter how much data is aggregated, the long-tail distribution implies that a large fraction of queries are rare. As a resul...
Conference Paper
Web retrieval methods have evolved through three major steps in the last decade or so. They started from standard document-centric IR in the early days of the Web, then made a major step forward by leveraging the structure of the Web, using link analysis techniques in both crawling and ranking challenges. A more recent, no less important but maybe...
Conference Paper
Users have taken a more and more central role in the Web. Their role is both explicit, as they become more savvy, they have more expectations, and new interactive features keep appearing, and implicit, as their actions are monitored at various levels of granularity for various needs from live traffic evaluation for usage data mining to improve rank...
Article
The vast heterogeneous network that is the World Wide Web requires common languages to facilitate the exchange and display of data and information in many forms. The World Wide Web Consortium (W3C) developed the extensible markup language (XML) for this purpose. XML documents are produced automatically by applications or manually by users. When user...
Conference Paper
Full-text available
Addressing users' information needs has been one of the main goals of Web search engines since their early days. In some cases, users cannot see their needs immediately answered by search results, simply because these needs are too complex and involve multiple aspects that are not covered by a single Web or search results page. This typically happe...
Conference Paper
The classic Web search experience, consisting of returning “ten blue links” in response to a short user query, is powered today by a mature technology where progress has become incremental and expensive. Furthermore, the “ten blue links” represent only a fractional part of the total Web search experience: today, what users expect and receive in res...
Article
Full-text available
We initiate the study of a new clustering framework, called cluster ranking. Rather than simply partitioning a network into clusters, a cluster ranking algorithm also orders the clusters by their strength. To this end, we introduce a novel strength measure for clusters—the integrated cohesion—which is applicable to arbitrary weighted networks. We t...
Conference Paper
Searching and browsing are the two basic information discovery paradigms, since the early days of the Web. After more than ten years down the road, three schools seem to have emerged: (1) The search-centric school argues that guided navigation is superfluous since free form search has become so good and the search UI so common, that users can satis...
Chapter
Peer-to-peer (P2P) technology has spread through the Web over the last few years through several incarnations, ranging from search for extraterrestrial intelligence to multimedia file sharing, or resource sharing in the general sense. Collaboration systems, expert finding systems and recommender systems are all examples of resource sharing tools wh...
Article
The morning session was dedicated to the third edition in the series of XML and Information Retrieval workshops that were held at SIGIR'2000 (Athens, Greece, see SIGIR Forum Fall 2000 issue) and SIGIR'2002 (Tampere, Finland, see SIGIR Forum Fall 2002 issue). The goal of the workshop, co-chaired by Baeza-Yates and Maarek, was to complement the INEX...
Conference Paper
Full-text available
We describe a novel application of the Web services model for end-user information discovery needs rather than for the traditional business-to-business applications. We describe a specialization of Web services for information providers and demonstrate, through an exemplary unified information discovery console, how consumers can easily customize t...
Conference Paper
With the advent of the web there has been a great demand for data interchange between existing applications using internet infrastructure and also between newer web services applications. The W3C XML standard is becoming the internet data interchange format, even though the initial XML standard was not well suited to this. The XML Schema recommenda...
Conference Paper
Most of the work on XML query and search has stemmed from the publishing and database communities, mostly for the needs of business applications. Recently, the Information Retrieval community began investigating the XML search issue to answer information discovery needs. Following this trend, we present here an approach where information needs can...
Conference Paper
With the advent of the web there has been a great demand for data interchange between existing applications using internet infrastructure and also between newer web services applications. The W3C XML standard is becoming the internet data interchange format. Such XML data is typically produced by applications. However during application development...
Article
The previous workshop on "XML and Information Retrieval" was held in the context of SIGIR'2000 (Athens, Greece) and showed that there is a serious interest in managing semi-structured data from an IR (i.e., unstructured) perspective rather than from the dominating database (i.e., structured) perspective. As a direct outcome of the workshop, a speci...
Article
Full-text available
We propose simplifying the editing of structured documents that conform to Backus-Naur Form (BNF) grammars, by exposing and operating on the grammar itself. We introduce an original grammar view that supports browsing and editing of structured documents and is coordinated with the document view. The grammar view presents document element types in c...
Conference Paper
In spite of the increase in the availability of mobile devices in the last few years, Web information is not yet as accessible from PDAs or WAP phones as it is from the desktop. In this paper, we propose a solution for supporting one of the most popular information discovery mechanisms, namely Web directory navigation, from mobile devices. Our prop...
Article
The advances in storage and communications enable users to store massive amounts of data, and to share it seamlessly with their peers. With the advent of XML, we expect a significant portion of this data to be in XML format. We describe here the architecture and implementation of an XML repository that promotes a novel navigation paradigm for XML d...
Article
With the increasing proliferation of chat applications on the web, the old vision of “adding people” to the web is becoming a reality. Along with collaboration tools, more and more sites offer people awareness mechanisms to let the site visitors know about each other. This reflects the dual nature of the web as a place for virtual meetings as well...
Article
Full-text available
Mobile knowledge seekers often need access to information on the Web during a meeting or on the road, while away from their desktop. A common practice today is to use pervasive devices such as Personal Digital Assistants or mobile phones. However, these devices have inherent constraints (e.g., slow communication, form factor) which often make infor...
Conference Paper
Full-text available
XML documents represent a middle range between unstructured data such as textual documents and fully structured data encoded in databases. Typically, information retrieval techniques are used to support search on the "unstructured" end of this scale, while database techniques are used for the other end. To date, most of the work on XML query and se...
Article
Full-text available
To date, most of the work on XML query and search has stemmed from the document management and database communities and from the information needs of business applications, as evidenced by existing XML query languages such as W3C's XQuery, which is strongly inspired by SQL. We propose here to extend the realm of XML by supporting the information ne...
Conference Paper
Full-text available
We introduce static index pruning methods that significantly reduce the index size in information retrieval systems. We investigate uniform and term-based methods that each remove selected entries from the index and yet have only a minor effect on retrieval results. In uniform pruning, there is a fixed cutoff threshold, and all index entries whose c...
Article
XML, the eXtensible Markup Language, has recently emerged as a new standard for data representation and exchange on the Internet. It is believed that it will become a universal format for data exchange on the Web and that in the near future we will find vast amounts of documents in XML format on the Web. As a result, it has become crucial to addres...
Article
Full-text available
Although software reuse presents clear advantages for programmer productivity and code reliability, it is not practiced enough. One of the reasons for the only moderate success of reuse is the lack of software libraries that facilitate the actual locating and understanding of reusable components. This paper describes a technology for automatically...
Conference Paper
Full-text available
In this work we describe a new approach for morphological disambiguation to enable linguistic indexing for Hebrew search systems. We describe a Hebrew Morphological Disambiguator (HMD or Hemed for short) based on statistical data gathered from large Hebrew corpora. We show how to integrate HMD with a search engine to enable linguistic search for...
Article
Full-text available
With the increasing proliferation of chat applications on the Web, the old vision of "adding people" to the Web is becoming a reality. While infrastructure seems to be scalable and stable enough to support collaboration, the user model is not well defined yet. In particular, there seems to be a certain lack of abstraction and granularity in existi...
Article
Full-text available
We revisit document clustering in the context of the Web. Specifically, we investigate on-line ephemeral clustering, whereby the input document set is generated dynamically, typically by search results, and the output clustering hierarchy has a short life span, and is used for interactive browsing purposes. Ephemeral clustering for interactive use...
Article
Full-text available
XML, the eXtensible Markup Language, has recently emerged as a new standard for data representation and exchange on the Internet. It is believed that it will become a universal format for data exchange on the Web and that in the near future we will find vast amounts of documents in XML format on the Web. As a result, it has become crucial to addres...
Conference Paper
Full-text available
Information retrieval systems typically weight the importance of search terms according to document and collection statistics (such as by using tf × idf scores, where less common terms are given higher weight). We consider here the scenario where a user can express her own subjective weighting of the importance of the terms that form the query...
Article
This paper proposes two enhancements to existing search services over the Web. One enhancement is the addition of limited dynamic search around results provided by regular Web search services, in order to correct part of the discrepancy between the actual Web and its static image as stored in search repositories. The second enhancement is an experi...
Article
Full-text available
We give a principled method for allowing users to assign subjective weights to the importance of search terms, that is, the terms forming a query, in information retrieval systems. For example, our method makes it possible for a user to say that she cares twice as much about the first search term as the second search term, and to obtain a ranked...
Conference Paper
This paper introduces the “shark search” algorithm, a refined version of one of the first dynamic Web search algorithms, the “fish search”. The shark-search has been embodied into a dynamic Web site mapping that enables users to tailor Web maps to their interests. Preliminary experiments show significant improvements over the original fish-search a...
Article
This paper introduces the “shark search” algorithm, a refined version of one of the first dynamic Web search algorithms, the “fish search”. The shark-search has been embodied into a dynamic Web site mapping that enables users to tailor Web maps to their interests. Preliminary experiments show significant improvements over the original fish-search a...
Article
Conventional information discovery tools can be classified as being either search oriented or browse oriented. In the context of the Web, search-oriented tools employ text-analysis techniques to find Web documents based on user-specified queries, whereas browse-oriented ones employ site mapping and visualization techniques to allow users to navigat...
Article
The explosive growth in the Web leads to the need for personalized client-based local URL repositories often called bookmarks. We present a novel approach to bookmark organization that provides automatic classification together with user adaption.
Article
With the advent of digital libraries and of wide area networks, enormous amounts of textual information are made available all over the world: A typical example being the World Wide Web on the Internet. Searching and browsing are the two resource discovery paradigms mostly used to access this information [Bowman 94]. Information retrieval (IR) prov...
Conference Paper
New techniques for browsing amongst functionally related classes, and retrieving classes from object-oriented class libraries are presented. These techniques make use of two potent, and readily available sources of information: the source code of each class, and its associated documentation. We describe how the integration of information retrieval...
Article
The two basic requirements for achieving software reuse are: (1) to provide a sufficient number of components over a spectrum of domains that can be reused as is (black-box reuse) or easily adapted (white-box reuse), and (2) to organize components such that code close to the users' needs is easy to locate. Many attempts have been made at addressing...
Article
Full-text available
A technology for automatically assembling large software libraries which promote software reuse by helping the user locate the components closest to her/his needs is described. Software libraries are automatically assembled from a set of unorganized components by using information retrieval techniques. The construction of the library is done in two...
Article
One purported advantage of hypertext systems is the ability to move between semantically related parts of a document (or family of documents). If the document is undergoing frequent modification (for example while an author is writing a book or while a software design stored in the hypertext system is evolving) the question arises as to how to incr...
Conference Paper
Cluster analysis has been of long-standing interest in statistics. It can be traced to the work of Adanson in 1757 [Adanson 1757] who used numerical clustering for classifying botanic species. Statisticians and more particularly taxonomists have widely developed the field since then. Cluster analysis offers now a large range of techniques for ident...
Article
In contrast to other kinds of libraries, software libraries need to be conceptually organized. When looking for a component, the main concern of users is the functionality of the desired component; implementation details are secondary. Software reuse would be enhanced with conceptually organized large libraries of software components. In this paper...
Conference Paper
With the ever-increasing size and complexity of software systems, their maintenance becomes a more and more difficult issue. Therefore, classical managerial solutions cannot be applied for maintaining very large software systems. The maintenance task must be assisted by automated techniques. Most existing tools can assist maintenance tasks only by...
