Gilad Mishne

  • data at color

About

76
Publications
31,587
Reads
5,550
Citations
Current institution
color
Current position
  • data

Publications (76)
Article
Full-text available
Advances in genome sequencing have led to a tremendous increase in the discovery of novel missense variants, but evidence for determining clinical significance can be limited or conflicting. Here, we present LEAP, a machine learning model that utilizes a variety of feature categories to classify variants, and achieves high performance in multiple g...
Article
Full-text available
Background: Inherited susceptibility to common, complex diseases may be caused by rare, pathogenic variants ("monogenic") or by the cumulative effect of numerous common variants ("polygenic"). Comprehensive genome interpretation should enable assessment for both monogenic and polygenic components of inherited risk. The traditional approach require...
Preprint
Full-text available
Background: The inherited susceptibility of common, complex diseases may be caused by rare, 'monogenic' pathogenic variants or by the cumulative effect of numerous common, 'polygenic' variants. As such, comprehensive genome interpretation could involve two distinct genetic testing technologies -- high coverage next generation sequencing for known g...
Article
Full-text available
Next generation sequencing multi-gene panels have greatly improved the diagnostic yield and cost effectiveness of genetic testing and are rapidly being integrated into the clinic for hereditary cancer risk. With this technology comes a dramatic increase in the volume, type and complexity of data. This invaluable data though is too often buried or i...
Article
Full-text available
Background: Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS...
Patent
Full-text available
In one embodiment, access a set of recency ranking data comprising one or more recency search queries and one or more recency search results, each of the recency search queries being recency-sensitive with respect to a particular time period and being associated with a query timestamp representing the time at which the recency search query is recei...
Patent
Full-text available
Disclosed are methods and apparatus for clustering and presenting search suggestions. A segment of text is obtained via a search query section of a user interface, the segment of text being a portion of a search query. A set of suggestions is obtained, each suggestion in the set of suggestions being a suggested search query relating to the segment...
Article
This is a summary of the keynote talk delivered at ECIR 2014. Twitter’s search engine faces some of the most unique challenges in information retrieval and distributed systems today. On the scaling front, it’s a relatively young system with a massive user base, billions of queries daily, and many billions of indexed documents - with thousands being...
Article
We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper pro...
Article
It is well known that anchor text plays an important role in search, providing signals that are often not present in the source document itself. The paper reports results of a preliminary investigation on the value of tweets and tweet conversations as anchor text. We show that using tweets as anchors improves significantly over using HTML anchors,...
Article
The real-time nature of Twitter means that term distributions in tweets and in search queries change rapidly: the most frequent terms in one hour may look very different from those in the next. Informally, we call this phenomenon "churn". Our interest in analyzing churn stems from the perspective of real-time search. Nearly all ranking functions, m...
Article
User browsing information, particularly non-search-related activity, reveals important contextual information on the preferences and intents of Web users. In this article, we demonstrate the importance of mining general Web user behavior data to improve ranking and other Web-search experience, with an emphasis on analyzing individual user sessions...
Conference Paper
All state-of-the-art web search engines implement an auto-completion mechanism - an assistive technology enabling users to effectively formulate their search queries by predicting the next characters or words that they are likely to type. Query completions (or suggestions) are typically mined from past user interactions with the search engine, e.g....
Conference Paper
Full-text available
In web search, recency ranking refers to ranking documents by relevance which takes freshness into account. In this paper, we propose a retrieval system which automatically detects and responds to recency sensitive queries. The system detects recency sensitive queries using a high precision classifier. The system responds to recency sensitive...
Conference Paper
We describe improvements to the use of semantic lexicons by a state-of-the-art query interpretation system powering a major search engine. We successfully compute concept label importance information for lexicon strings; lexicon augmentation with such information leads to a 6.4% precision increase on affected queries with no query coverage loss. Fi...
Conference Paper
Most existing information retrieval (IR) systems do not take much advantage of natural language processing (NLP) techniques due to the complexity and limited observed effectiveness of applying NLP to IR. In this paper, we demonstrate that substantial gains can be obtained over a strong baseline using NLP techniques, if properly handled. We propos...
Conference Paper
User browsing information, particularly their non-search related activity, reveals important contextual information on the preferences and the intent of web users. In this paper, we expand the use of browsing information for web search ranking and other applications, with an emphasis on analyzing individual user sessions for creating aggregate...
Conference Paper
Full-text available
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions -- social media sites -- becomes increasingly important. Social media in general exhibit a rich variety of information sources: in add...
Article
Full-text available
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions—social media sites—becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addit...
Article
This paper describes three simple heuristics which improve opinion retrieval effectiveness by using blog-specific properties. Blog timestamps are used to increase the retrieval scores of blog posts published near the time of a significant event related to a query; an inexpensive approach to comment amount estimation is used to identify the level o...
Article
We demonstrate the next release of MoodViews, a set of online tools for mood analysis in blogs. Since its initial launch in mid-2005, MoodViews has provided a window into aggregate states of mind of masses of people. In addition to the tracking functionalities that MoodViews has offered so far, we demonstrate several types of mood-related search tools....
Article
PhD thesis, Universiteit van Amsterdam. Includes bibliographical references and a summary in Dutch.
Conference Paper
Access to weblogs, both through commercial services and in academic studies, is usually limited to the content of the weblog posts. This overlooks an important aspect distinguishing weblogs from other web pages: the ability of weblog readers to respond to posts directly, by posting comments. In this paper we present a large-scale study of weblog...
Conference Paper
We present an analysis of a large blog search engine query log, exploring a number of angles such as query intent, query topics, and user sessions. Our results show that blog searches have different intents than general web searches, suggesting that the primary targets of blog searchers are tracking references to named entities, and locating blogs...
Conference Paper
Full-text available
We describe a method for discovering irregularities in temporal mood patterns appearing in a large corpus of blog posts, and labeling them with a natural language explanation. Simple techniques based on comparing corpus frequencies, coupled with large quantities of data, are shown to be effective for identifying the events underlying changes...
Conference Paper
This paper describes AutoTag, a tool which suggests tags for weblog posts using collaborative filtering methods. An evaluation of AutoTag on a large collection of posts shows good accuracy; coupled with the blogger's final quality control, AutoTag assists both in simplifying the tagging process and in improving its quality.
Conference Paper
We use a combination of text analysis and external knowledge sources to estimate the commercial taste of bloggers from their text; our methods are evaluated using product wishlists found in the blogs. Initial results are promising, showing that valuable insights can be mined from blogs, not just at the aggregate but also at the individual blog le...
Conference Paper
We demonstrate a system for tracking and analyzing moods of bloggers worldwide, as reflected in the largest blogging community, LiveJournal. Our system collects thousands of blog posts every hour, performs various analyses on the posts and presents the results graphically.
Conference Paper
The volume of discussion about a product in weblogs has recently been shown to correlate with the product's financial performance. In this paper, we study whether applying sentiment analysis methods to weblog data results in better correlation than volume only, in the domain of movies. Our main finding is that positive sentiment is indeed a b...
Conference Paper
The personal, diary-like nature of blogs prompts many bloggers to indicate their mood at the time of posting. Aggregating these indications over a large amount of bloggers gives a "blogosphere state-of-mind" for each point in time: the intensity of different moods among bloggers at that time. In this paper, we address the task of estimating t...
Conference Paper
We introduce a method for content-based advertisement selection for personal blog pages, based on combining multiple representations of the blog. The core idea behind the method is that personal blogs represent individuals, whose interests can be modeled by the language used in the blog itself combined with the language used in related source...
Article
This position paper discusses the blogging phenomenon from an information access point of view. We examine blogs as a source of knowledge, analyzing the properties which make them unique as a data collection. We outline information analysis tasks aimed at blogs, and discuss how the properties of blogs are used in this context; finally, we point ou...
Article
We describe our participation in the Opinion Retrieval task at TREC 2006. Our approach to identifying opinions in blog posts consisted of scoring the posts separately on various aspects associated with an expression of opinion about a topic, including shallow sentiment analysis, spam detection, and link-based authority estimation. The separate app...
Conference Paper
Full-text available
We describe a method for discovering irregularities in temporal mood patterns appearing in a large corpus of blog posts, and labeling them with a natural language explanation. Simple techniques based on comparing corpus frequencies, coupled with large quantities of data, are shown to be effective for identifying the events underlying changes in glo...
Conference Paper
Full-text available
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In contrast to other link spam filtering approaches, our method requires no training, no hard-coded rule sets, and no knowledge of complete-web connectivity. Preliminary ex...
Conference Paper
We explore the use of phrase and proximity terms in the context of web retrieval, which is different from traditional ad-hoc retrieval both in document structure and in query characteristics. We show that for this type of task, the usage of both phrase and proximity terms is highly beneficial for early precision as well as for overall retrieval effec...
Conference Paper
We examine the effects of various query modifications on the problem of answer projection: the task of retrieving documents that support a given answer to a question. We compare different techniques such as phrase searches and term weighting, and show that some models achieve significant improvements over unmodified queries.
Article
We propose a method for ranking short information nuggets extracted from a text corpus, using another, reliable reference corpus as a user model. We argue that the availability and usage of such additional corpora is common in a number of IR tasks, and apply the method to answering a form of definition questions. The proposed ranking method makes...
Conference Paper
Full-text available
The reasoning tasks that can be performed with semantic web service descriptions depend on the quality of the domain ontologies used to create these descriptions. However, building such domain ontologies is a time consuming and difficult task. We describe an automatic extraction method that learns domain ontologies for web service descriptions from...
Conference Paper
Full-text available
We describe a system for automating call-center analysis and monitoring. Our system integrates transcription of incoming calls with analysis of their content; for the analysis, we introduce a novel method of estimating the domain-specific importance of conversation fragments, based on divergence of corpus statistics. Combining this method with Info...
Article
We present preliminary work on classifying blog text according to the mood reported by its author during the writing. Our data consists of a large collection of blog posts – online diary entries – which include an indication of the writer's mood. We obtain modest, but consistent improvements over a baseline; our results show that further increas...
Conference Paper
The paper describes the University of Amsterdam’s participation in the Question Answering track at CLEF2003, our system and the results produced by it. A thorough analysis of the wrong answers given by our system is provided, including a discussion of each type of error and possible strategies for handling them. We outline our current efforts for i...
Conference Paper
Full-text available
We describe the participation of the University of Amsterdam in the Question Answering track at CLEF 2004. We took part in the monolingual Dutch task and, for the first time, also in the bilingual English to Dutch task. This year's system is a further elaboration and refinement of the multi-stream architecture we introduced last year, extended with...
Article
Full-text available
In the context of the European Network of Excellence in Computational Logic (CoLogNet, http://www.colognet.org/), the European Association for Logic, Language and Computation (FoLLI, http://www.folli.org) has started a project on E-Learning in Computational Logic and the development of Dynamic Teaching Materials for its annual European Summer Schoo...
Article
Full-text available
We describe our participation in the TREC 2003 Question Answering track. We explain the ideas underlying our approaches to the task, report on our results, provide an error analysis, and give a summary of our findings so far.
Article
We propose a method for retrieving segments of source code from a large repository. The method is based on conceptual modeling of the code, combining information extracted from the structure of the code and standard information distance measures. Our results show an improvement over traditional retrieval models, indicating that, for this type of hi...
Article
Full-text available
It is generally believed that question answering can benefit from natural language processing methods. So far, however, there have been few systematic studies of this conjecture. We report on ongoing work that is aimed at understanding the contribution of linguistically informed modules and resources to the overall performance of a generic question...
Conference Paper
Full-text available
We describe our participation in the TREC 2004 Question Answering track. We provide a detailed account of the ideas underlying our approach to the QA task, especially to the so-called "other" questions. This year we made essential use of Wikipedia, the free online encyclopedia, both as a source of answers to factoid questions and as an importance m...
Article
Full-text available
We describe our participation in the TREC 2004 Web, Terabyte, and Question Answering tracks. We provide a detailed account of the ideas underlying our approaches to these tasks, report on our results, and give a summary of our findings so far.
Article
This paper describes the official runs of our team for QA@CLEF 2003. We took part in the monolingual Dutch Question Answering task.
Conference Paper
The paper describes the University of Amsterdam's participation in the Question Answering track at CLEF 2003, our system and the results produced by it. A thorough analysis of the wrong answers given by our system is provided, including a discussion of each type of error and possible strategies for handling them. We outline our current efforts for...
Article
This paper describes the official runs of our team for QA@CLEF 2003. We took part in the monolingual Dutch Question Answering task.
Article
We describe a framework for offline extraction of certain types of information from a document collection, and discuss its usage for answering factoid questions. We implemented this approach as a part of the Dutch Question Answering System developed at the University of Amsterdam. The evaluation of the system using data from the CLEF 2003 Question...
Article
We report on the construction of the first-ever open domain question answering system for the Dutch language. In addition to providing experimental results based on the CLEF 2003 QA test set for Dutch, we also identify a number of key natural language processing resources that are needed to further question answering for Dutch.
Article
Full-text available
This paper describes the official runs of our team for the CLEF 2004 question answering tasks. We took part in the monolingual Dutch task and in the bilingual English to Dutch task.
Article
We use a combination of text analysis and external knowledge sources to estimate the commercial taste of bloggers from their text; our methods are evaluated using product wishlists found in the blogs. Initial results are promising, showing that valuable insights can be mined from blogs, not just at the aggregate but also at the individual...
