Steve Lawrence

Google Inc. | Google · Research Department

About

154
Publications
50,064
Reads
23,748
Citations
Citations since 2016
0 Research Items
6656 Citations
Introduction
Steve Lawrence currently works at the Research Department, Google Inc. Steve does research in Deep Learning, Artificial Intelligence, Data Mining, and Information Science.

Publications (154)
Patent
Full-text available
A system may obtain search results associated with a search performed using a search query. The system may modify the search results, if necessary, based at least in part on information associated with prior document accesses by a user and present the modified search results to the user. The modification of the search results might include adding...
Patent
Full-text available
A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other docume...
Patent
Full-text available
A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score.
Patent
Full-text available
A system may determine a document inception date associated with a document, generate a score for the document based, at least in part, on the document inception date, and rank the document with regard to at least one other document based, at least in part, on the score.
Patent
Full-text available
A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranki...
Article
Full-text available
Feature selection aims to select the smallest subset of features for a specified level of performance. The optimal achievable classification performance on a feature subset is summarized by its Receiver Operating Curve (ROC). When infinite data is available, the Neyman-Pearson (NP) design procedure provides the most efficient way of obtaining this...
Article
Full-text available
The growth of Internet commerce has stimulated the use of collaborative filtering (CF) algorithms as recommender systems. Such systems leverage knowledge about the known preferences of multiple users to recommend items of interest to other users. CF methods have been harnessed to make recommendations about such items as web pages, movies, books, an...
Article
Full-text available
Recommender systems leverage product and community information to target products to consumers. Researchers have developed collaborative recommenders, content-based recommenders, and (largely ad-hoc) hybrid systems. We propose a unified probabilistic framework for merging collaborative and content-based recommendations. We extend Hofmann's [1999] a...
Patent
Full-text available
Ads are scored using, at least, user information and information associated with a user request, such as a search query or a document request. The scores may be used in determining whether to serve ads, how to serve ads, to order ads, to filter ads, etc. Items of user information, request-associated information, and/or ad information can be weighte...
Article
Full-text available
We exploit the redundancy and volume of information on the web to build a computerized player for the ABC TV game show 'Who Wants To Be A Millionaire?' The player consists of a question-answering module and a decision-making module. The question-answering module utilizes question transformation techniques, natural language parsing, multiple informa...
Article
this article appeared as Agichtein et al. [2001]. The authors acknowledge the NEC Research Institute, where a substantial part of this research was accomplished, as well as support from the National Science Foundation under Grants No. IIS-97-33880 and IIS-9817434. Authors' addresses: E. Agichtein (eugene@cs.columbia.edu); S. Lawrence (lawrence@necm...
Article
Full-text available
We introduce a method for learning to find documents on the Web that contain answers to a given natural language question. In our approach, questions are transformed into new queries aimed at maximizing the probability of retrieving answers from existing information retrieval systems. The method involves automatically learning phrase features for c...
Article
The World Wide Web provides an unprecedented opportunity to automatically analyze a large sample of interests and activity in the world. We discuss methods for extracting knowledge from the web by randomly sampling and analyzing hosts and pages, and by analyzing the link structure of the web and how links accumulate over time. A variety of interesti...
Conference Paper
Full-text available
A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. We propose an integrated approach to statistical modelling from relational databases. We structure the search space based on "refinement graphs",...
Article
Full-text available
A major obstacle to fully integrated deployment of many data mining algorithms is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. We propose an integrated approach to statistical modeling from relational databases. We structure the search space based on "refinement graphs",...
Article
Full-text available
The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique prope...
Conference Paper
Full-text available
This paper describes a method of mapping music into a semantic space that can be used for similarity measurement, classification, and music information retrieval. The value along each dimension of this anchor space is computed as the output from a pattern classifier which is trained to measure a particular semantic feature. In anchor space, distrib...
Conference Paper
Full-text available
Niche search engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance and when nontraditional search features are required. Niche search engines can take advantage of their domain of concentration to achieve higher relevance and o...
Article
INTRODUCTION Many natural language questions (e.g., "What is a hard disk") are submitted to search engines on the web every day, and an increasing number of search services on the web specifically target natural language questions. For example, AskJeeves (www.ask.com) uses databases of pre-compiled information, metasearching, and other proprietary...
Article
Niche Search Engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance and when nontraditional search features are required. Niche search engines can take advantage of their domain of concentration to achieve higher relevance and o...
Article
A user searching for documents within a specific category using a general purpose search engine might have a difficult time finding valuable documents. To improve category specific search, we show that a trained classifier can recognize pages of a specified category with high precision by using textual content, text location, and HTML structur...
Article
Web search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user, or the context in which the user made the request. Next-generation search engines will make increasing use of context information, either by using explicit or implicit context information from users, or by implementi...
Conference Paper
We analyze data from 52 online in-game sports betting markets (where betting is allowed continuously throughout a game), including 34 markets based on soccer (European football) games from the 2002 World Cup, and 18 basketball games from the 2002 USA National Basketball Association (NBA) championship. We show that prices on average approach the cor...
Conference Paper
Full-text available
Niche Search Engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance. By taking advantage of their domain of concentration they achieve higher relevance and offer enhanced features. We discuss a new niche search engine, eBizSearc...
Conference Paper
We exploit the redundancy and volume of information on the web to build a computerized player for the ABC TV game show "Who Wants To Be A Millionaire?". The player consists of a question-answering module and a decision-making module. The question-answering module utilizes question transformation techniques, natural language parsing, multiple inform...
Conference Paper
We analyze data from 52 online in-game sports betting markets (where betting is allowed continuously throughout a game), including 34 markets based on soccer (European football) games from the 2002 World Cup, and 18 basketball games from the 2002 USA National Basketball Association (NBA) championship. We show that prices on average approach the c...
Conference Paper
The web contains a wealth of product reviews, but sifting through them is a daunting task. Ideally, an opinion mining tool would process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good). We begin by identifying the unique prope...
Conference Paper
Niche Search Engines offer an efficient alternative to traditional search engines when the results returned by general-purpose search engines do not provide a sufficient degree of relevance. By taking advantage of their domain of concentration they achieve higher relevance and offer enhanced features. We discuss a new niche search engine, eBizSearc...
Conference Paper
Full-text available
The susceptibility of the Internet to random faults, malicious attacks, and mixtures of faults and attacks is analyzed. We analyze actual Internet data, as well as simulated data created with network models. The network models generalize previous research, and allow generation of graphs ranging from uniform to preferential, and from static to dyna...
Chapter
Referee helps to advance the state of the art in recommender systems, especially those that attempt to combine collaborative and content-based similarity information. It provides useful tools and an open standard for the rapid development and evaluation of recommender systems. It also uses metrics that evaluate how the recommender system affects us...
Article
Full-text available
We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can utilize either the full text of the pages in the cluster or the context of links to the pages. To support the model, we use "groun...
Article
Full-text available
We define a community on the web as a set of sites that have more links (in either direction) to members of the community than to non-members. Members of such a community can be efficiently identified in a maximum flow / minimum cut framework, where the source is composed of known members, and the sink consists of well-known non-members. A focused crawler...
Article
Full-text available
Automated recommendation (e.g., personalized product recommendation on an ecommerce web site) is an increasingly valuable service associated with many databases, typically online retail catalogs and web logs. Currently, a major obstacle for evaluating recommendation algorithms is the lack of any standard, public, real-world testbed appropriate for...
Article
Full-text available
Inductive logic programming (ILP) techniques are useful for analyzing data in multi-table relational databases. Learned rules can potentially discover relationships that are not obvious in "flattened" data. Statistical learners, on the other hand, are generally not constructed to search relational data; they expect to be presented with a single t...
Article
Full-text available
As a whole, the World Wide Web displays a striking "rich get richer" behavior, with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic. However, hidden in this skewed global distribution, we discover a qualitatively different and considerably less biased link distribution among subcateg...
Article
Full-text available
When searching the WWW, users often desire results restricted to a particular document category. Ideally, a user would be able to filter results with a text classifier to minimize false positive results; however, current search engines allow only simple query modifications. To automate the process of generating effective query modifications, we int...
Article
Full-text available
The vast improvement in information access is not the only advantage resulting from the increasing percentage of hyperlinked human knowledge available on the Web. Additionally, much potential exists for analyzing interests and relationships within science and society. However, the Web's decentralized and unorganized nature hampers content analysis....
Article
Full-text available
The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classifi...
Article
Full-text available
We demonstrate the artist detection component of Minnowmatch, a machine listening and music retrieval engine. Minnowmatch (Mima) automatically determines various metadata and makes classifications concerning a piece of audio using neural networks and support vector machines. The technologies developed in Minnowmatch may be used to create audio info...
Conference Paper
Full-text available
It would be interesting and valuable to devise an automatic measure of the similarity between two musicians based only on an analysis of their recordings. To develop such a measure, however, presupposes some 'ground truth' training data describing the actual similarity between certain pairs of artists that constitute the desired output of the mea...
Conference Paper
CiteSeer (also known as ResearchIndex) is a digital library of scientific literature that aims to improve communication and progress in science. CiteSeer features include automatic metadata extraction, autonomous citation indexing, graph analysis, citation context extraction, and related document computation. This talk covers the design, implementa...
Article
Full-text available
We propose methods for unsupervised learning of text profiles for music from unstructured text obtained from the web. The profiles can be used for classification, recommendation, and understanding, and may be used in conjunction with existing methods such as audio analysis and collaborative filtering to improve performance. A formal method for a...
Article
We present two new algorithms for generating uniformly random samples of pages from the World Wide Web, building upon recent work by Henzinger et al. (Henzinger et al. 2000) and Bar-Yossef et al. (Bar-Yossef et al. 2000). Both algorithms are based on a weighted random-walk methodology. The first algorithm (DIRECTED-SAMPLE) operates on arbitrary dir...
Article
The Web's structure has been studied at a global level, considering the network as a whole, and at a local level, studying focused neighborhoods and "community" structures. This analysis has revealed an intricate structure that suggests improved methods for organizing and accessing information and offers the opportunity to chart interests and relat...
Article
This article discusses improvements that can enhance web searching with user preferences. When searching the web, a user can be overwhelmed by thousands of results retrieved by a search engine, few of which are valuable. The problem for search engines is not only to find relevant results, but results consistent with the user's information need. It is a...
Article
Full-text available
Several studies show that the distribution of the number of links per web page follows a power law in the limit for large numbers of links.
Article
The Web is revolutionizing the entire scholarly communication process and changing the way that researchers exchange information. In this paper, we analyze two views of information production and use in computer-related research based on citation analysis of PDF and PostScript formatted publications on the Web using autonomous citation indexing (ACI...
Article
Full-text available
Recommender systems leverage product and community information to target products to consumers. Researchers have developed collaborative recommenders, content-based recommenders, and a few hybrid systems. We propose a unified probabilistic framework for merging collaborative and content-based recommendations. We extend Hofmann’s (1999) aspect model...
Article
Game sites on the World Wide Web draw people from around the world with specialized interests, skills, and knowledge. Data from the games often reflects the players' expertise and will to win. We extract probabilistic forecasts from data obtained from three online games: the Hollywood Stock Exchange (HSX), the Foresight Exchange (FX), and the Formu...
Article
Articles freely available online are more highly cited. For greater impact and faster scientific progress, authors and publishers should aim to make research easy to access.
Article
Full-text available
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the p...
Article
The World Wide Web has revolutionized the way that people access information, and has opened up new possibilities in areas such as digital libraries, general and scientific information dissemination and retrieval, education, commerce, entertainment, government, and health care. There are many avenues for improvement of the Web, for example in the a...
Article
Full-text available
We analyze the computer science literature on the web and compare it to the literature indexed in the Science Citation Index (SCI). The web contains articles from throughout the research timeline, from technical reports and conference papers to journal articles and book chapters, whereas SCI focuses on journal articles. Analyzing the citation patte...
Article
We analyze the efficiency and forecast accuracy of two market games on the World Wide Web: the Hollywood Stock Exchange (HSX) and the Foresight Exchange (FX). We quantify the degree of arbitrage available on HSX, and compare with a real-money market of a similar nature. We show that prices of HSX movie stocks provide good forecasts of actual box oc...
Article
Full-text available
We introduce a method for learning query transformations that improves the ability to retrieve answers to questions from an information retrieval system. During the training stage the method involves automatically learning phrase features for classifying questions into different types, automatically generating candidate query transformations from a...
Article
One of the most important aspects of any machine learning paradigm is how it scales according to problem size and complexity. Using a task with known optimal training error, and a pre-specified maximum number of training updates, we investigate the convergence of the backpropagation algorithm with respect to a) the complexity of the required functi...
Article
Faces represent complex, multidimensional, meaningful visual stimuli and developing a computational model for face recognition is difficult [42]. We present a hybrid neural network solution which compares favorably with other methods. The system combines local image sampling, a self-organizing map neural network, and a convolutional neural network....
Article
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the p...
Article
Full-text available
The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to region...
Article
We propose distributed error correction for digital libraries, where individual users can correct information in a database in real-time. Distributed error correction is used in the ResearchIndex (formerly CiteSeer) scientific literature digital library developed at NEC Research Institute. We discuss issues including motivation to contribute correc...
Article
Full-text available
The sequential minimal optimization algorithm (SMO) has been shown to be an effective method for training support vector machines (SVMs) on classification tasks defined on sparse data sets. SMO differs from most SVM algorithms in that it does not require a quadratic programming solver. In this work, we generalize SMO so that it can handle regressio...
Article
Advances in computational resources and the communications infrastructure, and the rapid rise of the World Wide Web, have led to the increasingly widespread availability of scientific papers in electronic form. Scientific papers usually contain citations to previous work, and indices of these citations are valuable for literature search, analysis,...
Article
Full-text available
The lack of persistence of Web references has called into question the increasingly common practice of citing URLs in scientific papers. It is argued that although few critical resources have been lost to date, new strategies to manage Internet resources and improved citation practices are necessary to minimize the future loss of information.
Article
Full-text available
Assessing the probabilities of future events is a problem often faced by science policymakers. For example, CERN, the European laboratory for particle physics, recently had to judge whether the probability of discovering a Higgs boson was high enough to justify extending the operation of its collider (see Science, 22 Sept., p. 2014 and 29 Sept., p....
Conference Paper
Full-text available
A basic problem of information processing is selecting enough features to ensure that events are accurately represented for classification problems, while simultaneously minimizing storage and processing of irrelevant or marginally important features. To address this problem, feature selection procedures perform a search through the feature power s...
Conference Paper
Full-text available
Users looking for documents within specific categories may have a difficult time locating valuable documents using general purpose search engines. We present an automated method for learning query modifications that can dramatically improve precision for locating pages within specified categories using Web search engines. We also present a classifi...
Conference Paper
Full-text available
In this paper we demonstrate the artist detection component of Minnowmatch, a machine listening and music retrieval engine. Minnowmatch (Mima) automatically determines various meta-data and makes classifications concerning a piece of audio using neural networks and support vector machines. The technologies developed in Minnowmatch may be used to cr...