Eric J. Glover’s research while affiliated with Pennsylvania State University and other places


Publications (30)


Modeling Information Incorporation in Markets, with Application to Detecting and Explaining Events
  • Article

December 2012 · 31 Reads · 8 Citations

David M Pennock · Sandip Debnath · Eric J. Glover · C. Lee Giles

We develop a model of how information flows into a market, and derive algorithms for automatically detecting and explaining relevant events. We analyze data from twenty-two "political stock markets" (i.e., betting markets on political outcomes) on the Iowa Electronic Market (IEM). We prove that, under certain efficiency assumptions, prices in such betting markets will on average approach the correct outcomes over time, and show that IEM data conforms closely to the theory. We present a simple model of a betting market where information is revealed over time, and show a qualitative correspondence between the model and real market data. We also present an algorithm for automatically detecting significant events and generating semantic explanations of their origin. The algorithm operates by discovering significant changes in vocabulary on online news sources (using expected entropy loss) that align with major price spikes in related betting markets.
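The efficiency result and the information-revelation model can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: the event occurs if the sum of N fair ±1 increments is positive, one increment becomes public per time step, and the price always equals the conditional probability of the event given what has been revealed so far (an idealized efficient market). Averaged over many simulated markets, the gap between price and final outcome starts at 0.5 and shrinks to zero at the close. The function names and parameters (`n_steps`, `n_markets`) are hypothetical.

```python
import math
import random


def prob_positive_sum(current: int, remaining: int) -> float:
    """P(current + S > 0), where S is a sum of `remaining` fair +/-1 steps.

    If k of the remaining steps are +1, then S = 2k - remaining.
    """
    favorable = sum(
        math.comb(remaining, k)
        for k in range(remaining + 1)
        if current + 2 * k - remaining > 0
    )
    return favorable / 2 ** remaining


def simulate_market(n_steps: int, rng: random.Random) -> tuple[list[float], int]:
    """Return one price path and the final outcome (1 = event happens)."""
    steps = [rng.choice((-1, 1)) for _ in range(n_steps)]
    prices = []
    current = 0
    for t in range(n_steps + 1):
        # Idealized efficient price: probability of the event given info so far.
        prices.append(prob_positive_sum(current, n_steps - t))
        if t < n_steps:
            current += steps[t]
    outcome = 1 if current > 0 else 0
    return prices, outcome


def mean_abs_error(n_markets: int = 2000, n_steps: int = 21, seed: int = 0) -> list[float]:
    """Average |outcome - price_t| across simulated markets, for each time t."""
    rng = random.Random(seed)
    totals = [0.0] * (n_steps + 1)
    for _ in range(n_markets):
        prices, outcome = simulate_market(n_steps, rng)
        for t, p in enumerate(prices):
            totals[t] += abs(outcome - p)
    return [x / n_markets for x in totals]


if __name__ == "__main__":
    errors = mean_abs_error()
    for t in (0, 5, 10, 15, 21):
        print(f"t={t:2d}  mean |outcome - price| = {errors[t]:.3f}")
```

Using an odd number of steps rules out a tied sum, so every simulated market settles at 0 or 1.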


Improving Category Specific Web Search

January 2003 · 16 Reads · 11 Citations

Eric J. Glover · [...] · David Pennock

A user searching for documents within a specific category using a general-purpose search engine might have a difficult time finding valuable documents. To improve category-specific search, we show that a trained classifier can recognize pages of a specified category with high precision by using textual content, text location, and HTML structure. We show that query modifications to web search engines increase the probability that the documents returned are of the specific category.
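One way to expose text location and HTML structure to a classifier is to tag each token with the element it appears in, so that "biology" in a page title and "biology" in running text become different features. The sketch below uses only Python's standard `html.parser`; the tag-to-location mapping, the `location:token` feature names, and the helper names are illustrative assumptions, not the feature set used in the paper.

```python
from __future__ import annotations

import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical mapping from HTML elements to coarse "text location" labels.
LOCATION_TAGS = {"title": "title", "h1": "heading", "h2": "heading",
                 "h3": "heading", "a": "anchor"}
# Elements whose text is not visible page content.
SKIP_TAGS = {"script", "style"}
TOKEN_RE = re.compile(r"[a-z0-9]+")


class LocationFeatureParser(HTMLParser):
    """Counts tokens as 'location:token' features, e.g. 'title:biology'."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []  # currently open elements, outermost first
        self.features: Counter[str] = Counter()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; unmatched end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def _location(self) -> str | None:
        # The innermost recognized element decides where the text sits.
        for tag in reversed(self.stack):
            if tag in SKIP_TAGS:
                return None
            if tag in LOCATION_TAGS:
                return LOCATION_TAGS[tag]
        return "body"

    def handle_data(self, data):
        location = self._location()
        if location is None:
            return
        for token in TOKEN_RE.findall(data.lower()):
            self.features[f"{location}:{token}"] += 1


def extract_features(html: str) -> Counter[str]:
    """Parse an HTML string and return its location-tagged token counts."""
    parser = LocationFeatureParser()
    parser.feed(html)
    parser.close()
    return parser.features


if __name__ == "__main__":
    page = ("<html><head><title>Botany Lab</title></head><body>"
            "<h1>Plant biology</h1><p>See <a href='/x'>biology courses</a>.</p>"
            "</body></html>")
    print(extract_features(page))
```

The resulting counts could be passed to any text classifier; this particular encoding is only one plausible choice.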


Figure 1: A figure showing the predicted relationships between parent, child and self features. Positive frequency is the percentage of documents in the positive set that contain a given feature. Collection frequency is the overall percentage of documents that contain a given feature.
Figure 2: Sample distribution of features for the area of biology, with parent science, and child botany.
Figure 3: Distribution of ground truth features from the Open Directory.
Figure 4: Distribution of ground truth features from the Open Directory, removing the insufficiently defined children, and changing the parent of "computers" to "computer".
Figure 5: Extended anchortext refers to the words in close proximity to an inbound link.


Inferring Hierarchical Descriptions
  • Article
  • Full-text available

October 2002 · 137 Reads · 65 Citations

We create a statistical model for inferring hierarchical term relationships about a topic, given only a small set of example web pages on the topic, without prior knowledge of any hierarchical information. The model can use either the full text of the pages in the cluster or the context of links to the pages. To support the model, we use "ground truth" data taken from the category labels in the Open Directory. We show that the model accurately separates terms into the following classes: self terms describing the cluster, parent terms describing more general concepts, and child terms describing specializations of the cluster. For example, for a set of biology pages, sample parent, self, and child terms are science, biology, and genetics, respectively. We create an algorithm to predict parent, self, and child terms using the new model, and compare the predictions to the ground truth data. The algorithm accurately ranks a majority of the ground truth terms highly, and identifies additional complementary terms missing in the Open Directory.
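The model compares how often a term appears in the example pages (positive frequency, as defined in the Figure 1 caption) with how often it appears across the whole collection (collection frequency). A minimal sketch, assuming simple hand-picked thresholds rather than the paper's statistical model: stopword-like terms are not enriched in the examples, parent terms are enriched but still common collection-wide, self terms are frequent in the examples and rare elsewhere, and child terms are rare everywhere yet concentrated in the examples. The thresholds and the frequencies in `examples` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TermStats:
    term: str
    positive_freq: float    # fraction of example pages that contain the term
    collection_freq: float  # fraction of all collection pages that contain the term


def label_term(s: TermStats, min_ratio: float = 2.0,
               parent_min_coll: float = 0.05, self_min_pos: float = 0.5,
               child_max_pos: float = 0.2) -> str:
    """Guess 'parent', 'self', 'child', or 'other' from two frequencies.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if s.positive_freq < min_ratio * s.collection_freq:
        return "other"   # not enriched in the examples (e.g. stopwords)
    if s.collection_freq >= parent_min_coll:
        return "parent"  # enriched, but also common across the collection
    if s.positive_freq >= self_min_pos:
        return "self"    # frequent in the examples, rare elsewhere
    if s.positive_freq <= child_max_pos:
        return "child"   # rare everywhere, yet concentrated in the examples
    return "other"


# Made-up frequencies for a cluster of biology pages.
examples = [
    TermStats("the", 0.99, 0.98),
    TermStats("science", 0.35, 0.08),
    TermStats("biology", 0.80, 0.01),
    TermStats("genetics", 0.12, 0.004),
]

if __name__ == "__main__":
    for s in examples:
        print(f"{s.term:10s} {label_term(s)}")
```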


Fig. 1 Diamonds plot the empirically observed connectivity distribution for company homepages. Circles display the histogram resulting from a simulation of the model, with parameters t = 4,923, m = 1,356, and α = 0.950 set to match the company data. The dashed line marks the analytic solution (Eq. 3) instantiated with the same parameters.
Fig. 2 Diamonds display log–log histograms of inbound connectivities for category-specific homepages, and inbound and outbound connectivities for random web pages. Circles mark the connectivity distributions, with m₀ = 0, t set equal to the number of web pages, 2m set equal to the average number of inbound links per page, and α chosen according to a nonlinear least-squares fit. Dashed lines indicate the analytic solutions (Eq. 3).
Winners don't take all: Characterizing the competition for links on the web

May 2002 · 191 Reads · 450 Citations

Proceedings of the National Academy of Sciences

As a whole, the World Wide Web displays a striking "rich get richer" behavior, with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic. However, hidden in this skewed global distribution, we discover a qualitatively different and considerably less biased link distribution among subcategories of pages (for example, among all university homepages or all newspaper homepages). Although the connectivity distribution over the entire web is close to a pure power law, we find that the distribution within specific categories is typically unimodal on a log scale, with the location of the mode, and thus the extent of the rich get richer phenomenon, varying across different categories. Similar distributions occur in many other naturally occurring networks, including research paper citations, movie actor collaborations, and United States power grid connections. A simple generative model, incorporating a mixture of preferential and uniform attachment, quantifies the degree to which the rich nodes grow richer, and how new (and poorly connected) nodes can compete. The model accurately accounts for the true connectivity distributions of category-specific web pages, the web as a whole, and other social networks.
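The mixture of preferential and uniform attachment can be sketched as a growth process. Assumptions for this illustration (not the paper's exact formulation): t nodes in total, each new node adds m links, each link's existing endpoint is picked in proportion to current degree with probability `alpha` and uniformly at random otherwise, degrees are undirected, and a small seed ring starts the process. Larger `alpha` strengthens the rich-get-richer effect; `alpha = 0` gives purely uniform attachment.

```python
import random
from collections import Counter


def grow_network(t: int, m: int, alpha: float, seed: int = 0) -> list[int]:
    """Degrees of t nodes grown by mixed preferential/uniform attachment.

    Each new node adds m links. Each link's existing endpoint is chosen in
    proportion to current degree with probability alpha, and uniformly at
    random among existing nodes otherwise. Duplicate links are allowed.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n0 = m + 1  # small seed ring so every starting node has nonzero degree
    if t <= n0:
        raise ValueError("t must exceed m + 1")
    rng = random.Random(seed)
    degree = [0] * t
    endpoints: list[int] = []  # node i appears degree[i] times
    for i in range(n0):
        j = (i + 1) % n0
        degree[i] += 1
        degree[j] += 1
        endpoints.extend((i, j))
    for new in range(n0, t):
        # Choose all m targets before linking, so the new node cannot pick itself.
        targets = [
            rng.choice(endpoints) if rng.random() < alpha else rng.randrange(new)
            for _ in range(m)
        ]
        for target in targets:
            degree[new] += 1
            degree[target] += 1
            endpoints.extend((new, target))
    return degree


def log2_histogram(degrees: list[int]) -> Counter:
    """Number of nodes per logarithmic degree bin [2**b, 2**(b + 1))."""
    return Counter(d.bit_length() - 1 for d in degrees if d > 0)


if __name__ == "__main__":
    for alpha in (0.95, 0.5, 0.0):
        hist = log2_histogram(grow_network(t=5000, m=2, alpha=alpha))
        print(alpha, sorted(hist.items()))
```

Sampling uniformly from the `endpoints` list is what makes the preferential branch proportional to degree; comparing the printed log-binned histograms lets one see how far the upper tail extends as `alpha` varies.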


Extracting Query Modifications from Nonlinear SVMs

April 2002 · 36 Reads · 29 Citations

When searching the WWW, users often desire results restricted to a particular document category. Ideally, a user would be able to filter results with a text classifier to minimize false positive results; however, current search engines allow only simple query modifications. To automate the process of generating effective query modifications, we introduce a sensitivity-analysis-based method for extracting rules from nonlinear support vector machines. The proposed method allows the user to specify a desired precision while attempting to maximize the recall. Our method performs several levels of dimensionality reduction and is vastly faster than searching the space of feature combinations; moreover, it is very effective on real-world data.
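The idea of turning a nonlinear classifier into a query modification can be sketched in two stages, under simplifying assumptions: first, score each term by a finite-difference sensitivity (the average change in the decision value when the term is switched on rather than off); second, search only conjunctions of the few most sensitive terms and keep the one with the highest recall among those meeting the user's precision target. The decision function `toy_decision` is a hypothetical stand-in for a trained nonlinear SVM, and the conjunctive query form, `top_k`, and `max_len` are illustrative choices, not the paper's procedure.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Optional, Sequence

Doc = FrozenSet[str]


def term_sensitivity(f: Callable[[Doc], float], docs: Sequence[Doc],
                     vocab: Sequence[str]) -> dict[str, float]:
    """Mean change in f when each term is switched from absent to present."""
    return {
        term: sum(f(d | {term}) - f(d - {term}) for d in docs) / len(docs)
        for term in vocab
    }


def precision_recall(query: tuple[str, ...], docs: Sequence[Doc],
                     labels: Sequence[bool]) -> tuple[float, float]:
    """Precision and recall of the conjunctive query 'every term in query'."""
    matched = [all(t in d for t in query) for d in docs]
    tp = sum(m and y for m, y in zip(matched, labels))
    n_matched, n_pos = sum(matched), sum(labels)
    precision = tp / n_matched if n_matched else 0.0
    recall = tp / n_pos if n_pos else 0.0
    return precision, recall


def best_query(f, docs, labels, vocab, min_precision: float,
               top_k: int = 4, max_len: int = 3) -> Optional[tuple[str, ...]]:
    """Highest-recall conjunction of sensitive terms with precision >= min_precision."""
    sens = term_sensitivity(f, docs, vocab)
    # Dimensionality reduction: only the top_k terms that raise f are searched.
    candidates = sorted((t for t in vocab if sens[t] > 0), key=lambda t: -sens[t])[:top_k]
    best, best_recall = None, -1.0
    for size in range(1, max_len + 1):  # shorter queries win ties
        for query in combinations(candidates, size):
            p, r = precision_recall(query, docs, labels)
            if p >= min_precision and r > best_recall:
                best, best_recall = query, r
    return best


def toy_decision(doc: Doc) -> float:
    """Hypothetical stand-in for a trained nonlinear SVM's decision value."""
    svm, kernel = "svm" in doc, "kernel" in doc
    return -1.0 + 0.4 * svm + 0.4 * kernel + 1.2 * (svm and kernel) - 0.8 * ("popcorn" in doc)


docs = [frozenset(s.split()) for s in (
    "svm kernel margin", "svm kernel training", "kernel corn popcorn",
    "svm kernel popcorn", "svm training data", "kernel linux module",
    "margin training data",
)]
vocab = sorted(set().union(*docs))
labels = [toy_decision(d) > 0 for d in docs]  # the classifier's own decisions

if __name__ == "__main__":
    for target in (0.7, 0.9):
        print(target, best_query(toy_decision, docs, labels, vocab, target))
```

Raising the precision target from 0.7 to 0.9 makes the search give up the single-term query "svm" in favor of the stricter "svm kernel" conjunction; on this toy data, recall stays at 1.0 in both cases.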


Using Web Structure for Classifying and Describing Web Pages

March 2002 · 397 Reads · 237 Citations

The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document text, and the text in citing documents near the citation, for classification and description. Results show that the text in citing documents, when available, often has greater discriminative and descriptive power than the text in the target document itself. The combination of evidence from a document and citing documents can improve on either information source alone. Moreover, by ranking words and phrases in the citing documents according to expected entropy loss, we are able to accurately name clusters of web pages, even with very few positive examples. Our results confirm, quantify, and extend previous research using web structure in these areas, introducing new methods for classification and description of pages.
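Expected entropy loss measures how much knowing whether a term occurs in a document reduces uncertainty about whether the document belongs to the positive set (the quantity is also commonly called information gain). The sketch below computes it from document counts and ranks terms, treating each document as a set of terms, for example the words near links that point to a page. Keeping only terms that are relatively more frequent among positives is an assumption of this sketch, and the toy documents are invented.

```python
import math
from collections import Counter
from typing import Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) variable; 0 log 0 is taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_entropy_loss(pos_with: int, neg_with: int, n_pos: int, n_neg: int) -> float:
    """H(C) - [P(f) H(C|f) + P(not f) H(C|not f)] from document counts."""
    n = n_pos + n_neg
    with_f = pos_with + neg_with
    without_f = n - with_f
    prior = binary_entropy(n_pos / n)
    h_with = binary_entropy(pos_with / with_f) if with_f else 0.0
    h_without = binary_entropy((n_pos - pos_with) / without_f) if without_f else 0.0
    return prior - (with_f / n) * h_with - (without_f / n) * h_without


def rank_terms(positive: Sequence[set[str]], negative: Sequence[set[str]],
               top: int = 5) -> list[tuple[str, float]]:
    """Rank terms over-represented in positive docs by expected entropy loss."""
    pos_df = Counter(t for doc in positive for t in doc)
    neg_df = Counter(t for doc in negative for t in doc)
    n_pos, n_neg = len(positive), len(negative)
    scored = []
    for term in set(pos_df) | set(neg_df):
        # Keep only terms relatively more common among positives (descriptive terms).
        if pos_df[term] / n_pos <= neg_df[term] / n_neg:
            continue
        loss = expected_entropy_loss(pos_df[term], neg_df[term], n_pos, n_neg)
        scored.append((loss, term))
    scored.sort(key=lambda x: (-x[0], x[1]))  # highest loss first, ties alphabetical
    return [(term, round(loss, 3)) for loss, term in scored[:top]]


# Invented link-context snippets for a cluster of biology pages vs. other pages.
positive = [{"biology", "department", "home"}, {"biology", "genetics", "lab"},
            {"biology", "science", "home"}, {"molecular", "biology", "science"}]
negative = [{"home", "page", "news"}, {"science", "fiction", "books"},
            {"sports", "news", "home"}, {"department", "store", "home"}]

if __name__ == "__main__":
    print(rank_terms(positive, negative))
```

On this toy data "biology", which occurs in every positive document and no negative one, receives the maximum score of 1 bit.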






Citations (26)


... Some works try to use models to learn candidate selection. Flake et al. [16] proposes an approach for constructing query modifications in the web search domain using corpus-based SVM models. Borisyuk et al. [18] makes use of WAND query and proposes a machine learned candidate selection framework in LinkedIn's Galene search platform. ...

Reference:

Beyond Keywords and Relevance: A Personalized Ad Retrieval Framework in E-Commerce Sponsored Search
Extracting query modifications from nonlinear SVMs
  • Citing Conference Paper
  • January 2002

... For instance, a category can be specified by the user and then used by a meta-search engine to select a domain-specific search engine to send queries, to modify queries and to define a ranking on search results. An early system that has adopted this strategy is Inquirus 2 and its successors [46,47,48]. ...

Feature Selection in Web Applications By ROC Inflections and Powerset Pruning
  • Citing Conference Paper
  • January 2001

... Sports: Winkler (1971), Lawrence et al. (2002), Debnath et al. (2003), Chen et al. (2005), Grant and Johnstone (2010), Štrumbelj and Šikonja (2010), Constantinou and Fenton (2012), Carvalho and Larson (2013), Deloatch et al. (2013), Carvalho et al. (2015, 2016a ...

Characterizing efficiency and information incorporation in sports betting markets
  • Citing Article
  • January 2002

... Health/Medicine: Dolan et al. (1986), Spiegelhalter (1986), Linnet (1988, 1989), Spiegelhalter et al. (1990), Bernardo and Muñoz (1993), Winkler and Poses (1993), Madigan et al. (1995), Dawid and Musio (2013), Conigliani et al. (2015); Natural language processing: Brümmer and du Preez (2006), Brummer and Van Leeuwen (2006), Campbell et al. (2006); Politics: Heath and Tversky (1991), Pennock et al. (2002), Tetlock (2005); Psychology: Phillips and Edwards (1966), Schum et al. (1967), von Holstein (1971b), Jensen and Peterson (1973), Fischer (1982), Tetlock and Kim (1987), Nelson and Bessler (1989), van Lenthe [...] (2015); Risk analysis: Cunningham and Martell (1976), Garthwaite and O'Hagan (2000), Walker et al. (2003); [...] Brier (1950), Sanders (1963), Winkler and Murphy (1969), Glahn and Jorgensen (1970), von Holstein (1971a), Murphy (1974), Charba and Klein (1980), Winkler (1982, 1984), Murphy and Daan (1984), Murphy (1985), Brunet et al. (1988), Epstein (1988), Murphy et al. (1989), Murphy (1990), Murphy and Winkler (1992), Murphy (1993), Winkler (1994), Katz and Murphy (1997), Wilson et al. (1999), Roulston and Smith (2002), Mason (2004), Grimit et al. (2006), Friederichs and Hense (2007), Ahrens and Walser (2008), Bröcker and Smith (2008), Gneiting et al. (2008), Jaun and Ahrens (2009), [...] (2013), Thorarinsdottir et al. (2013), Christensen (2015), Christensen et al. (2015), Smith et al. (2015) ...

Modeling Information Incorporation in Markets, with Application to Detecting and Explaining Events
  • Citing Article
  • December 2012

... On the other hand, Lawrence (2000) proposes five types of strategy for including context in searches: Remembrance Agent (Rhodes, 2000), SurfLen (Fu, Budzik, & Hammond, 2000), Margin Notes (Rhodes, 2000), Fab (Balabanovic & Shoham, 1997), CiteSeer (Lawrence, Giles, & Bollacker, 1999), and Deadliner (Kruger, et al., 2000). There are also meta-search engines that implement some mechanism for deriving the user's context and then use that information to query some of these specialized search engines. ...

Reference:

Thesis
DEADLINER: Building a New Niche Search Engine.

... For instance, a category can be specified by the user and then used by a meta-search engine to select a domain-specific search engine to send queries, to modify queries and to define a ranking on search results. An early system that has adopted this strategy is Inquirus 2 and its successors [46,47,48]. ...

Improving Category Specific Web Search by Learning Query Modifications.