Geographic ranking for a local search engine.
ABSTRACT Traditional ranking schemes of the relevance of a Web page to a user query in a search engine are less appropriate when the search term contains geographic information. Often, geographic entities, such as addresses, city names, and location names, appear only once or twice in a Web page, and are typically not in a heading or larger font. Consequently, an alternative ranking approach to the traditional weighted tf*idf relevance ranking is need. Further, if a Web site contains a geographic entity, it is often the case that its in- and out-neighbours do not refer to the same entity, although they may refer to other geographic entities. We present a local search engine that applies a novel ranking algorithm suitable for ranking Web pages with geographic content. We describe its major components: geographic ranking, focused crawling, geographic extractor, and the related web-sites feature.
Conference Paper: A Crawler for Local Search[Show abstract] [Hide abstract]
ABSTRACT: Vertical search engines enable users to find information related to a certain topic. A local search engine is a vertical search engine whose topic revolves around a certain geographical area (such as a city, state, country, etc...) In this paper we describe our experiences developing a crawler for a local search engine for the city of Bellingham, Washington, USA. We focus on the tasks of crawling and indexing a large amount of highly relevant Web pages, and then demonstrate ways in which our search engine has the capability to outperform an industrial search engine.The Fourth International Conference on Digital Society, ICDS 2010, 10.16 February 2010, St. Maarten, Netherlands Antilles; 01/2010
Conference Paper: Improving local search ranking through external logs.[Show abstract] [Hide abstract]
ABSTRACT: The signals used for ranking in local search are very different from web search: in addition to (textual) relevance, measures of (geographic) distance between the user and the search result, as well as measures of popularity of the result are important for effective ranking. Depending on the query and search result, different ways to quantify these factors exist -- for example, it is possible to use customer ratings to quantify the popularity of restaurants, whereas different measures are more appropriate for other types of businesses. Hence, our approach is to capture the different notions of distance/popularity relevant via a number of external data sources (e.g., logs of customer ratings, driving-direction requests, or site accesses). In this paper we will describe the relevant signal contained in a number of such data sources in detail and present methods to integrate these external data sources into the feature generation for local search ranking. In particular, we propose novel backoff methods to alleviate the impact of skew, noise or incomplete data in these logs in a systematic manner. We evaluate our techniques on both human-judged relevance data as well as click-through data from a commercial local search engine.Proceeding of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2011, Beijing, China, July 25-29, 2011; 01/2011