Pranam Kolari’s research while affiliated with Yahoo and other places


Publications (36)


Ranking of search results based on microblog data
  • Patent
  • Full-text available

June 2014 · 17 Reads

Anlei Dong · Pranam Kolari · Ruiqiang Zhang · [...] · Zhaohui Zheng

An information retrieval system is described herein that monitors a microblog data stream that includes microblog posts to discover and index fresh resources for searching by a search engine. The information retrieval system also uses data from the microblog data stream as well as data obtained from a microblog subscription system to compute novel and effective features for ranking fresh resources which would otherwise have impoverished representations. An embodiment of the present invention advantageously enables a search engine to produce a fresher set of resources and to rank such resources for both relevancy and freshness in a more accurate manner.

Related news articles

April 2014 · 17 Reads

Methods, systems, and computer programs are presented for providing internet content, such as related news articles. One method includes an operation for defining a plurality of candidates based on a seed. For each candidate, scores are calculated for relevance, novelty, connection clarity, and transition smoothness. The score for connection clarity is based on a relevance score of the intersection between the words in the seed and the words in each of the candidates. Further, the score for transition smoothness measures the interest in reading each candidate when transitioning from the seed to the candidate. For each candidate, a relatedness score is calculated based on the calculated scores for relevance, novelty, connection clarity, and transition smoothness. In addition, at least one of the candidates is selected based on their relatedness scores for presentation to the user.
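The scoring and selection steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the abstract does not say how the four scores are combined, so the weighted sum, the `WEIGHTS` values, and the function names `shared_words`, `relatedness`, and `select_related` are all assumptions. `shared_words` only shows the seed/candidate word intersection on which the connection clarity score is said to be based; it does not compute that score.

```python
from typing import Dict, List, Set

# Hypothetical weights: the abstract does not state how the scores are combined.
WEIGHTS: Dict[str, float] = {
    "relevance": 0.4,
    "novelty": 0.2,
    "connection_clarity": 0.2,
    "transition_smoothness": 0.2,
}


def shared_words(seed: str, candidate: str) -> Set[str]:
    """Words occurring in both the seed and the candidate (lowercased)."""
    return set(seed.lower().split()) & set(candidate.lower().split())


def relatedness(scores: Dict[str, float],
                weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the four per-candidate scores (assumed combination)."""
    return sum(weights[name] * scores[name] for name in weights)


def select_related(candidates: Dict[str, Dict[str, float]],
                   top_n: int = 1) -> List[str]:
    """Return the ids of the top_n candidates ranked by relatedness score."""
    ranked = sorted(candidates,
                    key=lambda cid: relatedness(candidates[cid]),
                    reverse=True)
    return ranked[:top_n]
```

In a learned setting the weights would come from training data rather than being fixed by hand; the fixed values here only keep the sketch self-contained.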


Improving Recency Ranking Using Twitter Data

February 2013 · 64 Reads · 26 Citations

ACM Transactions on Intelligent Systems and Technology

In Web search and vertical search, recency ranking refers to retrieving and ranking documents by both relevance and freshness. Because impoverished in-link and click information is the biggest challenge for recency ranking, we advocate the use of Twitter data to address this challenge in this article. We propose a method that uses Twitter TinyURLs to detect fresh and high-quality documents, and leverages Twitter data to generate novel and effective features for ranking. The empirical experiments demonstrate that the proposed approach effectively improves a commercial search engine for both Web search ranking and tweet vertical ranking.


Figure 1: Venn diagram of content overlap between two documents 
Figure 2: The distribution of relatedness judgments given by one editor, when another one’s judgment is “unrelated” (top left), “somewhat related” (top right), “very related” (bottom left), and “redundant” (bottom right), respectively. 
Figure 3: Comparison of document (doc) and passage (psg) retrieval. In a label "X.Y", 'X' stands for indexing methods for seed documents.
Figure 4: Performance comparison of machine-learned recommenders using different feature sets.
Figure 5: Sensitivity to the size of training data.
Learning to model relatedness for news recommendation

March 2011 · 247 Reads · 94 Citations

With the explosive growth of online news readership, recommending interesting news articles to users has become extremely important. While existing Web services such as Yahoo! and Digg attract users' initial clicks by leveraging various kinds of signals, how to engage such users algorithmically after their initial visit is largely under-explored. In this paper, we study the problem of post-click news recommendation. Given that a user has perused a current news article, our idea is to automatically identify "related" news articles which the user would like to read afterwards. Specifically, we propose to characterize relatedness between news articles across four aspects: relevance, novelty, connection clarity, and transition smoothness. Motivated by this understanding, we define a set of features to capture each of these aspects and put forward a learning approach to model relatedness. In order to quantitatively evaluate our proposed measures and learn a unified relatedness function, we construct a large test collection based on a four-month commercial news corpus with editorial judgments. The experimental results show that the proposed heuristics can indeed capture relatedness, and that the learned unified relatedness function works quite effectively.


Temporal query log profiling to improve web search ranking

October 2010 · 82 Reads · 5 Citations

Temporal information can be leveraged and incorporated to improve web search ranking. In this work, we propose a method to improve the ranking of search results by identifying the fundamental properties of the temporal behavior of low-quality hosts and spam-prone queries in search logs and modeling those properties as quantifiable features. In particular, we introduce two concepts: host churn, a measure of changes in host visibility for user queries, and query volatility, a measure of the semantic instability of query results. We also propose methods for constructing temporal profiles from search query logs, which can be used to estimate a set of features based on these concepts. The utility of the proposed concepts is demonstrated experimentally on two language-independent search tasks: the regression-based ranking of search results and a novel classification problem, introduced in this work, of detecting spam-prone queries.


Figure 1: A typical tweet message accompanying a tiny URL. This specific tiny URL is tweeted by five unique users with their respective tweet messages. Such tweet messages are indicative of the content of the tiny URL. The users' photos and names are mosaicked to protect privacy.
Table 8: Twitter feature importance list. The Twitter feature definitions can be found in Table 1. 
Time is of the essence: Improving recency ranking using Twitter data

April 2010 · 2,529 Reads · 182 Citations

Realtime web search refers to the retrieval of very fresh content which is in high demand. An effective portal web search engine must support a variety of search needs, including realtime web search. However, supporting realtime web search introduces two challenges not encountered in non-realtime web search: quickly crawling relevant content and ranking documents with impoverished link and click information. In this paper, we advocate the use of realtime micro-blogging data for addressing both of these problems. We propose a method to use the micro-blogging data stream to detect fresh URLs. We also use micro-blogging data to compute novel and effective features for ranking fresh URLs. We demonstrate that these methods improve the effectiveness of the portal web search engine for realtime web search.
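As a rough illustration of detecting fresh URLs from a micro-blogging stream, the sketch below collects URLs that appear in posts, skips those already in the index, and keeps those shared by enough distinct users. The abstract gives no implementation details, so the function name `detect_fresh_urls`, the `min_users` threshold, the `(user, text)` post format, and the simple URL pattern are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Simplified URL pattern for illustration; real streams need sturdier parsing.
URL_PATTERN = re.compile(r"https?://\S+")


def detect_fresh_urls(posts: Iterable[Tuple[str, str]],
                      indexed: Set[str],
                      min_users: int = 2) -> List[str]:
    """Return URLs absent from `indexed` that at least `min_users` distinct
    users posted, ordered by distinct-user count (descending), then by URL.

    `posts` is an iterable of (user, text) pairs.
    """
    users_by_url: Dict[str, Set[str]] = defaultdict(set)
    for user, text in posts:
        for url in URL_PATTERN.findall(text):
            # Strip trailing punctuation that is probably not part of the URL.
            users_by_url[url.rstrip(".,)")].add(user)
    fresh = [url for url, users in users_by_url.items()
             if url not in indexed and len(users) >= min_users]
    return sorted(fresh, key=lambda url: (-len(users_by_url[url]), url))
```

The distinct-user count computed here is one plausible example of a micro-blogging signal that could also feed a ranking feature; whether the paper uses it in this form is not stated in the abstract.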


Learning Recurrent Event Queries for Web Search

January 2010 · 30 Reads · 27 Citations

Recurrent event queries (REQ) constitute a special class of search queries occurring at regular, predictable time intervals. The freshness of documents ranked for such queries is generally of critical importance. REQs form a significant volume, as much as 6% of the query traffic received by search engines. In this work, we develop an improved REQ classifier that could provide significant improvements in addressing this problem. We analyze REQs, develop novel features from multiple sources, and evaluate them using machine learning techniques. From historical query logs, we develop features utilizing query frequency, click information, and user intent dynamics within a search session. We also develop temporal features by time series analysis of query frequency. Other generated features include word matching with recurrent event seed words and the time sensitivity of the search result set. We use Naive Bayes, SVM, and decision-tree-based logistic regression models to train the REQ classifier. The results on test data show that our models significantly outperform the baseline approach. Experiments on a commercial Web search engine also show significant gains in overall relevance, and thus overall user experience.
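One simple way to turn a query-frequency time series into a temporal feature is the sample autocorrelation at a candidate period. The abstract does not say which time series analysis the authors use, so this is only an illustrative sketch under that assumption; the function name `autocorrelation` and the daily-count input are hypothetical.

```python
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of `series` at `lag`.

    Returns 0.0 for a constant series, where the value is undefined.
    """
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum((series[i] - mean) * (series[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance
```

A query whose daily frequency spikes once a week would score high at lag 7 and low at unrelated lags, which is the kind of regularity a recurrent event query is expected to show.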


Table 1: Experiment 1a: Retraining to maintain mutual agreement using ensemble labels
Table 3: Experiment 2a: Impact on ensemble accuracy, retraining using ensemble labels
Ensembles in adversarial classification for spam

November 2009 · 109 Reads · 32 Citations

The standard method for combating spam, either in email or on the web, is to train a classifier on manually labeled instances. As the spammers change their tactics, the performance of such classifiers tends to decrease over time. Gathering and labeling more data to periodically retrain the classifier is expensive. We present a method based on an ensemble of classifiers that can detect when its performance might be degrading and retrain itself, all without manual intervention. Experiments with a real-world dataset from the blog domain show that our methods can significantly reduce the number of times classifiers are retrained when compared to a fixed retraining schedule, and they maintain classification accuracy even in the absence of manually labeled examples.
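The table captions for this paper mention "mutual agreement" and "ensemble labels", which suggests a loop along the following lines: measure how often the ensemble members agree on unlabeled data, and when agreement drops, relabel the data by ensemble vote for retraining. The sketch below is one possible reading, not the authors' algorithm; the pairwise agreement measure, the majority vote, the `threshold` value, and all function names are assumptions. It expects at least two classifiers and a non-empty batch.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, List, Sequence

# A classifier maps an instance to a label, e.g. 1 = spam, 0 = not spam.
Classifier = Callable[[str], int]


def mutual_agreement(classifiers: Sequence[Classifier],
                     batch: Sequence[str]) -> float:
    """Mean, over all classifier pairs, of the fraction of instances on
    which the two classifiers predict the same label."""
    predictions = [[clf(x) for x in batch] for clf in classifiers]
    pairs = list(combinations(predictions, 2))
    total = sum(sum(a == b for a, b in zip(p, q)) / len(batch)
                for p, q in pairs)
    return total / len(pairs)


def ensemble_labels(classifiers: Sequence[Classifier],
                    batch: Sequence[str]) -> List[int]:
    """Majority-vote label for each instance (assumed labeling rule)."""
    return [Counter(clf(x) for clf in classifiers).most_common(1)[0][0]
            for x in batch]


def needs_retraining(classifiers: Sequence[Classifier],
                     batch: Sequence[str],
                     threshold: float = 0.8) -> bool:
    """Signal retraining when agreement falls below a hypothetical threshold."""
    return mutual_agreement(classifiers, batch) < threshold
```

When `needs_retraining` returns true, the output of `ensemble_labels` could stand in for manual labels in the next training round, which matches the abstract's claim of retraining without manual intervention.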


Figure 2. The performance of local models, as measured by the standard, area under the curve metric, varies for different feature types and sizes.
Figure 3. Our experiments show that using polar links for classification yields better results than plain link structure. 
Web (2.0) Mining: Analyzing Social Media

December 2008 · 454 Reads · 12 Citations

Social media systems such as blogs, photo and link sharing sites, wikis and on-line forums are estimated to produce up to one third of new Web content. One thing that sets these "Web 2.0" sites apart from traditional Web pages and resources is that they are intertwined with other forms of networked data. Their standard hyperlinks are enriched by social networks, comments, trackbacks, advertisements, tags, RDF data and metadata. We describe recent work on building systems that analyse these emerging social media systems to recognize spam blogs, find opinions on topics, identify communities of interest, derive trust relationships, and detect influential bloggers.


Figure 1: Modeling influence and information flow on the Blogosphere and other social media systems requires attention to many factors, including link structure, sentiment analysis, readership data, conversational structure, topic classification, and temporal analysis (figure from (Adamic & Glance 2005)). 
Figure 3: The tag cloud generated from the top 200 folders before and after merging related folders. The size of the word is scaled to indicate how many users use the folder name. 
Figure 7: These four graphs represent our atomic propagation patterns. The solid and dotted arrows represent known and inferred trust scores, respectively. The first, for example, indicates that if A and C trust B, then A and C are likely to trust each other. 
Figure 8: Our experiments show that using polar links for classification yields better results than plain link structure. 
Figure 9: The graph representation for the Blogosphere includes both a blog network and post network. 
The Information Ecology of Social Media and Online Communities

September 2008 · 525 Reads · 57 Citations

AI Magazine

Social media systems such as weblogs, photo- and link-sharing sites, wikis, and online forums are currently thought to produce up to one third of new web content. One thing that sets these "web 2.0" sites apart from traditional web pages and resources is that they are intertwined with other forms of networked data. Their standard hyperlinks are enriched by social networks, comments, trackbacks, advertisements, tags, RDF data, and metadata. We describe recent work on building systems that use models of the blogosphere to recognize spam blogs, find opinions on topics, identify communities of interest, derive trust relationships, and detect influential bloggers.


Citations (32)


... Tanaka et al. [18] predicted word trends on Twitter. Chang et al. [19] claimed that Twitter data can be used to improve both web and tweet rankings. Bhattacharya et al. [20] used a social annotation-based methodology to first infer the topics of popular Twitter users, and then transitively infer the interests of the users who follow them. ...

Reference:

Buzz Tweet Classification Based on Text and Image Features of Tweets Using Multi-Task Learning
Improving Recency Ranking Using Twitter Data
  • Citing Article
  • February 2013

ACM Transactions on Intelligent Systems and Technology

... The rise of blogs among online social media has been discussed from a sociological point of view in numerous papers, e.g., [16] give some insights about bloggers demographics and cultural behaviors; in [10], the author describes their influence on society. Given the size and richness of the blog datasets, automatic classification and text-mining tools have been widely used to study the dynamics of trends and opinions in the blogosphere [12,2,15,13,21]. For example, some studies concentrate on the political blogosphere to understand the ties between political parties, in particular the way information spreads from a group to another [1,9]. ...

Web (2.0) Mining: Analyzing Social Media

... To assess the effectiveness of our algorithm, we tested it on two datasets. The first dataset is constructed in the same way as described in (Kale et al. 2007), where we ended up with a graph of 404 connected blogs. We will refer to this as the Kale dataset. ...

Modeling Trust and Influence in the Blogosphere Using Link Polarity

... Kolari et al. in their work looked at blogs related to certain enterprise as a source of evidence of that expertise for potential employees. The authors investigated whether depending on blogs has similar effect as depending on emails but with less privacy concerns and they even have the added value of allowing implicit voting via comments by the community [53]. Balog et al. have presented strategies for finding experts relying on document repositories in the enterprise [54]. ...

Expert search using internal corporate blogs

... The person's trust in a transaction is determined by the trust in the counter party and the trust in the transaction media based on the assumption that party and media trust supplement each other. If there is not sufficient party trust, then the media trust and its control protocols should be brought in to supplement the party trust [38]. Trust in the counter party can be defined as "The subjective probability by which an individual A expects that another individual B performs a given action on which its welfare depends" [45, p. 56]. ...

Modeling and evaluating trust network inference

... The recent popularity of OSN provides a rich and natural tool to support the development of human communities; this is already becoming a fertile field of academic research (Beer, 2008;Blanchard, 2004;Boyd & Ellison, 2007;Fogel & Nehmad, 2009;Gross & Acquisti, 2005;Java, Kolari, Finin, Joshi & Oates, 2007;Stutzman, 2006 (Wellman, 2001, p. 228). ...

Feeds that matter: A study of bloglines subscriptions

... The first and most pressing is that it typically relies on large training data, with 500 labels per category as a standard recommendation, and 100 as a minimum (Hopkins et al., 2007). Traditional models also run the risk of misclassification on phrases that would be intuitive for humans due to their lack of semantic knowledge (Grimmer & Stewart, 2013). ...

Extracting systematic social science meaning from text

... However, content visibility can moderate this relation for bridging structures. In general, being positioned in bridging structures in the interaction layer can lead to increased participation in idea generation (van Osch and Bulgurcu 2020) and structural autonomy (Kolari et al. 2007;Berger et al. 2014b;Recker and Lekse 2015). This autonomy can be applied to leverage one's interaction ties to gain better access to non-redundant knowledge via associated flow ties (Jackson et al. 2007). ...

On the structure, properties and utility of internal corporate blogs