Shashank Pandit’s research while affiliated with Carnegie Mellon University and other places


Publications (6)


Topic: Social Networks Parallel Crawling for Online Social Networks
  • Conference Paper

May 2007 · 108 Reads · 134 Citations

Shashank Pandit · Samuel Wang · Christos Faloutsos

Given a huge online social network, how do we retrieve information from it through crawling? Better still, how do we improve crawling performance by using parallel crawlers that work independently? In this paper, we present a framework of parallel crawlers for online social networks that uses a centralized queue. To show how this works in practice, we describe our implementation of the crawlers for an online auction website. The crawlers work independently, so the failure of one crawler does not affect the others. The framework ensures that no redundant crawling occurs. Using the crawlers that we built, we visited approximately 11 million auction users, about 66,000 of whom were completely crawled.
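The centralized-queue idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy network, `fetch_neighbors`, and `crawl` are hypothetical names, and threads stand in for what would presumably be separate crawler processes or machines. A shared queue hands out users to crawl, and a lock-protected set of already-enqueued users prevents redundant crawling.

```python
import queue
import threading

# Hypothetical stand-in for the crawled site: user -> users they traded with.
TOY_NETWORK = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}


def fetch_neighbors(user):
    """Placeholder for a real page fetch; here it only reads the toy network."""
    return TOY_NETWORK.get(user, [])


def crawl(seeds, num_workers=3):
    frontier = queue.Queue()       # the centralized queue shared by all crawlers
    seen = set()                   # every user that was ever enqueued
    seen_lock = threading.Lock()   # guards `seen` so no user is enqueued twice
    crawled = []                   # users whose neighbor lists were fully fetched
    crawled_lock = threading.Lock()

    def enqueue(user):
        with seen_lock:
            if user in seen:
                return
            seen.add(user)
        frontier.put(user)

    def worker():
        while True:
            user = frontier.get()
            try:
                if user is None:   # sentinel: no more work, shut down
                    return
                for neighbor in fetch_neighbors(user):
                    enqueue(neighbor)
                with crawled_lock:
                    crawled.append(user)
            except Exception:
                # A failed fetch loses only this user; other workers continue.
                pass
            finally:
                frontier.task_done()

    for seed in seeds:
        enqueue(seed)
    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    frontier.join()                # wait until every enqueued user is processed
    for _ in threads:
        frontier.put(None)         # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return crawled, seen


if __name__ == "__main__":
    done, visited = crawl(["alice"])
    print(sorted(done), len(visited))
```

In this sketch, `seen` loosely corresponds to users that were visited and `crawled` to users that were completely crawled; on a real site the two would differ whenever the crawl stops before the queue is empty.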


Netprobe: A fast and scalable system for fraud detection in online auction networks

May 2007 · 443 Reads · 400 Citations

Given a large network of online auction users and their histories of transactions, how can we spot anomalies and auction fraud? This paper describes the design and implementation of NetProbe, a system that we propose for solving this problem. NetProbe models auction users and transactions as a Markov Random Field tuned to detect the suspicious patterns that fraudsters create, and employs a Belief Propagation mechanism to detect likely fraudsters. Our experiments show that NetProbe is both efficient and effective for fraud detection. We report experiments on synthetic graphs with as many as 7,000 nodes and 30,000 edges, where NetProbe was able to spot fraudulent nodes with over 90% precision and recall, within a matter of seconds. We also report experiments on a real dataset crawled from eBay, with nearly 700,000 transactions between more than 66,000 users, where NetProbe was highly effective at unearthing hidden networks of fraudsters, within a realistic response time of about 6 minutes. For scenarios where the underlying data is dynamic in nature, we propose Incremental NetProbe, which is an approximate, but fast, variant of NetProbe. Our experiments show that Incremental NetProbe runs nearly twice as fast as NetProbe, while retaining over 99% of its accuracy.


Fig. 1. 2LFS in action: (a) given graph (b) after labeling by 2LFS: fraud (red triangles), honest (green circles), "accomplices" (yellow diamonds) (c) after manual rearrangement, to highlight the "bipartite cores". The nodes in the two black rectangles are confirmed fraudsters.
Fig. 3. The Propagation Matrix for an edge. Entry (i, j) gives the conditional probability that the destination node is at state j, when the source node is at state i.
Fig. 5. Pseudo code for network-LFS 
Fig. 6.
Fig. 7. Min Detection Size vs. noise (lower is better): 2LFS is robust to minor deviations in graph structure, and even to wrong priors


Detecting Fraudulent Personalities in Networks of Online Auctioneers
  • Conference Paper
  • Full-text available

January 2006 · 427 Reads · 154 Citations

Lecture Notes in Computer Science

Online auctions have gained immense popularity by creating an accessible environment for exchanging goods at reasonable prices. Not surprisingly, malevolent auction users try to abuse them by cheating others. In this paper we propose a novel method, 2-Level Fraud Spotting (2LFS), to model the techniques that fraudsters typically use to carry out fraudulent activities, and to detect fraudsters preemptively. Our key contributions are: (a) we mine user-level features (e.g., number of transactions, average price of goods exchanged, etc.) to get an initial belief for spotting fraudsters, (b) we introduce network-level features which capture the interactions between different users, and (c) we show how to combine both these features using a Belief Propagation algorithm over a Markov Random Field, and use it to detect suspicious patterns (e.g., unnaturally close-knit groups of people that trade mainly among themselves). Our algorithm scales linearly with the number of graph edges. Moreover, we illustrate the effectiveness of our algorithm on a real dataset collected from a large online auction site.
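The combination of per-user initial beliefs with network structure via Belief Propagation can be sketched roughly as below. This is a generic sum-product loopy belief propagation over three states (fraud, accomplice, honest), not the paper's code. The numeric entries of `PROPAGATION`, the value of `EPS`, and the example priors are illustrative assumptions; only the reading of entry (i, j) as the probability that the destination node is in state j given that the source node is in state i comes from the figure caption above.

```python
STATES = ("fraud", "accomplice", "honest")

# Illustrative propagation matrix (values are assumptions, not the paper's).
# Row = state of the source node, column = state of the destination node.
EPS = 0.05
PROPAGATION = [
    [EPS, 1 - 2 * EPS, EPS],              # source is fraud
    [0.5, 2 * EPS, 0.5 - 2 * EPS],        # source is accomplice
    [EPS, (1 - EPS) / 2, (1 - EPS) / 2],  # source is honest
]


def normalize(vector):
    total = sum(vector)
    return [x / total for x in vector]


def belief_propagation(edges, priors, iterations=20):
    """Return {node: [P(fraud), P(accomplice), P(honest)]} after loopy BP.

    edges  -- iterable of undirected (a, b) pairs (transactions between users)
    priors -- {node: initial belief over STATES}, e.g. from user-level features
    """
    n_states = len(STATES)
    neighbors = {node: set() for node in priors}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # messages[(i, j)] is what node i currently tells node j about j's state.
    messages = {
        (i, j): [1.0 / n_states] * n_states
        for i in neighbors
        for j in neighbors[i]
    }

    for _ in range(iterations):
        updated = {}
        for (i, j) in messages:
            # Prior of i times all incoming messages except the one from j.
            local = list(priors[i])
            for k in neighbors[i]:
                if k != j:
                    local = [p * m for p, m in zip(local, messages[(k, i)])]
            outgoing = [
                sum(local[s] * PROPAGATION[s][t] for s in range(n_states))
                for t in range(n_states)
            ]
            updated[(i, j)] = normalize(outgoing)
        messages = updated

    beliefs = {}
    for i in neighbors:
        belief = list(priors[i])
        for k in neighbors[i]:
            belief = [p * m for p, m in zip(belief, messages[(k, i)])]
        beliefs[i] = normalize(belief)
    return beliefs


if __name__ == "__main__":
    priors = {
        "seller": [0.9, 0.05, 0.05],    # user-level features look suspicious
        "partner": [1 / 3, 1 / 3, 1 / 3],  # nothing known about this user
    }
    result = belief_propagation([("seller", "partner")], priors)
    for node, belief in result.items():
        print(node, dict(zip(STATES, belief)))
```

Each iteration touches every edge a constant number of times per neighbor message, which is in the spirit of the linear-in-edges scaling the abstract mentions, although this sketch makes no attempt to reproduce the paper's measurements.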


Figure 4: Bidirectional Search Example  
Bidirectional Expansion For Keyword Search on Graph Databases.

January 2005 · 2,323 Reads · 483 Citations

Varun Kacholia · Shashank Pandit · [...]

Relational, XML and HTML data can be represented as graphs with entities as nodes and relationships as edges. Text is associated with nodes and possibly edges. Keyword search on such graphs has received much attention lately. A central problem in this scenario is to efficiently extract from the data graph a small number of the "best" answer trees. We propose Bidirectional Search, which improves on Backward Expanding search by allowing forward search from potential roots towards leaves. To exploit this flexibility, we devise a novel search frontier prioritization technique based on spreading activation. We present a performance study on real data, establishing that Bidirectional Search significantly outperforms Backward Expanding search.
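As a point of reference for the baseline the abstract compares against, a heavily simplified backward-expanding search can be sketched as follows. The assumptions are loud ones: edges are unweighted, one breadth-first search runs per keyword (rather than per keyword node), answers are reported only as root nodes with a score equal to the summed distances to the keywords, and the forward expansion and spreading-activation prioritization that the paper adds are not modeled. All function names and the toy graph are hypothetical.

```python
from collections import deque


def backward_distances(reverse_adj, sources):
    """Multi-source BFS along reversed edges.

    Returns {node: hops from node to the nearest source along forward edges}.
    """
    dist = {s: 0 for s in sources}
    pending = deque(sources)
    while pending:
        node = pending.popleft()
        for pred in reverse_adj.get(node, ()):
            if pred not in dist:
                dist[pred] = dist[node] + 1
                pending.append(pred)
    return dist


def backward_expanding_search(edges, node_text, keywords, k=3):
    """Return up to k (root, score) pairs; a lower score is a tighter answer."""
    if not keywords:
        return []
    reverse_adj = {}
    for src, dst in edges:
        reverse_adj.setdefault(dst, []).append(src)

    per_keyword = []
    for keyword in keywords:
        sources = [
            node
            for node, text in node_text.items()
            if keyword.lower() in text.lower().split()
        ]
        per_keyword.append(backward_distances(reverse_adj, sources))

    # A candidate root must be able to reach a match for every keyword.
    roots = set(per_keyword[0])
    for dist in per_keyword[1:]:
        roots &= set(dist)

    scored = [(root, sum(dist[root] for dist in per_keyword)) for root in roots]
    scored.sort(key=lambda pair: (pair[1], str(pair[0])))
    return scored[:k]


if __name__ == "__main__":
    # Toy bibliographic graph: paper -> author edges (hypothetical data).
    node_text = {
        "p1": "keyword search graphs",
        "p2": "fraud detection auctions",
        "a1": "pandit",
        "a2": "faloutsos",
        "a3": "kacholia",
    }
    edges = [("p1", "a1"), ("p1", "a3"), ("p2", "a1"), ("p2", "a2")]
    print(backward_expanding_search(edges, node_text, ["pandit", "faloutsos"]))
```

Because every search here starts at the keyword nodes and walks backward, a keyword that matches very many nodes makes the frontier large; allowing forward search from potential roots, as the abstract describes, is one way to avoid expanding all of it.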


FleXPath: Flexible structure and full-text querying for XML

June 2004 · 138 Reads · 204 Citations

Querying XML data is a well-explored topic with powerful database-style query languages such as XPath and XQuery set to become W3C standards. An equally compelling paradigm for querying XML documents is full-text search on textual content. In this paper, we study fundamental challenges that arise when we try to integrate these two querying paradigms. While keyword search is based on approximate matching, XPath has exact match semantics. We address this mismatch by considering queries on structure as a "template", and looking for answers that best match this template and the full-text search. To achieve this, we provide an elegant definition of relaxation on structure and define primitive operators to span the space of relaxations. Query answering is now based on ranking potential answers on structural and full-text search conditions. We set out certain desirable principles for ranking schemes and propose natural ranking schemes that adhere to these principles. We develop efficient algorithms for answering top-K queries and discuss results from a comprehensive set of experiments that demonstrate the utility and scalability of the proposed framework and algorithms.


Biometric authentication using random distributions (BioART)

January 2003 · 143 Reads · 11 Citations

In this paper, we present a novel approach to authenticating users accessing the resources of any system using Random Distribution Functions. The proposed method involves collecting data that represents the biometric pattern of a user and converting this biometric data into a form that can be mathematically manipulated. This is followed by application of statistical methods to find an expression that defines the user's biometric traits, against which (s)he can then be authenticated in future. The algorithm aims to develop a system that has low False Acceptance and False Rejection Rates. We illustrate this using Keystroke Dynamics as one of the Biometric Authentication systems.
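To make the two error rates concrete, the sketch below shows one simple way to measure them for keystroke timings. It is not the BioART algorithm: the per-feature mean and standard deviation profile, the mean absolute z-score, the threshold, and all timing values are assumptions chosen purely for illustration. The False Acceptance Rate is the fraction of impostor attempts that are accepted, and the False Rejection Rate is the fraction of genuine attempts that are rejected.

```python
from statistics import mean, stdev


def fit_profile(enrollment):
    """Per-feature (mean, standard deviation) from a user's enrollment samples."""
    return [(mean(column), stdev(column)) for column in zip(*enrollment)]


def score(profile, sample):
    """Mean absolute z-score of a sample; lower means closer to the profile."""
    deviations = [
        abs(value - mu) / max(sd, 1e-9)  # guard against zero spread
        for (mu, sd), value in zip(profile, sample)
    ]
    return mean(deviations)


def error_rates(profile, genuine, impostor, threshold):
    """Return (FAR, FRR) when samples scoring <= threshold are accepted."""
    false_accepts = sum(score(profile, s) <= threshold for s in impostor)
    false_rejects = sum(score(profile, s) > threshold for s in genuine)
    return false_accepts / len(impostor), false_rejects / len(genuine)


if __name__ == "__main__":
    # Hypothetical key-hold and key-gap timings in milliseconds.
    enrollment = [[100, 150, 120], [110, 140, 130], [105, 145, 125]]
    genuine = [[104, 147, 126], [120, 160, 140]]
    impostor = [[200, 90, 180], [107, 143, 127]]

    profile = fit_profile(enrollment)
    far, frr = error_rates(profile, genuine, impostor, threshold=2.0)
    print(f"FAR={far:.2f} FRR={frr:.2f}")
```

Raising the threshold in this sketch lowers the False Rejection Rate at the cost of a higher False Acceptance Rate, which is the trade-off any such system has to balance.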

Citations (6)


... One of the advantages of keystroke dynamics is that it is inexpensive because it can be used without any additional hardware. In addition, the user acceptance of a keystroke dynamics biometric system is considered very high [16,10]. ...

Reference:

Advanced Authentication Scheme Using a Predefined Keystroke Structure
Biometric authentication using random distributions (BioART)
  • Citing Article
  • January 2003

... Keyword research is the first and most crucial step in any search engine optimization strategic plan (Yang et al., 2021;Garg, 2021). The most popular approach to solving the keyword search problem is Graph-Based Keyword Search (GBKS), which identifies a set of closely linked nodes in the graph that may match a specific keyword based on the query (Bhalotia et al., 2002;Kacholia et al., 2005;He et al., 2007), BANKS-I (Bhalotia et al., 2002) considers the shortest route from a tree's root to a node that contains keywords, BANKS-II (Kacholia et al., 2005) suggests using a forward search to approximate a solution, and BLINKS (He et al., 2007) tries to identify the set of all different sub trees with the best scores to improve the BANKS-II approach. These retrieval techniques are centered on nodes while using keyword search engines and semantic relationships (Wang et al., 2008) can link keyword inquiries and formal questions. ...

Bidirectional Expansion For Keyword Search on Graph Databases.

... We already have a lot and different efficient type of web crawlers that are already been described in the citations in the last section. All these crawlers guarantee or function to extract data store them, take queries execute the faster way to search/crawl the web through different methods like focused crawlers incremental web crawlers etc. ...

Topic: Social Networks Parallel Crawling for Online Social Networks
  • Citing Conference Paper
  • May 2007

... This principle serves as a powerful predictive assumption, suggesting that linked nodes are likely to have similar latent representations [19]. However, such assumptions of homophily do not always hold true in many real world datasets [20,21]. Further, while new GNNs that work better in non-homophilous settings [22,23,24,25] are either small in size [26,27] or have been validate with datasets with small number of classes [28,29]. ...

Detecting Fraudulent Personalities in Networks of Online Auctioneers

Lecture Notes in Computer Science

... Future research should focus on developing methods or tools to elucidate GNNs' decision processes, aiming for greater transparency and understandability. Techniques could include advanced visualization methods to demonstrate how networks detect and react to fraudulent activities [149,150], or algorithms to clarify the significance of specific nodes and edges [151,152]. Enhancing the Scalability of GNNs. ...

Netprobe: A fast and scalable system for fraud detection in online auction networks
  • Citing Conference Paper
  • May 2007

... We propose a sophisticated framework of query relaxations for supporting approximate queries over XML data in this paper. Our approach adequately takes structures and the surmise of users' concerns into account, and it, therefore, has the ability to elegantly combine structures with contents to answer approximate queries. [4] The answers underlying our proposed framework are not compelled to strictly satisfy the given query formulation; instead, they can be founded on properties inferable from the original query. We, then, develop a novel top-k retrieval approach that can smartly generate the most promising answers in an order correlated with the ranking measures. In particular, rather than shifting the burden of ...

FleXPath: Flexible structure and full-text querying for XML