Aniket Chakrabarti’s research while affiliated with Microsoft and other places

Publications (18)


Spread Sampling and Its Applications on Graphs
  • Chapter

November 2019 · 51 Reads · 1 Citation

Studies in Computational Intelligence

Yu Wang · Bortik Bandyopadhyay · [...] · Srinivasan Parthasarathy

Efficiently finding small samples with high diversity from large graphs has many practical applications, such as community detection and online surveys. This paper proposes a novel scalable node sampling algorithm for large graphs that can achieve better spread or diversity across communities intrinsic to the graph without requiring any costly pre-processing steps. The proposed method leverages a simple iterative sampling technique controlled by two parameters: the infection rate, which controls the dynamics of the procedure, and the removal threshold, which affects the end-of-procedure sample size. We demonstrate that our method achieves very high community diversity with an extremely low sampling budget on both synthetic and real-world graphs, with either balanced or imbalanced communities. Additionally, we leverage the proposed technique for treatment assignment driven by a very low sampling budget (only 2%) in a Network A/B Testing scenario, and demonstrate competitive performance relative to the baseline on both synthetic and real-world graphs.


Precision@1 of semantic matching based models (BoW, LDA, Doc2Vec) on different Stack Exchange sites. The best Precision@1 of semantic matching is less than 30%.
Precision@k (k = 1, 2, 3) of UpVotes-Rank on different Stack Exchange sites. About 70% of best answers are ranked in the top 1 by voting score, about 85% are in the top 2, and about 95% are in the top 3.
ColdRoute Architecture: Users’ past activities are used to train ColdRoute. Given a newly posted question, whether asked by a new asker or an existing asker, ColdRoute can predict the voting score for each answerer in the candidate set, and then select the user with the highest voting score as the best answerer to route this cold-start question to.
Performance of ColdRoute-T, different kinds of regressors (using the same feature set as ColdRoute-T), CQARank and LDA for cold questions asked by existing askers on 8 different Stack Exchange sites. (a) MRR. (b) Accuracy. (c) Precision@1. (d) Precision@3.
Performance of ColdRoute-T, different kinds of regressors (using the same feature set as ColdRoute-T), CQARank and LDA for cold questions asked by new askers on 8 Stack Exchange sites. (a) MRR. (b) Accuracy. (c) Precision@1. (d) Precision@3.
ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites
  • Article
  • Full-text available

September 2018 · 143 Reads · 17 Citations

Data Mining and Knowledge Discovery

Routing questions in Community Question Answer services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we are able to improve upon the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36% for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81% for cold questions posted by new askers, respectively.
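The routing metrics named in the abstract can be illustrated with a minimal sketch. It assumes each question has exactly one best answerer and that Precision@k is counted as a hit when that answerer appears among the top k ranked candidates; this reading matches the figure captions above but is an assumption, not a definition taken from the paper. The function names and the toy user IDs are hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def precision_at_k(ranked: Sequence[Hashable], best: Hashable, k: int) -> float:
    """Return 1.0 if the best answerer is among the top-k candidates, else 0.0.

    Assumption: one relevant answerer per question, so Precision@k is a hit test.
    """
    return 1.0 if best in ranked[:k] else 0.0


def reciprocal_rank(ranked: Sequence[Hashable], best: Hashable) -> float:
    """Return 1/rank of the best answerer (ranks start at 1), or 0.0 if absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == best:
            return 1.0 / position
    return 0.0


def evaluate(
    questions: Sequence[Tuple[Sequence[Hashable], Hashable]], k: int = 1
) -> Tuple[float, float]:
    """Average Precision@k and MRR over (ranked_candidates, best_answerer) pairs."""
    n = len(questions)
    p_at_k = sum(precision_at_k(ranked, best, k) for ranked, best in questions) / n
    mrr = sum(reciprocal_rank(ranked, best) for ranked, best in questions) / n
    return p_at_k, mrr


# Toy data (hypothetical): two questions, each with a ranked candidate list.
toy_questions = [
    (["u1", "u2", "u3"], "u1"),  # best answerer ranked first
    (["u2", "u3", "u1"], "u1"),  # best answerer ranked third
]
p1, mrr = evaluate(toy_questions, k=1)
```

On the toy data, only the first question counts as a hit at k = 1, and the reciprocal ranks are 1 and 1/3.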

ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites

July 2018

Routing questions in Community Question Answer services (CQAs) such as Stack Exchange sites is a well-studied problem. Yet cold-start, a phenomenon observed when a new question is posted, is not well addressed by existing approaches. Additionally, cold questions posted by new askers present significant challenges to state-of-the-art approaches. We propose ColdRoute to address these challenges. ColdRoute is able to handle the task of routing cold questions posted by new or existing askers to matching experts. Specifically, we use Factorization Machines on the one-hot encoding of critical features such as question tags and compare our approach to well-studied techniques such as CQARank and semantic matching (LDA, BoW, and Doc2Vec). Using data from eight Stack Exchange sites, we are able to improve upon the routing metrics (Precision@1, Accuracy, MRR) over state-of-the-art models such as semantic matching by 159.5%, 31.84%, and 40.36% for cold questions posted by existing askers, and by 123.1%, 27.03%, and 34.81% for cold questions posted by new askers, respectively.



Fast Change Point Detection on Dynamic Social Networks

August 2017 · 73 Reads · 60 Citations

A number of real-world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, in which a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real-world networks.


Hierarchical Change Point Detection on Dynamic Networks

June 2017 · 37 Reads · 11 Citations

This paper studies change point detection on networks with community structures. It proposes a framework that can detect both local and global changes in networks efficiently. Importantly, it can clearly distinguish the two types of changes. The framework design is generic and as such several state-of-the-art change point detection algorithms can fit in this design. Experiments on both synthetic and real-world networks show that this framework can accurately detect changes while achieving up to 800X speedup.




Table 2 : Notation Table
Figure 3: Comparison between MLE (red, top) and Simplified
Figure 4: SBM, ground truth changes explained in Table 2. 1 − α = 0.51 and window size = 20. DeltaCon (Fig b-top) and EM-KL (Fig b-bottom) have the smallest variance, but DeltaCon has two false negatives at 4 and 5.
Figure 5: Change point detection on US Senate co-sponsorship network. Change points at the 100th and the 104th Congresses (boxed) correspond to partisan domination shifts. Both EM-KL (green) and LetoChange (cyan) have perfect recall and precision, while DeltaCon (pink) has 3 false positives and 1 false negative. 
Fast Change Point Detection on Dynamic Social Networks

May 2017 · 206 Reads · 21 Citations

A number of real-world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, in which a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real-world networks.


D-STHARk: Evaluating Dynamic Scheduling of Tasks in Hybrid Simulated Architectures

December 2016 · 21 Reads · 1 Citation

Procedia Computer Science

The emergence of applications that must efficiently handle growing amounts of data has stimulated the development of new computing architectures with several Processing Units (PUs), such as CPU cores, graphics processing units (GPUs) and Intel Xeon Phi (MIC). Aiming to better exploit these architectures, recent works focus on proposing novel runtime environments that offer a variety of methods for scheduling tasks dynamically on different PUs. A main limitation of such proposals is the constrained system configurations usually adopted to tune and test them, since setting up more complete and diversified evaluation environments is costly. In this context, we present D-STHARk, a GUI tool for evaluating Dynamic Scheduling of Tasks in Hybrid Simulated ARchitectures. D-STHARk provides a complete simulated execution environment that allows evaluating dynamic scheduling strategies on simulated applications and hybrid architectures. We evaluate our tool by simulating the dynamic scheduling strategies presented in [sbac2014], using the same architecture and application. D-STHARk reached the same conclusions originally reported by the authors. Moreover, we performed an experiment varying the number of coprocessors, which had not previously been verified for lack of real architectures, showing that energy consumption may be reduced while keeping the same performance.


Robust Anomaly Detection for Large-Scale Sensor Data

November 2016 · 40 Reads · 5 Citations

Large-scale sensor networks are ubiquitous nowadays. An important objective of deploying sensors is to detect anomalies in the monitored system or infrastructure, which allows remedial measures to be taken to prevent failures, inefficiencies, and security breaches. Most existing sensor anomaly detection methods are local, i.e., they do not capture the global dependency structure of the sensors, nor do they perform well in the presence of missing or erroneous data. In this paper, we propose an anomaly detection technique for large-scale sensor data that leverages relationships between sensors to improve robustness even when data is missing or erroneous. We develop a probabilistic graphical model-based global outlier detection technique that represents a sensor network as a pairwise Markov Random Field and uses graphical model inference to detect anomalies. We show that our model is more robust than local models, and that it detects anomalies with 90% accuracy even when 50% of sensors are erroneous. We also build a synthetic graphical model generator that preserves statistical properties of a real data set to test our outlier detection technique at scale.


Citations (12)


... As an extension of the current study, we plan to apply our model to other applications such as community detection in dynamic networks (Wang et al. 2018) and exception-tolerant abduction (Zhang, Mathew, and Juba 2017) in attributed networks (Liang et al. 2018). We would also like to address the problem of routing newly posted questions (item cold-start) to newly registered users (user cold-start) in CQAs, in the hope of increasing the expertise of the entire community. ...

Reference:

ATP: Directed Graph Embedding with Asymmetric Transitivity Preservation
Spread Sampling and Its Applications on Graphs
  • Citing Chapter
  • November 2019

Studies in Computational Intelligence

... Expert recommendation based on link analysis [8][9][10][11][12][13][14] constructs a question-answer relationship directed graph based on the historical interaction behavior between users in the community and then performs link analysis on the directed graph and calculates the authority of each user. Neural-network-based expert recommendation [15][16][17][18][19][20] encodes higher-level questions and feature representations of expert texts with the help of word2vec and graphs and then extracts features by convolutional neural networks and recurrent neural networks. ...

ColdRoute: Effective Routing of Cold Questions in Stack Exchange Sites

Data Mining and Knowledge Discovery

... Chakraborti et al. 9 consider the effect of heterogeneous workload distribution on bi-objective optimization of data analytics applications by simulating heterogeneity on homogeneous clusters. The performance is represented by a linear function of problem size and the total energy is predicted using historical data tables. ...

A Pareto Framework for Data Analytics on Heterogeneous Systems: Implications for Green Energy Usage and Performance
  • Citing Conference Paper
  • August 2017

... Many current CPD algorithms provide a way to measure dissimilarity between past and future time intervals [13]. When this dissimilarity reaches a certain threshold, an alarm is raised to signal a change point. ...

Fast Change Point Detection on Dynamic Social Networks
  • Citing Conference Paper
  • August 2017
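The scheme described in the citing excerpt above (measure the dissimilarity between a past and a future interval, and raise an alarm when it reaches a threshold) can be sketched as follows. This is only an illustration on a one-dimensional series: the dissimilarity used here, the absolute difference of window means, is a stand-in chosen for simplicity and is not the measure used in the listed paper, and the function name, window size and threshold are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def detect_change_points(
    series: Sequence[float], window: int, threshold: float
) -> List[int]:
    """Return indices t where past and future windows differ by >= threshold.

    past   = series[t - window : t]
    future = series[t : t + window]
    Dissimilarity (an assumption for illustration): |mean(future) - mean(past)|.
    """
    alarms = []
    # Only consider positions with a full window on both sides.
    for t in range(window, len(series) - window + 1):
        past = series[t - window:t]
        future = series[t:t + window]
        dissimilarity = abs(mean(future) - mean(past))
        if dissimilarity >= threshold:
            alarms.append(t)  # alarm: candidate change point at index t
    return alarms


# Toy series (hypothetical) with a single level shift at index 4.
toy_series = [0, 0, 0, 0, 5, 5, 5, 5]
alarms = detect_change_points(toy_series, window=2, threshold=4)
```

With a window of 2 and a threshold of 4, only the position where the past window is all zeros and the future window is all fives triggers an alarm; a lower threshold would also flag the neighbouring positions, which is the usual trade-off between sensitivity and false alarms.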

... The results illustrate that when experiencing clustering events, there is a transition in the time scale (from slow to fast) and direction (from hierarchical to distributed) of information transfer in the network. Wang et al. [48] expressed the evolution of the temporal network as a Markov network and detected change points through estimating and comparing the joint edge (dyad) distribution. Experiments on the Senate cosponsorship network show that the method is more efficient than the other approaches in the same period while ensuring a good detection effect. ...

Fast Change Point Detection on Dynamic Social Networks

... In literature [11], a compressed binary tree corresponds to the streaming graph data for lossy summarization. In literature [12], hash functions maintain a minimum neighborhood sample subgraph in real-time. GSS [13] first generates a sketch of the streaming graph using hash functions, then uses a novel data structure to store it, achieving lossy summarization supporting various queries. ...

Topological Graph Sketching for Incremental and Scalable Analytics
  • Citing Conference Paper
  • October 2016

... We note that hypothesis testing has also been used for deciding a certain number of hashes for LSH in the context of similarity search [10,52]. The differences between our technique and [10,52] include: (1) ours is based on a random process of sampling dimensions of a transformed vector while [10,52] are on one of sampling hash functions, which entail significantly different hypothesis testings and (2) ours targets the Euclidean distance function while [10,52] target similarity functions such as Jaccard and Cosine similarity measures (it remains non-trivial to adapt the latter to the Euclidean space), and (3) ours guarantees to be no worse than the method of evaluating exact distances (in our case, i.e., FDScanning) because it obtains exact distances when it has sampled all the dimensions while [10,52] have no such guarantee (when they have sampled all the hash functions and still cannot produce a firmed result, they would have to re-evaluate exact similarities from scratch). ...

Sequential Hypothesis Tests for Adaptive Locality Sensitive Hashing
  • Citing Conference Paper
  • May 2015

... Sensors are often used to monitor various environmental and location parameters in many real-world applications. Anomalies in sensor data refer to sensor faults or unforeseen events (such as intrusions) (Rajasegarar et al., 2008; Hayes and Capretz, 2014; Rabatel et al., 2011; Chakrabarti et al., 2016). Sensor data can be binary, discrete, continuous, audio, video, etc. ...

Robust Anomaly Detection for Large-Scale Sensor Data
  • Citing Conference Paper
  • November 2016

... However, it is a computationally expensive operation when working with big data [60]. In fact, the efficiency and effectiveness of big data queries and analysis algorithms are greatly affected by the data partitioning scheme [60][61][62][63][64][65][66]. On Hadoop clusters, data partitioning is basically the responsibility of HDFS [10]. ...

Green- and heterogeneity-aware partitioning for data analytics

... At the network level, prior work has shown the benefits of duplicating flows (or specific packets of a flow) [41-43, 45, 49, 70, 74]. Similarly, other systems have shown the efficacy of duplication for storage (e.g., [40,64,68]) and distributed job execution frameworks [17-19, 65, 76]. ...

Zoolander: Efficiently meeting very strict, low-latency slos