Tanmay Basu
Indian Institute of Science Education and Research Bhopal | IISER · Department of Data Science and Engineering
PhD in Computer Science
About
26 Publications
8,930 Reads
407 Citations
Introduction
I have been working on information extraction from electronic health records (EHRs), biomedical publications and social media to address problems such as identifying the risk of depression from social media posts. My broad objective is to develop novel natural language processing, text mining, machine learning and sentiment analysis techniques for knowledge discovery from social media, articles, EHRs and other types of text data.
Additional affiliations
June 2021 - present
January 2018 - July 2019
Ramakrishna Mission Vivekananda Educational and Research Institute
Position
- Assistant Professor
Description
- I taught Introduction to Machine Learning and Advanced Machine Learning in the even (January-May) and odd (July-December) semesters, respectively, in the MSc Computer Science and MSc Data Science programs.
Education
August 2008 - July 2014
August 2005 - May 2008
August 2002 - May 2005
Publications
This paper develops a novel regression framework to estimate international tourist arrivals in 37 Organization for Economic Co-operation and Development (OECD) countries by combining significant socio-economic and environmental features with a natural language processing (NLP) based social media index. The index is developed by fine-tun...
The Forum for Information Retrieval Evaluation (FIRE) started a shared task this year on classifying comments on code segments. This is a binary text classification task where the objective is to identify whether the comments given for certain code segments are relevant or not. The BioNLP-IISERB group at the Indian Institute of Science Education an...
The visibility of an area can affect all forms of transportation, so it is important to accurately estimate visibility for the upcoming days from various meteorological parameters in order to take precautions. Several machine learning techniques have already been applied to different kinds of data sets to estimate the visi...
The task of text clustering is to partition a set of text documents into meaningful groups such that the documents in a particular cluster are more similar to each other than to the documents of other clusters according to a similarity or dissimilarity measure. The choice of similarity measure is therefore crucial for producing good-quality c...
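To make the role of a similarity measure concrete, here is a minimal sketch of a common baseline: cosine similarity between tf-idf weighted term vectors. It is not the measure proposed in the paper (that part of the abstract is truncated); the function names `tf_idf_vectors` and `cosine` are illustrative, and documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build sparse tf-idf vectors (term -> weight dicts) for tokenized documents.

    Illustrative baseline only: raw term frequency times log(N / df).
    """
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [["cat", "sat", "mat"], ["cat", "sat", "hat"], ["stock", "market", "crash"]]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Documents that share weighted terms score above zero, while documents with no terms in common score exactly 0, which is the property a clustering algorithm relies on to group them.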
Automated community detection is an important problem in the study of complex networks. The idea of community detection is closely related to the concept of data clustering in pattern recognition. Data clustering refers to the task of grouping similar objects and segregating dissimilar objects. The community detection problem can be thought of as f...
The objective of systematic reviews is to address a research question by summarizing relevant studies following a detailed, comprehensive, and transparent plan and search protocol to reduce bias. Systematic reviews are very useful in the biomedical and healthcare domain; however, the data extraction phase of the systematic review process necessitat...
The k-nearest-neighbor (kNN) decision rule is a simple and robust classifier for text categorization. The performance of the kNN decision rule depends heavily on the value of the neighborhood parameter k. The method categorizes a test document even if the two competing categories differ by only one member among the k neighbors. Hence, the choice of k i...
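The plain kNN majority vote discussed above can be sketched as follows. This is the baseline rule, not the paper's proposed remedy. The function `knn_classify` is hypothetical: it assumes the similarities between the test document and each training document are already computed, and it returns the winning category together with its vote margin, which makes visible that the rule commits to a label even when the margin is just one.

```python
from collections import Counter


def knn_classify(similarities, labels, k):
    """Plain kNN majority vote (illustrative baseline).

    similarities[i]: similarity of the test document to training document i
    labels[i]: category of training document i
    Returns (predicted label, margin over the runner-up category).
    """
    # indices of the k training documents most similar to the test document
    ranked = sorted(range(len(labels)), key=lambda i: similarities[i], reverse=True)[:k]
    votes = Counter(labels[i] for i in ranked).most_common()
    top_label, top_count = votes[0]
    runner_up = votes[1][1] if len(votes) > 1 else 0
    return top_label, top_count - runner_up


sims = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = ["sports", "politics", "sports", "politics", "sports"]
print(knn_classify(sims, labels, 5))
```

With all five neighbours voting 3 to 2, the call returns `("sports", 1)`: the document is labelled even though one differently placed neighbour would have produced a tie.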
Simon Goldsworthy presents collaborative work on behalf of all authors:
'An Effective Machine Learning Framework for Data Elements Extraction from the Literature of Anxiety Outcome Measures to Build Systematic Review'
Developing a systematic review is a well-established method of collecting evidence from publications; it follows a predefined and explicit protocol to promote rigour, transparency and repeatability. The process is manual, time-consuming and requires expertise. The aim of this work is to build an effective framework...
Modularity is a widely used goodness metric that measures the strength of the community structure present in a network. However, it may perform poorly at identifying densely connected communities or clusters of a network. It also often fails to identify communities or clusters that contain very few nodes. Furthermore, mo...
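For reference, the standard Newman-Girvan modularity of a partition can be written as Q = Σ_c [L_c / m − (d_c / (2m))²], where m is the number of edges, L_c is the number of edges with both ends in community c, and d_c is the total degree of the nodes in c. The sketch below computes it for an undirected, unweighted graph given as an edge list; it illustrates the metric the abstract critiques, not the alternative proposed in the paper, and the function name `modularity` is illustrative.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (undirected, unweighted graph).

    edges: list of (u, v) pairs, each undirected edge listed once
    community: dict mapping node -> community id
    Computes Q = sum_c [ L_c / m - (d_c / (2m))^2 ].
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# two triangles joined by the single edge (2, 3)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))
```

For the two triangles split into their natural communities, Q = 5/14 ≈ 0.357, while placing every node in a single community gives Q = 0.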
The CLEF eRisk 2018 challenge focuses on early detection of signs of depression or anorexia using posts or comments on social media. The eRisk lab organized two tasks this year and released a separate corpus for each task. The corpora were developed from posts and comments on Reddit, a popular social media platform. The machine le...
In this era of abundance, we humans struggle to choose. Recommender systems are widely used to cope with this crisis of abundance by recommending items that we may like based on our previous consumption history. In this paper, a weighting technique is proposed, in the spirit of the term weighting scheme of text retrieval systems, for item based coll...
Background:
Qualitative research methods are increasingly being used across disciplines because of their ability to help investigators understand the perspectives of participants in their own words. However, qualitative analysis is a laborious and resource-intensive process. To achieve depth, researchers are limited to smaller sample sizes when an...
Term selection methods in text categorization effectively reduce the size of the vocabulary to improve the quality of the classifier. Each corpus generally contains many irrelevant and noisy terms, which eventually reduce the effectiveness of text categorization. Term selection, thus, focuses on identifying the relevant terms for each category without...
A systematic review identifies and collates various clinical studies and compares their data elements and results in order to provide an evidence-based answer to a particular clinical question. The process is manual and time-consuming, and a tool to automate it is lacking. The aim of this work is to develop a framework using natural language...
The similarity based decision rule computes the similarity between a new test document and the existing documents of the training set that belong to various categories. The new document is assigned to the category in which it has the maximum number of similar documents. A document similarity based supervised decision rule for text categorizatio...
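One possible reading of the rule described above is sketched below, with "similar" interpreted as a similarity at or above a hypothetical threshold `min_sim`. The paper's actual criterion for a similar document lies in the truncated part of the abstract, so both that interpretation and the function name `similarity_vote` are assumptions for illustration.

```python
from collections import Counter


def similarity_vote(similarities, labels, min_sim):
    """Assign the category holding the most training documents that are
    'similar' to the test document.

    Assumption: a training document counts as similar when its similarity
    to the test document is at least the hypothetical threshold min_sim.
    Returns None when no training document passes the threshold.
    """
    counts = Counter(
        label for sim, label in zip(similarities, labels) if sim >= min_sim
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


sims = [0.9, 0.2, 0.75, 0.8, 0.1]
labels = ["a", "b", "b", "b", "a"]
print(similarity_vote(sims, labels, 0.5))
```

Unlike plain kNN, the number of voters here is not fixed in advance: it depends on how many training documents clear the similarity threshold.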
Document clustering refers to the task of grouping similar documents and segregating dissimilar documents. It is very useful for finding meaningful categories in a large corpus. In practice, categorizing a corpus is not easy, since it generally contains a huge number of documents and the document vectors are high dimensional. This paper introduces a...
The k-nearest neighbour rule is a simple and effective classifier for document classification. In this method, a document is assigned to a particular class if that class has the maximum representation among the document's k nearest neighbours in the training set. The k nearest neighbours of a test document are ordered based on their content simila...
The objective of document clustering techniques is to assemble similar documents and segregate dissimilar documents. Unlike document classification, no labeled documents are provided in document clustering. One of the main challenges of any document clustering algorithm is the selection of a good similarity measure. Traditionally, using the vector...
The aim of text document classification is to automatically assign a document to a predefined class. The main problems of document classification are the high dimensionality and sparsity of the data matrix. A new feature selection technique using the Google distance has been proposed in this article to effectively obtain a feature subset which improves t...
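The Google distance mentioned above is usually the normalized Google distance (NGD) of Cilibrasi and Vitányi, computed from how often two terms occur individually and together. The sketch below computes NGD from document counts within a corpus. Using corpus document frequencies in place of web search hit counts, and the function name `normalized_google_distance`, are assumptions for illustration; how the paper turns this distance into a feature-selection score is in the truncated part of the abstract.

```python
import math


def normalized_google_distance(f_x, f_y, f_xy, n):
    """Normalized Google distance computed from occurrence counts.

    f_x, f_y: number of documents containing term x / term y
    f_xy: number of documents containing both terms
    n: total number of documents (assumed normalizing count)
    """
    if f_xy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    denominator = math.log(n) - min(lx, ly)
    if denominator == 0:
        # both terms occur in every document, so they always co-occur
        return 0.0
    return (max(lx, ly) - lxy) / denominator


print(normalized_google_distance(100, 50, 10, 1000))
```

Terms that always appear together get distance 0, terms that never co-occur get an infinite distance, and partially overlapping terms fall in between, so a smaller value indicates a stronger association.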
The high dimensionality of data is a great challenge for effective text classification. Each document in a corpus contains much irrelevant and noisy information, which eventually reduces the efficiency of text classification. Automatic feature selection methods are extremely important to handle the high dimensionality of data for effective...
A nearest neighbor decision rule is developed here for data classification based on a modification of the k-nearest neighbor (kNN) decision rule. This method restricts the majority voting of the kNN decision rule with a predefined positive integer threshold, say β, to assign a data point to a given class. For the proposed method, there is no need to select a fixe...
Semantic relation is an important concept in information science. Nowadays it is widely used in the semantic web. This paper aims to present a measure to automatically determine the semantic relation between words using the web as a knowledge source. It explores whether two words are related or not even if they are dissimilar in meaning. The proposed formula i...