Tirthankar Ghosal
Indian Institute of Technology Patna | IIT Patna · Department of Computer Science and Engineering

About

76 Publications
12,312 Reads
575 Citations
Introduction
Tirthankar Ghosal is a researcher at the Elsevier Centre of Excellence for Natural Language Processing, Department of Computer Science and Engineering, Indian Institute of Technology Patna. His research focuses on scholarly data mining and artificial intelligence (specifically machine learning and deep learning). His current project, "How could AI benefit scholarly communication? Exploring the aspects of an AI-assisted peer review system", aims to leverage current AI capabilities to streamline the peer review system in academia.

Publications (76)
Article
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural lan...
Preprint
Full-text available
AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, and cosmology. Trained on the complete collection of astronomy-related arXiv papers from 2007-2024 along with millions of synthetically-generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-...
Preprint
The integrity of the peer-review process is vital for maintaining scientific rigor and trust within the academic community. With the steady increase in the usage of large language models (LLMs) like ChatGPT in academic writing, there is a growing concern that AI-generated texts could compromise scientific publishing, including peer-reviews. Previou...
Preprint
Full-text available
Continual pretraining of large language models on domain-specific data has been proposed to enhance performance on downstream tasks. In astronomy, the previous absence of astronomy-focused benchmarks has hindered objective evaluation of these specialized LLM models. Leveraging a recent initiative to curate high-quality astronomical MCQs, this study...
Preprint
"An idea is nothing more nor less than a new combination of old elements" (Young, J.W.). The widespread adoption of Large Language Models (LLMs) and publicly available ChatGPT have marked a significant turning point in the integration of Artificial Intelligence (AI) into people's everyday lives. This study explores the capability of LLMs in generat...
Preprint
Full-text available
The exponential growth of astronomical literature poses significant challenges for researchers navigating and synthesizing general insights or even domain-specific knowledge. We present Pathfinder, a machine learning framework designed to enable literature review and knowledge discovery in astronomy, focusing on semantic searching with natural lang...
Preprint
Full-text available
We present a comprehensive evaluation of proprietary and open-weights large language models using the first astronomy-specific benchmarking dataset. This dataset comprises 4,425 multiple-choice questions curated from the Annual Review of Astronomy and Astrophysics, covering a broad range of astrophysical topics. Our analysis examines model performa...
Article
Full-text available
Research in Natural Language Processing (NLP) is increasing rapidly; as a result, a large number of research papers are being published. It is challenging to find the contributions of the research paper in any specific domain from the huge amount of unstructured data. There is a need for structuring the relevant contributions in Knowledge Graph (KG...
Article
The growing volume of scientific literature makes it difficult for researchers to identify the key contributions of a research paper. Automating this process would facilitate efficient understanding, faster literature surveys and comparisons. The automated process may help researchers to identify relevant and impactful information in less time and...
Article
Full-text available
Citations are crucial artifacts to provide additional information to the reader to comprehend the research under concern. There are different roles that citations play in scientific discourse. Correctly identifying the intent of the citations finds applications ranging from predicting scholarly impact, finding idea propagation, to text summarizatio...
Conference Paper
Scientific article summarization poses a challenge because the interpretability of the article depends on the objective, experience of the reader. Editors/Chairs assign experts in the domain as peer reviewers. These experts often write a summary of the article at the beginning of their reviews which offers a summarized view of their understanding (...
Conference Paper
Full-text available
Peer reviews are intended to give authors constructive and informative feedback. It is expected that the reviewers will make constructive suggestions over certain aspects, e.g., novelty, clarity, empirical and theoretical soundness, etc., and sections, e.g., problem definition/idea, datasets, methodology, experiments, results, etc., of the paper i...
Article
Full-text available
Online news consumption via social media platforms has accelerated the growth of digital journalism. Unlike traditional media, digital media has lower entry barriers and allows anyone to become a content creator, resulting in the production of numerous fake news items to attract public attention. As multimedia content is more convenient for users than expressin...
Article
Full-text available
One key frontier of artificial intelligence (AI) is the ability to comprehend research articles and validate their findings, posing an enormous problem for AI systems to compete with human intelligence and intuition. As a benchmark of research validation, the existing peer-review system still stands strong despite being criticized at times by man...
Article
Full-text available
With the growing presence of multimodal content on the web, a specific category of fake news is rampant on popular social media outlets. In this category of fake online information, real multimedia contents (images, videos) are used in different but related contexts with manipulated texts to mislead the readers. The presence of seemingly non-manipu...
Article
Full-text available
With the ever-increasing number of submissions in top-tier conferences and journals, finding good reviewers and meta-reviewers is becoming increasingly difficult. Writing a meta-review is not straightforward as it involves a series of sub-tasks, including making a decision on the paper based on the reviewer’s recommendation and their confidence in...
Article
Full-text available
Peer reviews form an essential part of scientific communication. Scholarly peer review is probably the most accepted way to evaluate research papers by involving multiple experts to review the concerned research independently. Usually, the area chair, the program chair, or the editor takes a call weighing the reviewer’s judgments. It communicates t...
Chapter
A meta-review, usually written by the editor of a journal or the area/program chair in a conference, is a summary of the peer reviews and a concise interpretation of the editors'/chairs' decision. Although the task closely simulates a multi-document summarization problem, automatically writing reviews on top of human-generated reviews is something very...
Chapter
Full-text available
Review comments play an important role in the improvement of scientific articles. There are typically many rounds of review-revision before the different reviewers with varying backgrounds arrive at a consensus on a submission. However, reviews are not always helpful. Sometimes the reviewers are unnecessarily critical of the work without justif...
Chapter
With the ever-increasing number of submissions in top-tier conferences and journals, finding good reviewers and meta-reviewers is becoming increasingly difficult. Writing a meta-review is not straightforward as it involves a series of sub-tasks, including making a decision on the paper based on the reviewer’s recommendation and their confidence in...
Article
Full-text available
Machine Reading Comprehension (MRC) of a document is a challenging problem that requires discourse-level understanding. Information extraction from scholarly articles nowadays is a critical use case for researchers to understand the underlying research quickly and move forward, especially in this age of infodemic. MRC on research articles can also...
Chapter
In the version of this paper that was originally published one crucial acknowledgement was missing. This has now been corrected.
Chapter
The scholarly peer-reviewing system is the primary means to ensure the quality of scientific publications. An area or program chair relies on the reviewer’s confidence score to address conflicting reviews and borderline cases. Usually, reviewers themselves disclose how confident they are in reviewing a certain paper. However, there could be inconsi...
Preprint
Full-text available
Exponential growth in digital information outlets and the race to publish has made scientific misinformation more prevalent than ever. However, the task to fact-verify a given scientific claim is not straightforward even for researchers. Scientific claim verification requires in-depth knowledge and great labor from domain experts to substantiate su...
Article
Full-text available
Peer review is at the heart of scholarly communications and the cornerstone of scientific publishing. However, academia often criticizes the peer review system as non-transparent, biased, arbitrary, and a flawed process at the heart of science, leading researchers to question its reliability and quality. These problems could also be due to the lack...
Article
One of the most time-critical challenges for the Natural Language Processing (NLP) community is to combat the spread of fake news and misinformation. Existing approaches for misinformation detection use neural network models, statistical methods, linguistic traits, fact-checking strategies, etc. However, the menace of fake news seems to grow more v...
Chapter
Full-text available
A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods pr...
Article
Full-text available
The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that has some new information to offer with respect to whatever is earlier seen or known....
Chapter
Peer-review process is fraught with issues like bias, inconsistencies, arbitrariness, non-committal weak rejects, etc. However, it is anticipated that the peer reviews provide constructive feedback to the authors against some aspects of the paper such as Motivation/Impact, Soundness/Correctness, Novelty, Substance, etc. A good review is expected to...
Article
The peer-review process is considered the sentinel of science in scholarly communications. However, with the deluge of research articles and the overload of scholarly information, it is increasingly becoming difficult for humans to keep up with the pace of the latest research. This thesis uses Machine Learning (ML) and Natural Language Proces...
Article
The first workshop on Argumentation Knowledge Graphs (ArgKG) was held virtually at the Automated Knowledge Base Construction (AKBC 2021) conference on October 7, 2021. ArgKG @ AKBC 2021 brought together the Computational Argumentation and Knowledge Graphs communities, aiming to promote cross-pollination of ideas and encourage discussions and collab...
Article
The SummDial special session on summarization of dialogues and multi-party meetings was held virtually within the SIGDial 2021 conference on July 29, 2021. SummDial @ SIGDial 2021 aimed to bring together the speech, dialogue, and summarization communities to foster cross-pollination of ideas and fuel the discussions/collaborations to attempt this c...
Chapter
In the originally published version of chapter 88 the acknowledgement statement was erroneously omitted. The acknowledgement statement has been added to the chapter.
Chapter
With the rapid growth of scientific literature, it is becoming increasingly difficult to identify scientific contribution from the deluge of research papers. Automatically identifying the specific contribution made in a research paper would help quicker comprehension of the work, faster literature survey, comparison with the related works, etc. Her...
Chapter
The rapid growth of scientific literature is presenting several challenges for the search and discovery of research artifacts. Datasets are the backbone of scientific experiments. It is crucial to locate the datasets used or generated by previous research as building suitable datasets is costly in terms of time, money, and human labor. Hence automa...
Preprint
Full-text available
A drastic rise in potentially life-threatening misinformation has been a by-product of the COVID-19 pandemic. Computational support to identify false information within the massive body of data on the topic is crucial to prevent harm. Researchers proposed many methods for flagging online misinformation related to COVID-19. However, these methods pr...
Article
Full-text available
Finding the lineage of a research topic is crucial for understanding the prior state of the art and advancing scientific displacement. The deluge of scholarly articles makes it difficult to locate the most relevant previous work. It causes researchers to spend a considerable amount of time building up their literature list. Citations play a crucial...
Chapter
Full-text available
Peer review is the widely accepted method of research validation. However, with the deluge of research paper submissions accompanied by the rising number of venues, the paper vetting system has come under a lot of stress. Problems like dearth of adequate reviewers, finding appropriate expert reviewers, maintaining the quality of the reviews are stea...
Chapter
Selecting a potential reviewer for a manuscript submitted to a conference is a crucial task for the quality of a peer-review process that ultimately determines the success and impact of any conference. The approach adopted to find the potential reviewer needs to be consistent with its decision of allocation. In this work, we propose a framew...
Article
Finding the lineage of a research topic is crucial for understanding the prior state of the art and advancing scientific displacement. The deluge of scholarly articles makes it difficult to locate the most relevant prior work and causes researchers to spend a considerable amount of time building up their literature list. Citations play a significan...
Article
Fake news or misinformation is information or stories intentionally created to deceive or mislead readers. Nowadays, social media platforms have become ripe grounds for misinformation, spreading it within minutes and leading to chaos, panic, and potential health hazards among people. The rapid dissemination and a prolific rise in the...
Chapter
Deciding the appropriateness of a manuscript to the aims and scope of a journal is very important in the first stage of peer review. Editors should be confident about the article’s suitability to the intended journal to further channel its progress through the steps in the review process. However, not all sections in a research article are equally...
Article
Detecting whether a document contains sufficient new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly perform at the lexical level and are unable to address the semantic-level redundancy. These techniques usually rely on handcrafted featu...
Conference Paper
Editorial pre-screening is the first step in academic peer review. The deluge of research papers and the huge amount of submissions being made to journals these days makes editorial decision a very challenging task. The current work attempts to investigate certain impact factors that may have a role in the editorial decision making process. The pro...
Article
Full-text available
Detecting novelty of an entire document is an Artificial Intelligence (AI) frontier problem that has widespread NLP applications, such as extractive document summarization, tracking development of news events, predicting impact of scholarly articles, etc. Important though the problem is, we are unaware of any benchmark document level data that corr...
Conference Paper
Sentiment analysis in its simplest form is the classification of a piece of text into positive or negative class based on the polarity of the text. Horoscopes consist of future predictions for each of the twelve zodiac signs and are very popular in India. All major TV channels and newspapers publish their horoscope expert's predictions on a daily b...

Questions (3)
Question
What could be the approaches to combine the pairwise document similarity scores to get the overall similarity score of a certain document against a document collection?
Question
  1. DKPro: Similarity framework by UKP:TUDA for measuring Semantic Textual Similarity
  2. Apache UIMA based language processor pipeline?
  3. SemEval tasks
Question
I am trying to find the semantic similarity between two documents. Which method or metric scales best at present? Is there any related literature on this? Most of the metrics I found work at the sentence level. Any insight into document-level semantic similarity measurement?
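As a rough illustration of the two similarity questions above, here is a minimal Python sketch. It is not the method of any listed paper, nor of DKPro. It scores a target document against each document in a collection with TF-IDF cosine similarity, then combines the pairwise scores into one collection-level score using max, mean, or top-k mean. TF-IDF is only a lexical stand-in for semantic similarity; document embeddings could be swapped in for the vectors. The tokenizer, the smoothed IDF formula, and the names `similarity_to_collection` and `aggregate` are illustrative assumptions.

```python
import math
from collections import Counter

# Characters stripped from token edges (assumption: naive tokenization).
PUNCT = ".,;:!?\"'()[]"


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization with edge punctuation removed."""
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors as {term: weight} dicts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, so shared terms keep a nonzero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def aggregate(scores: list[float], method: str = "max", k: int = 3) -> float:
    """Combine pairwise scores into one document-vs-collection score."""
    if not scores:
        return 0.0
    if method == "max":
        return max(scores)
    if method == "mean":
        return sum(scores) / len(scores)
    if method == "topk_mean":
        if k < 1:
            raise ValueError("k must be at least 1")
        top = sorted(scores, reverse=True)[:k]
        return sum(top) / len(top)
    raise ValueError(f"unknown aggregation method: {method}")


def similarity_to_collection(target: str, collection: list[str],
                             method: str = "max", k: int = 3) -> float:
    """Score `target` against every document in `collection`, then aggregate."""
    # IDF is computed over the target plus the collection together.
    vecs = tfidf_vectors([target] + collection)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return aggregate(scores, method, k)


if __name__ == "__main__":
    collection = ["peer review of research papers", "astronomy literature search"]
    target = "peer review of research papers"
    for m in ("max", "mean", "topk_mean"):
        print(m, round(similarity_to_collection(target, collection, m, k=1), 3))
```

The choice of aggregation matters. Max effectively asks whether any single document in the collection is a near-duplicate of the target. Mean and top-k mean instead reward a target that resembles the collection as a whole. Which one fits depends on the task.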