Sushant Gautam
Simula Research Laboratory · Department of Holistic Systems at SimulaMet
Master of Science
Interested in operationalizing AI/ML.
About
33 Publications
15,924 Reads
66 Citations
Introduction
PhD Student | Graduated from the Department of Electronics and Computer Engineering, Central Campus Pulchowk, IOE, Tribhuvan University.
Additional affiliations
March 2019 - present
UBL R&D Center Nepal
Position
- Project Manager
Description
- Experimenting with new technology in various domains, including education, environment, and healthcare. Project Lead for the NSD-Al Project.
March 2019 - August 2019
Leapfrog Technology, Inc.
Position
- AI Intern
Description
- RNN models for time series modelling in TensorFlow for weather and pollution prediction. ARIMA and windowing-based time series predictive models. Model deployment and serving.
January 2018 - August 2019
UGC Research Project
Position
- Research Assistant
Description
- Research assistant to Dr Nanda Bikram Adhikari for his UGC Collaborative Research Grant Project, focused on finding ways to measure pollution and environmental levels in the Kathmandu Valley.
Education
September 2020 - August 2022
September 2015 - September 2019
Publications (33)
Soccer dominates the global sports market, and viewers’ interest in watching videos of soccer matches is ramping up. Globally, there is a huge and constantly increasing amount of soccer game content being generated, including video footage, audio commentary, text metadata, goal and player statistics, scores, and rankings. As a large percentage of a...
Soccer is one of the most popular sports globally, and the amount of soccer-related content worldwide, including video footage, audio commentary, team/player statistics, scores, and rankings, is enormous and rapidly growing. Consequently, the generation of multimodal summaries is of tremendous interest for broadcasters and fans alike, as a large pe...
In association football, the development of multimodal summaries is of great importance to both broadcasters and spectators since a large number of viewers choose to follow just the soccer game highlights. The fundamental drive for the development of summarization systems is the requirement to manage huge amounts of data in different formats. By hi...
Nepal, containing a rugged elevation ranging from less than 100 meters to over 8000 meters and having various climates varying from tropical to alpine and perpetual snow, has a great potential for the study of the highly varying environment and weather proxies. Fine spatio-temporal-scale measurements of such data using sufficiently distributed auto...
This paper examines the integration of real-time talking-head generation for interviewer training, focusing on overcoming challenges in Audio Feature Extraction (AFE), which often introduces latency and limits responsiveness in real-time applications. To address these issues, we propose and implement a fully integrated system that replaces conventi...
This paper demonstrates PlayerTV, an innovative framework which harnesses state-of-the-art Artificial Intelligence (AI) technologies for automatic player tracking and identification in soccer videos. By integrating object detection and tracking, Optical Character Recognition (OCR), and color analysis, PlayerTV facilitates the generation of player-...
Extracting meaningful insights from large and complex datasets poses significant challenges, particularly in ensuring the accuracy and relevance of retrieved information. Traditional data retrieval methods such as sequential search and index-based retrieval often fail when handling intricate and interconnected data structures, resulting in incomple...
We introduce Kvasir-VQA, an extended dataset derived from the HyperKvasir and Kvasir-Instrument datasets, augmented with question-and-answer annotations to facilitate advanced machine learning tasks in Gastrointestinal (GI) diagnostics. This dataset comprises 6,500 annotated images spanning various GI tract conditions and surgical instruments, and...
In the rapidly evolving field of sports analytics, the automation of targeted video processing is a pivotal advancement. We propose PlayerTV, an innovative framework which harnesses state-of-the-art AI technologies for automatic player tracking and identification in soccer videos. By integrating object detection and tracking, Optical Character Reco...
The rapid evolution of digital sports media necessitates sophisticated information retrieval systems that can efficiently parse extensive multimodal datasets. This paper introduces SoccerRAG, an innovative framework designed to harness the power of Retrieval Augmented Generation (RAG) and Large Language Models (LLMs) to extract soccer-related infor...
The rapid evolution of digital sports media necessitates sophisticated information retrieval systems that can efficiently parse extensive multimodal datasets. This paper demonstrates SoccerRAG, an innovative framework designed to harness the power of Retrieval Augmented Generation (RAG) and Large Language Models (LLMs) to extract soccer-related inf...
Fact-checking is a crucial natural language processing (NLP) task that verifies the truthfulness of claims by considering reliable evidence. Traditional methods are often limited by labour-intensive data curation and rule-based approaches. In this paper, we present FactGenius, a novel method that enhances fact-checking by combining zero-shot prompt...
The rapid advancement of technology has been revolutionizing the field of sports media, where there is a growing need for sophisticated data processing methods. Current methodologies for extracting information from soccer broadcast videos to generate game highlights and summaries for social media are predominantly manual and rely heavily on text-ba...
This paper introduces SoccerSum, a novel dataset aimed at enhancing object detection and segmentation in video frames depicting the soccer pitch, using footage from the Norwegian Eliteserien league across 2021-2023. With the goal of detecting elements beyond common entities in existing datasets, such as the soccer ball, players and referees, this d...
This paper introduces TACDEC, a dataset of tackle events in soccer game videos. Recognizing the gap in existing open datasets that predominantly focus on official soccer events such as goals and cards, TACDEC targets a comprehensive analysis of tackles, a critical aspect of soccer that combines technical skills, tactical decision-making, and phy...
Social media plays a significant role for sports organizations with millions of active fans, but publishing highlights is often a tedious manual operation. With the development of AI, new tools are available for content generation and personalization to engage audiences. We propose an AI-based multimedia production framework for the automatic publi...
In the era of digitalization, social media has become an integral part of our lives, serving as a significant hub for individuals and businesses to share information, communicate, and engage. This is also the case for professional sports, where leagues, clubs and players are using social media to reach out to their fans. In this respect, a huge amo...
With the increasing availability of multimodal data, especially in the sports and medical domains, there is growing interest in developing Artificial Intelligence (AI) models capable of comprehending the world in a more holistic manner. Nevertheless, various challenges exist in multimodal understanding, including the integration of multiple modalit...
Dystonia is a movement disorder that causes unusual movements and involuntary muscle contractions affecting some parts or the whole of the body. Selecting drugs and doses is a highly personalized process for dystonia, requiring frequent visits to the clinic, pointing toward the need for more systematic and objective methods of collecting patient data. A d...
This project, the first of its kind, introduces a new research paradigm of remote motion sensing for health monitoring of civil construction in the public safety domain in Nepal. Preliminary data from a pilot study from BRB encourage us to move forward with an aging analysis of such civil structures. Students from DoECE at IOE, Pulch...
Nepal, containing a rugged elevation ranging from less than 100 meters to over 8,848 meters and having various climates varying from tropical to alpine and perpetual snow, has a great potential for the study of highly varying environment and weather proxies. Fine spatio-temporal scale measurements of such data using sufficiently distributed automatic...
One of the major challenges in searching on the internet has been that search engines and online forums have not been able to extract and pinpoint the exact answer to people's queries despite the information being available on the internet. Extraction of to-the-point answers from articles, posts and blogs tends to improve search accuracy. Sentence Ranking hel...
One of the major challenges in searching on the internet has been that the search engines and online forums have not been able to extract and pinpoint the exact answer to people's queries despite the information being available on the internet. Extraction of to-the-point answers from articles, posts and blogs tends to improve the search accuracy. Sentence...
The street lighting system is based upon an electronic controller that utilizes traffic density survey data. An Android mobile app was developed for this purpose. Data was collected and analyzed at different busy junctions of Kathmandu Valley. The app maintained a database record of each vehicle type that enters the system and simultaneousl...
Facial Landmarks Detection is a neural network-based project built using a re-learning approach that uses Histogram of Oriented Gradients (HOG) features of images to train a deep learning neural network in order to extract the facial features from the image. This report presents the methodology and algorithms for detecting facial landmarks through th...
This project aims to design a system that takes a video as input, splits it into frames, and extracts features and landmarks from the resulting images using machine learning algorithms, thus providing a base for a number of possible systems. The main objectives of this project are:
· To design a system that takes an image as an input and det...
The Refugee Crisis is unequivocally the most burning global problem that has been haunting the modern world. With the causes and factors ranging from political instability, safety threat and mass discrimination to environmental hazards or simply a search for a quality life, the number of displaced people is on an ever-increasing curve. For the most...
Shanti is a project that provides part-time vocational training and opportunities to women, especially victims of gender-based violence, so that they can become independent and earn a living on their own. Vocational training includes sewing clothes and preparing textile accessories such as bags, cushion covers and purses.