About
Publications: 29
Reads: 18,467
Citations: 340
Publications (29)
Stop word removal is an important step in many natural language processing (NLP) tasks. To date, no standardized, exhaustive, and dynamic stop word list has been created for documents written in Gujarati, an Indian language spoken by nearly 66 million people worldwide. Most of the existing stop word removal approaches are file or dictionary...
The meat food manufacturing industries play a crucial role in delivering various meat products to global consumers. However, one of the significant challenges within this industry is optimizing food processing efficiency across various stages, as it directly affects both product quality and production costs. Drying is one of the crucial stages, whe...
The industrial sector is currently undergoing a transformative era of intelligent automation driven by Artificial Intelligence (AI) capabilities. This synergy greatly enhances efficiency and seamlessly enables data-driven decision-making processes. These advantages enable more efficient resource allocation and enhance production planning precision....
Air pollution poses an urgent challenge to public health and ecosystems, particularly in rapidly urbanizing regions. Despite the severity of this issue, there is a lack of robust analytical frameworks capable of identifying key variables and their spatial effects across landscapes. Our study directly addresses this void by applying an innovative su...
Handwritten digit recognition is one of the most active areas of research in computer vision. Numerous applications motivate building an effective recognition model that enables computers to analyze images in a manner comparable to human vision. Most efforts have been devoted to recognizing the handwritten digi...
Air pollution concentrations in Ho Chi Minh City (HCMC) have been found to surpass the WHO standard, which has become a very serious problem affecting human health and the ecosystem. Various machine learning algorithms have recently been widely used in air quality forecasting studies to predict possible impacts. Training and constructing several ma...
This article presents outdoor air pollution data acquired from the real-time Air Quality Monitoring Network (AQMN), which was established by the Healthyair project team in Ho Chi Minh City (HCMC), Vietnam. The AQMN is made up of six air pollution monitoring stations spread over the city (Traffic, Residential, and Industrial). Each station measures...
Outdoor air pollution damages the climate and causes many diseases, including cardiovascular diseases, respiratory infections, and lung damage. In particular, Particulate Matter (PM2.5) is considered a hazardous air pollutant to human health. Accurate hourly forecasting of PM2.5 concentrations is thus of significant importance for public health, he...
Vietnam has committed to achieving net-zero greenhouse gas (GHG) emissions by 2050. Ho Chi Minh City (HCMC) produces a considerable amount of GHG emissions (accounting for 20.68% of total GHG in Vietnam). The main GHG sources in HCMC are the numerous private vehicles in use and the growing number of factories. Therefore, to reduce the GHG of the city fr...
Objectives: To improve the efficiency of tri-level segmentation tasks for handwritten Gujarati text. Methods: Using hybrid methods for tri-level segmentation, we perform line, word, and character segmentation on the image. This study presents a segmentation paradigm that works with touching characters, slope of the line written on the page, chara...
In day-to-day life, handwritten documents are a general-purpose means of communicating and storing information. In computer science, character recognition using Deep Learning (DL) has attracted growing attention. DL has a massive set of pattern recognition tools that can be applied to speech recognition, image processing, natural language processing...
In Natural Language Processing (NLP), language identification is the problem of determining which natural language(s) are used in a written script. This paper presents a methodology for language identification from multilingual documents written in Indian language(s). The main objective of this research is to automatically, quickly, and accurately rec...
Retrieving the most relevant documents from the web for a user query is a crucial task for Information Retrieval (IR) systems in resource-poor languages. This paper presents a Cosine Similarity Based Vector Space Document Model (VSDM) for Information Retrieval in the Gujarati language. VSDM is widely used in information retrieval and document classificat...
The data for the current research work were collected for 42 international languages spanning 3 continents, viz. Asia, Europe, and South America. The data comprise unigram model representations of lexicons in the stop-word lists. 13 scripting systems comprising Arabic, Armenian, Bengali, Chinese, Cyrillic, Devanagari, Greek, Gurmukhi...
Stop word elimination is an important pre-processing step in Natural Language Processing (NLP) and text mining applications. Stop word removal improves the performance and quality of classification systems. In the context of a classification task, it is possible to reduce the number of dimensions in the term space by removing the most common words which has...
Gujarati is an Indo-Aryan language, a branch of the Indo-European language family. Almost no automation tools are available for processing written Gujarati, owing to the complexity of Gujarati grammar and the complex structure of the Gujarati script. This paper presents the design and imp...
Questions (3)
I want to create a training set from data whose response variable is binary (values '1' and '0'), but in the original data the ratio of 1 to 0 is 1:2. How can I handle class imbalance with up- and down-sampling for a binary classification problem?
The class imbalance (CI) concerns a multi-category document classification problem.
Number of categories: 4
Please find the attached document.
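For the binary case raised in the first question above (classes in a 1:2 ratio), random up- and down-sampling can be sketched as follows. This is only a minimal illustration using the Python standard library, not a method taken from any of the listed publications. The function name `resample_binary`, its parameters, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def resample_binary(rows, labels, mode="up", seed=0):
    """Balance a binary dataset by random up- or down-sampling.

    mode="up":   draw minority rows with replacement until both classes match.
    mode="down": draw majority rows without replacement until both classes match.
    (Hypothetical helper written for illustration only.)
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (major, n_major), (minor, n_minor) = counts.most_common(2)

    major_idx = [i for i, y in enumerate(labels) if y == major]
    minor_idx = [i for i, y in enumerate(labels) if y == minor]

    if mode == "up":
        # Duplicate randomly chosen minority rows to close the gap.
        extra = rng.choices(minor_idx, k=n_major - n_minor)
        keep = major_idx + minor_idx + extra
    elif mode == "down":
        # Keep only a random subset of the majority rows.
        keep = rng.sample(major_idx, k=n_minor) + minor_idx
    else:
        raise ValueError("mode must be 'up' or 'down'")

    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data with the 1:2 ratio from the question: 10 ones, 20 zeros.
labels = [1] * 10 + [0] * 20
rows = list(range(len(labels)))

_, up_labels = resample_binary(rows, labels, mode="up")
_, down_labels = resample_binary(rows, labels, mode="down")
print(Counter(up_labels))
print(Counter(down_labels))
```

Resampling is normally applied to the training split only, after the train/test split, so that duplicated rows do not leak into the evaluation data. The same idea can be extended to the four-category case by resampling every class toward a common target count.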