Data Mining - Science topic

Explore the latest publications in Data Mining, and find Data Mining experts.
Publications related to Data Mining (10,000)
Sorted by most recent
Article
Full-text available
Turn-to-turn faults in transformer windings are a prominent cause of total transformer failure, so detecting winding faults in real time, to stop failures from developing, is imperative. However, existing techniques that rely on periodic offline inspections cannot continuously monitor the state of transformer windings, and they cause extra costs due to...
Conference Paper
Full-text available
A significant portion of deaths are related to heart failure. There are various medical and non-medical methods for diagnosing or predicting heart failure, but they are often approximate and inaccurate. Diagnostic methods for heart failure include blood tests, chest X-rays, echocardiograms, angiography, and other medical techniques that are costly...
Research Proposal
Full-text available
This special issue aims to present recent advances in concrete structural damage detection, safety evaluation and health monitoring, particularly those enhanced by machine learning, computational intelligence or data mining. Topics of interest include, but are not limited to, the following: Concrete structural da...
Article
Full-text available
There is increasing interest in predicting traffic measures by modeling big data-driven complex scenarios with data mining and machine learning methods. In this study, the parameters of the traffic analysis model were created using 35,697 Twitter traffic notifications. The relationships and effects between the parameters of hour, day, month, season...
Article
Full-text available
Considering the trend of home energy use in the context of global energy scarcity, this study examines the effects of variables such as physical and social factors on household energy use. Article summarization and text mining approaches were used to create a broad picture of home energy demand, with successful predictive models to identify the most...
Conference Paper
Full-text available
Deadline for manuscript submissions: 30 October 2023
https://www.mdpi.com/journal/information/special_issues/BR5E3N53AT
Our aim is to form a dedicated Special Issue that brings together research on topics such as:
- HCI for people with special needs and for elderly users;
- study of user behaviour by means of machine learning,...
Conference Paper
Full-text available
Digital tools can log fine-grained details of user experiences within and across systems. These digital experiences can provide valuable context for other ethnographic insights. In this paper, we discuss the potential for using interaction logs as a data source and the pipeline considerations that can facilitate and enh...
Research Proposal
Full-text available
Dear young geoscientists, on the occasion of the conference "Sustainability and risk: BeGEO scientists on the road to the future", to be held in Naples from 3 to 6 October 2023, we would like to invite all MSc students, PhD students, post-docs and early career researchers working in the field of environmental sciences to present their contributions to the sess...
Article
Full-text available
Data-driven fault diagnosis has attracted attention with the recent trend of obtaining representative features from high-dimensional, strongly coupled, and nonlinear process data. This paper presents a novel dimensionality reduction (DR) algorithm named double preserving integrated with neighborhood locality projections (DPNLP) for fault diagnosis....
Conference Paper
Full-text available
Generally speaking, labeled data is difficult and expensive to obtain for applications in machine learning and data mining. One of the earliest approaches to tackling this problem is semi-supervised self-training, which takes advantage of both labeled and unlabeled data to create pseudo-labeled data. However, reaching a high level of confidence to predict th...
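The self-training loop mentioned in this abstract (fit on labeled data, pseudo-label the unlabeled points the model is most confident about, add them to the training set, and repeat) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: the nearest-centroid classifier on 1-D points, the margin-based confidence score and the `threshold` value are all hypothetical choices made for brevity.

```python
from statistics import mean


def centroids(labeled):
    """Compute one centroid per class from (x, label) pairs."""
    by_class = {}
    for x, y in labeled:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}


def predict_with_confidence(cents, x):
    """Return (label, margin). The margin is the gap between the nearest and
    second-nearest centroid distances; a larger margin means more confidence."""
    dists = sorted((abs(x - c), y) for y, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled, unlabeled, threshold=1.0, max_rounds=10):
    """Hypothetical self-training loop: repeatedly add confidently
    pseudo-labeled points to the training set and refit the centroids."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        confident, rest = [], []
        for x in pool:
            label, margin = predict_with_confidence(cents, x)
            (confident if margin >= threshold else rest).append((x, label))
        if not confident:
            break  # nothing passes the confidence threshold; stop early
        labeled.extend(confident)
        pool = [x for x, _ in rest]
    return centroids(labeled), labeled, pool
```

For example, `self_train([(0.0, "a"), (10.0, "b")], [1.0, 2.0, 8.0, 9.0, 5.2])` pseudo-labels the four points that lie close to a class centroid and leaves `5.2`, which sits almost midway between the classes, unlabeled. The confidence threshold is the knob that trades coverage against the risk of reinforcing wrong pseudo-labels.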
Article
Full-text available
Nowadays, enormous amounts of data are produced every second. These data also contain private information from sources including media platforms, the banking sector, finance, healthcare, and criminal histories. Data mining is a method for looking through and analyzing massive volumes of data to find usable information. Preserving personal data duri...
Conference Paper
Full-text available
Continuous Integration (CI) is a widely adopted practice in modern software engineering that involves integrating developers' local changes with the project baseline daily. Despite its popularity, recent studies have revealed that integrating changes can be time-consuming, requiring significant effort to correct errors that arise. This can lead to...
Article
Full-text available
The rapid increase in the volume of data generated from various digital resources has motivated a new trend of data mining techniques that can be trained continuously in parallel with data stream generation. Such techniques need to adapt to new data samples and forget old ones using methods such as Adaptive Sliding Window (ADWIN)....
Article
Full-text available
This study identifies the topical areas of research that have taken a psychological approach to soccer over the last 33 years (1990–2022) and explores the growth and stagnation of these topics as well as research contributions to soccer development. Data were obtained from 1863 papers in the Web of Science database. The data were collec...
Article
Full-text available
The ability to extract information and analyze it in a common format is essential for performing predictions, comparing projects through cost benchmarking, and gaining a deeper understanding of project costs. However, the lack of standardization and the manual inclusion of data make this process very time-consuming, unrelia...
Article
Full-text available
The Naive Bayes classifier is a strong model for classifying students' performance based on various factors. This research therefore developed a classification model that can accurately classify students into different academic performance categories. The study used data collected from 1,422 students at the University of Ibadan, Nigeria. Descri...
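To show what a Naive Bayes classifier computes, here is a minimal categorical version with Laplace smoothing: each class is scored by its log prior plus the sum of per-feature log likelihoods, under the assumption that features are conditionally independent given the class. The features and values in the example (`attendance`, `study_hours`, `pass`/`fail`) are hypothetical; they are not the University of Ibadan data or the study's model.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[label][feature_index][value] -> count
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen for each feature
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[y][i][v] += 1
            vocab[i].add(v)
    return class_counts, value_counts, vocab


def predict(model, row, alpha=1.0):
    """Return the class with the highest log posterior (Laplace smoothing alpha)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for y, n_y in class_counts.items():
        score = math.log(n_y / total)  # log prior P(y)
        for i, v in enumerate(row):
            # smoothed likelihood P(feature_i = v | y)
            num = value_counts[y][i][v] + alpha
            den = n_y + alpha * len(vocab[i])
            score += math.log(num / den)
        if score > best_score:
            best_label, best_score = y, score
    return best_label


# Hypothetical toy data: (attendance, study_hours) -> outcome
rows = [("high", "high"), ("high", "low"), ("low", "low"),
        ("low", "high"), ("high", "high"), ("low", "low")]
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]
model = train_naive_bayes(rows, labels)
print(predict(model, ("high", "high")))  # pass
```

Working in log space avoids numeric underflow when many features are multiplied, and the smoothing term keeps an unseen feature value from zeroing out a class.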
Conference Paper
Full-text available
Clustering is a well-known task in Data Mining that aims at grouping data instances according to their similarity. It is an exploratory and unsupervised task whose results depend on many parameters, often requiring the expert to iterate several times before being satisfied. Constrained clustering has been introduced to better model the expectation...
Conference Paper
Full-text available
The Brazilian energy sector faces challenges in ensuring electricity supply to all regions of the country. Due to geographical constraints, some communities are isolated and difficult to access. Based on this premise, this article proposes the use of data mining and machine learning as an alternative to contribute to the improvem...
Article
Full-text available
Business process compliance is an essential part of business process management, which protects organizations from penalties caused by non-compliant processes. However, current research on business process compliance mainly focuses on checking compliance against general constraint rules that have been formalized without in-depth analysis of related regulatory docu...
Article
Full-text available
Government attention and policy implementation are important for achieving environmental goals. This study uses 257 cities in China from 2013–2019 as the sample and measures local government attention through text mining, thus exploring the carbon emission reduction effect of government attention. Threshold regression is further used to examine the...
Article
Full-text available
In the field of data analysis and mining, applying efficient data indexing and compression techniques to spatiotemporal data can significantly reduce computational and storage overhead, because these techniques control the volume of data and exploit its spatiotemporal characteristics. However, traditional lossy compression techniques are hardly suitable...
Article
Full-text available
To realize online detection and control of network viruses in robots, the authors propose a data mining-based anti-virus solution for smart robots. First, using an Internet of Things (IoT) intrusion prevention system design method based on network intrusion signal detection and feedforward modulation filtering, the overall design descr...
Preprint
Full-text available
A multilayered graph is an indispensable data representation tool for comprehending and mining the richness and complexity of complex systems in real-world scenarios. Multifaceted interconnecting relationships cast in multiple layers of the graph form a holistic, versatile, and powerful framework that enables researchers to effectively process inform...
Article
Full-text available
Relying on the nation's first judicial big data research base for people's courts, located at Southeast University, Southeast University Law School has set up a training direction for graduate students in legal big data and artificial intelligence, and has explored the “three-dimensional, small-scale, wide-ranging, and large-scale ecology.” The concept of “meta...
Article
Full-text available
After nearly forty years of development, China’s land consolidation policies (CLCP) have become an important tool for promoting rural revitalization and sustainable development. However, although CLCP is a major land management policy, quantitative evaluation research on its text is still lacking. This paper establishes an evaluation system for CLCP usin...
Article
Full-text available
In machine learning and data mining applications, an imbalanced distribution of classes in the training dataset can drastically affect the performance of learning models. The class imbalance problem is frequently observed during classification tasks in real-world scenarios when the available instances of one class are much fewer than the amount of...
Article
Full-text available
Determining live weight, one of the most important traits for meat production, is a very important issue for herd management and sustainable livestock farming. In this context, alternative methods have become necessary, especially in rural conditions, because of the difficulty of finding the weighin...
Article
Full-text available
Soil water content (SWC) plays a key role in the management of water and soil resources. Accurate prediction of SWC is an important issue in water and soil studies. Recently, several data mining and machine learning techniques have been proposed for SWC prediction and have achieved encouraging results. This paper presents four data mining predictive algorithms...
Article
Full-text available
Large language models (LLMs) have substantially pushed artificial intelligence (AI) research and applications in the last few years. They are currently able to achieve high effectiveness in different natural language processing (NLP) tasks, such as machine translation, named entity recognition, text classification, question answering, or text summa...
Article
Full-text available
Cardiovascular diseases (CVDs) account for a significant portion of global mortality, emphasizing the need for effective strategies. This study focuses on myocardial infarction, pulmonary thromboembolism, and aortic stenosis, aiming to empower medical practitioners with tools for informed decision making and timely interventions. Drawing from data...
Article
Full-text available
In the context of electrical power systems, modeling the edge-end interaction involves understanding the dynamic relationship between different components and endpoints of the system. However, the time series of electrical power obtained by user terminals often suffer from low-quality issues such as missing values, numerical anomalies, and noisy la...
Article
Full-text available
This research investigates the factors influencing user satisfaction and dissatisfaction in fitness mobile applications. It employs Herzberg’s two-factor model through text mining to classify Fitbit mobile app attributes into satisfiers and dissatisfiers. The Fitbit app was chosen due to its prevalence in the United States. The study analyzes 100,0...
Article
Full-text available
In today's era of technological innovation, data in databases is expanding very rapidly, driven by technology in general as well as by scientific data, social media, and financial technology. Data mining tools and techniques can predict future trends, making businesses more proactive...
Article
Full-text available
Constructing teaching quality evaluation systems for higher vocational education helps schools achieve precise education management, which is the cornerstone for developing abilities. Frequent itemset division based on a modified Apriori association rule algorithm is used as the evaluation algorithm of this...
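The modification to Apriori is not described in this excerpt, so the sketch below shows only the classic Apriori procedure it builds on: count frequent single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate with an infrequent subset, and keep the candidates that meet a minimum support count. The toy transactions (items `a`, `b`, `c`, standing in for hypothetical evaluation indicators) are illustrative only.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Classic Apriori: return {frozenset(itemset): support_count} for every
    itemset contained in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count candidate support with one pass over the transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(transactions, min_support=3)
print(sorted("".join(sorted(s)) for s in result))  # ['a', 'ab', 'ac', 'b', 'bc', 'c']
```

Here every pair appears in three transactions, but `{a, b, c}` appears in only two, so the search stops at level 3. The prune step is what distinguishes Apriori from naive enumeration: it relies on support never increasing when items are added.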
Article
Full-text available
Motif discovery is a fundamental operation in the analysis of time series data. Existing motif discovery algorithms that support Dynamic Time Warping (DTW) require the exact motif length to be specified manually. However, setting an appropriate length for interesting motifs can be challenging, and selecting inappropriate motif lengths may result in valuab...
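Motif discovery under Dynamic Time Warping compares subsequences using the DTW distance, which lets one sequence stretch locally against another. The function below is a minimal sketch of the classic O(nm) dynamic program, using absolute difference as the local cost and no warping window. It illustrates the distance only, not the paper's motif discovery algorithm or its handling of motif length.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences
    (absolute-difference local cost, no warping-window constraint)."""
    n, m = len(a), len(b)
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j],      # advance in a only (b[j-1] is reused)
                D[i][j - 1],      # advance in b only (a[i-1] is reused)
                D[i - 1][j - 1],  # advance in both sequences
            )
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

The two sequences have different lengths, yet their DTW distance is zero because the repeated `2` can be aligned to a single `2`; this tolerance to local stretching is why DTW-based motif search is attractive and also why it is costly.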
Article
Full-text available
The text feature space and feature weights of defect information for main substation equipment are high-dimensional, so it is difficult to select mining features accurately. A Natural Language Processing (NLP) medium- and short-term neural network model is used to segment the defect information text in the log into feature words. After...
Article
Full-text available
Markov chains are a class of probabilistic models that have achieved widespread application in the quantitative sciences. This is partly due to their versatility and partly due to the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains and explores their connection to graphs and ran...
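One standard link between Markov chains and graphs is the random walk: from each node, step to a uniformly chosen neighbour. The sketch below (an illustration, not material from the tutorial) builds that transition matrix for a small hypothetical undirected graph and finds its stationary distribution by power iteration. For a connected, non-bipartite undirected graph, this distribution is known to be proportional to node degree, which gives an easy check.

```python
def transition_matrix(adj):
    """Random-walk transition probabilities: from node u, move to each
    neighbour with equal probability 1/deg(u)."""
    return {u: {v: 1 / len(nbrs) for v in nbrs} for u, nbrs in adj.items()}


def stationary_distribution(P, iters=1000, tol=1e-12):
    """Power iteration: repeatedly apply pi <- pi P until it stops changing."""
    nodes = list(P)
    pi = {u: 1 / len(nodes) for u in nodes}  # start from the uniform distribution
    for _ in range(iters):
        new = {u: 0.0 for u in nodes}
        for u, row in P.items():
            for v, p in row.items():
                new[v] += pi[u] * p  # probability mass flowing from u to v
        if max(abs(new[u] - pi[u]) for u in nodes) < tol:
            return new
        pi = new
    return pi


# Hypothetical graph: triangle a-b-c plus a pendant node d attached to a.
adj = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}
pi = stationary_distribution(transition_matrix(adj))
print({u: round(p, 4) for u, p in pi.items()})
```

The degrees are 3, 2, 2 and 1 (sum 8), so the result should be close to 3/8, 2/8, 2/8 and 1/8. The triangle makes the graph non-bipartite, which guarantees that the walk is aperiodic and that power iteration converges.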
Article
Full-text available
INTRODUCTION: This project studies FP-growth frequent itemset mining for 3Dmax big data under the Hadoop framework, combined with the MapReduce development model. First, the transaction database is selected according to the frequency of each transaction, and the corresponding projection library is generated. Then the obtain...
Article
Full-text available
Purpose This paper studies the Digital Service Innovation (DSI) concept by systematically reviewing earlier studies from various scholarly communities. This study aims to recognize how recent advances in DSI literature from different research streams complement and can be incorporated into the growing digital servitization literature to define bett...
Article
Full-text available
INTRODUCTION: Gene expression data analysis is a critical aspect of disease prediction and classification, playing a pivotal role in the field of bioinformatics and biomedical research. High-dimensional gene expression datasets hold a wealth of information, but their effective utilization is hindered by the presence of irrelevant dimensions and noi...
Article
Full-text available
With the rapid progress in data mining, deep learning, and artificial intelligence, the demand for datacenters of various sizes is increasing globally. Datacenters typically require properly controlled temperature and humidity conditions to operate correctly. These environmental conditions are always provided by an air...
Article
Full-text available
The tourism information system in Nigeria is not novel. What is novel is the need to develop reliable real-time recommender systems that can adequately aid tourists in their decisions. Several researchers have proposed various models. However, there are still issues about the applicability, effectiveness, efficiency, and reliability of the existing...
Preprint
Full-text available
The use of big data analytics (including data mining and predictive analytics) by firms can be expected to increase productivity and reduce trade costs, which should be positively related to export activities. This paper uses firm level data from the Flash Eurobarometer 486 survey conducted in February – May 2020 to investigate the link between the...
Article
Full-text available
We introduce text mining to study work engagement by using this method to classify employees' survey-based self-narratives into high or low work engagement and analyzing the text features that contribute to the classification. We used two samples, representing the 2020 and 2021 waves of an annual survey among healthcare employees. In the first stud...
Article
Full-text available
Research data are increasingly stored in digital repositories and made accessible to third parties for reuse, in some cases restricted to scientific purposes. This raises the question of how such datasets can be used once all ethical dimensions have been taken into account. This is examined through the exemplary treatment...
Article
Full-text available
Objective: To examine the time trend of statistical inference, statistical reporting style of results, and effect measures in the abstracts of randomized controlled trials (RCTs). Study design and setting: We downloaded 385,867 PubMed abstracts of RCTs from 1975 to 2021. We used text mining to detect reporting of statistical inference (p-values,...
Article
Full-text available
Hospitals commonly code diseases using two methods. In the first, the clinician/physician writes the disease nomenclature based on ICD-10 codes, guided by an ICD-10 dictionary in either electronic or book form. In the second, the clinician/physician writes in free text, and then a coding officer from medical records...
Article
Full-text available
Mining high utility itemsets is a basic task in the area of frequent itemset mining (FIM) that has various applications in diverse domains, including market basket analysis, web mining, cross-marketing, and e-commerce. In recent years, many efficient high utility itemset mining (HUIM) algorithms have been proposed to discover the high utility itemset...
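In high utility itemset mining, each item typically has a unit profit, and each transaction records purchase quantities. The utility of an itemset is the sum, over the transactions that contain it, of quantity times unit profit for its items. The brute-force sketch below makes that definition concrete. It uses hypothetical items and profits and is exponential in the number of items, so it is for illustration only; it is not one of the efficient HUIM algorithms the abstract refers to.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Sum, over transactions containing every item of `itemset`,
    of quantity * unit profit for the items in `itemset`."""
    total = 0
    for t in transactions:  # t maps item -> purchased quantity
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(transactions, profit, min_util):
    """Brute-force HUIM: enumerate every itemset and keep those whose
    utility is at least min_util. Exponential; illustration only."""
    items = sorted(profit)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = itemset_utility(combo, transactions, profit)
            if u >= min_util:
                result[frozenset(combo)] = u
    return result


# Hypothetical unit profits and transactions (item -> quantity).
profit = {"apple": 1, "bread": 2, "cheese": 5}
transactions = [
    {"apple": 4, "bread": 1},
    {"bread": 2, "cheese": 1},
    {"apple": 1, "cheese": 2},
    {"apple": 2, "bread": 1, "cheese": 1},
]
print(high_utility_itemsets(transactions, profit, min_util=15))
```

With a threshold of 15, `{apple, cheese}` qualifies (utility 18) although `{apple}` alone (utility 7) does not. Unlike support, utility is not anti-monotone, which is why practical HUIM algorithms rely on upper bounds such as transaction-weighted utilization for pruning.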
Preprint
Full-text available
Background: Interrupted time series (ITS) studies contribute importantly to systematic reviews of population-level interventions. However, there is no search filter designed to identify only ITS studies from bibliographic databases. We aimed to develop and validate search filters to retrieve ITS studies in MEDLINE and PubMed. Methods: A set of 1,01...
Article
Full-text available
Around the world, road traffic accidents are the leading cause of serious injuries and deaths. Ethiopia is one of the countries that suffer the most from traffic accidents. Every government in every country wants to keep its citizens safe from accidents. To keep people safe from accidents, it is necessary to conduct a detailed analysis of the facto...
Article
Full-text available
Data mining methods have been proposed for finding hidden information in databases. When data is massive, dispersed, and heterogeneous, data mining and knowledge extraction become difficult. Classification is a common prediction task in data mining. Countless machine learning algorithms have been proposed for this purpose. Ensemble learning combines numerous ba...
Article
Full-text available
The development of the Internet has accelerated the development of electronic commerce, which has changed the management of supply chains and logistics. Unlike traditional shopping trips, e-commerce requires home deliveries and appropriate logistics systems to implement them. To overcome new challenges and achieve process efficienc...
Article
Full-text available
Purpose: The main purpose of this study is to analyze the differences in research productivity between doctoral degree holders from European and North American universities, and doctoral degree holders from Peru. Theoretical framework: Internationalization of higher education has become a phenomenon of great relevance in recent years (Romani et al....
Preprint
Full-text available
The goal of Sentiment Analysis (SA), especially in social media, is to identify useful information in large amounts of unstructured data from many different sources. Text mining involves extracting review texts and categorizing them by positive and negative attitudes. However, the value of the extracted features also lies in those that contribute...
Article
Full-text available
Online media have reshaped the news industry, leading to information richness, timely dissemination, and immense diversity. In addition, recent technological advancements enable on-the-spot, prompt and frequent reporting that can be viewed on smartphones, personal computers, and other mobile devices. These developments have increased the importance of news categ...
Article
Full-text available
Smart cities are one of the consequences of digital transformation, and there have been many attempts to assess the smartness of cities with various frameworks. Among these frameworks, smart city maturity models (SCMMs) evaluate the existing conditions of cities and provide guidelines for progressing through the subsequent stages of maturity. Howev...
Article
Full-text available
Air-conditioning (AC) energy use in express hotels is stochastic, with strong coupling among AC usage, indoor temperature and energy consumption. Such complexity and stochasticity make it hard to save energy with a clear effect on the indoor environment. However, the lack of analyses of high-resolution occupants’ energy us...
Experiment Findings
Full-text available
Google Scholar is one of the top search engines for accessing scholarly research articles across multiple disciplines. Its advanced search option allows users to extract articles based on phrases, publisher name, author name, time period, etc. In this work, we collected Google Scholar data (2000-2021) for two differen...
Article
Full-text available
Although the issue of frugality has been present in academic discourse for longer, the term “frugal innovation” has been used more frequently in the last decade. Understanding these initiatives is fundamental to maturing the view of opportunities and inducing processes, and to preparing the innovation ecosystem to recognize the particularities and dev...
Preprint
Full-text available
In this study, we employed our previously developed data mining method to show that a thermodynamic state shift occurred before the 2011 Mw 9 Great East Japan Earthquake (GEJE), coinciding with the onset of crustal stress manifestations. Our discussion starts with the insights obtained from our prior research, which revealed that small ground vibratio...
Article
Full-text available
The adverse effects of coal and gas energy production, together with the rapid increase in energy consumption, emphasize the importance of Australia adopting more renewable energy sources to counteract these detrimental contributions to climate change. This work presents a data mining approach for optimally selecting the best locations for installi...
Preprint
Full-text available
Subway construction often takes place in a complex natural and human-machine operating environment, which makes it more prone to safety accidents that can cause substantial casualties and monetary losses. Thus, it is necessary to investigate the safety risks of subway construction. The existing literature on the...
Article
Full-text available
Background: Systematic literature screening is a key component of systematic reviews. However, this approach is resource-intensive, because two people generally screen a vast number of search results independently of each other (double screening). To develop approaches for increasing efficiency, we tested the use of text mining to prioritize search resul...
Article
Full-text available
Wildfires are among the most threatening hazards to life, property, well-being, and the environment. Studying public opinions about wildfires can help monitor the perception of the impacted communities. Nevertheless, wildfire research is relatively limited compared to other climate-related hazards. This article presents our data mining work on publ...