Data Mining - Science topic
Explore the latest publications in Data Mining, and find Data Mining experts.
Publications related to Data Mining (10,000)
Sorted by most recent
This intermediate-level tutorial, titled "Gen-RecSys," merges both industrial and academic perspectives on recent advances in Generative AI for recommender systems (beyond LLMs). It aims to highlight the transformative role of generative models in modern recommender systems, which have significantly impacted the AI field, particularly with the rise...
How does technological interdependence affect innovation? We address this question by examining the influence of neighbors' innovativeness and the structure of the innovators' network on a sector's capacity to develop new technologies. We study these two dimensions of technological interdependence by applying novel methods of text mining and networ...
This study expands upon previous research on family firm leadership by exploring the role of CEO identity (i.e., family vs. nonfamily CEO) concerning the way media perceive the brand of the family firm (i.e., brand importance). Drawing on endorsement theory, we suggest that CEO identity influences media perception of the family firm and its brand, ther...
ABSTRACT Education in Guatemala faces profound challenges, including unequal access, deficient infrastructure, low educational quality, high school dropout rates at the primary, lower-secondary (básico), and upper-secondary (diversificado) levels, and insufficient teacher training. These problems are especially acute in rural communities...
Analysis of the Agraviados (Victims) and Sindicados (Accused) Datasets of Guatemala for the Years 2020 to 2023, Published by the Ministerio Público through the Instituto Nacional de Estadística (INE)
Abstract
This article explores the datasets of victims and accused persons reported by the Ministerio Público (MP) in Guatemala during the years 2020 to 20...
Clustering is an important data mining and descriptive task. It has been researched deeply by various researchers for diverse application areas and is applied in multiple working domains such as data classification and image processing. An algorithm for clustering, Improved Balanced Iterative Reducing and Clustering using Hierarchies (I-BIRCH), is p...
Artificial intelligence (AI) and the Internet of Things (IoT) are increasingly being developed and applied alongside each other. The combination of these two technologies helps to create intelligent and efficient systems that can act automatically and make better decisions. In fact, the combination of artificial intelligence and the Internet of Thi...
Abstract OBJECTIVE: To demonstrate the causes that lead to deaths among neonates and across regions of Guatemala. METHOD: Data for the years 2013 to 2022 were selected from the portal of the Instituto Nacional de Estadística, in the context of neonatal deaths. Using data mining tools such as R Studio and Google Colab, the study investigates causes that...
This article presents the application of data mining algorithms, specifically decision trees and random forests, to analyze patterns and make predictions in public procurement award processes in Guatemala. The models make it possible to identify significant relationships among variables such as the amount, the modality, and the categor...
Keywords: Magnetic resonance imaging, Metaplasia, Vagina. Abstract: Many tissues undergo metaplasia in response to chronic inflammation, infection, and environmental factors like chemicals and viruses. This metaplastic change can be in the form of gland formation (adenosis). Adenosis of the vaginal mucosa is rare and has been recorded in the past secondar...
With the number of cyber-attacks growing rapidly and the opportunities to attack companies widening, vulnerability management is gaining increased importance. It needs to reduce the variety of possibilities by remediating vulnerabilities found in IT infrastructures. Process mining is an established method used to discover, analyze, and manage data...
Fault detection in marine diesel engine lubrication systems is crucial for ensuring the long-term stable operation of diesel engines and the safety of maritime navigation. Traditional fixed-parameter alarm threshold methods lack flexibility and are prone to missing faults. Data-driven approaches like machine learning require high-quality data for f...
Abstract OBJECTIVE: To document the diseases suffered in Guatemala by the different ethnic groups treated through internal medicine in the hospitals of Guatemala City. METHOD: Internal Medicine data processed by the Instituto Nacional de Estadística (INE) for the years 2018 to 2022 were selected, containing statistics...
Multi-view clustering, which improves clustering performance by exploring complementarity and consistency among multiple distinct feature sets, is attracting more and more researchers due to its wide applications in various fields, e.g., pattern recognition and data mining. Traditional approaches usually explore the above characteristics by mapping diff...
This paper examines the use of natural language processing (NLP) in data mining for acquiring knowledge from Internet of Things (IoT)-based digital content, emphasizing text categorization's role in information retrieval and knowledge density evaluation. The study identifies biases in tokenization, classification, text tagging, and summarization, n...
Quality in meteorological data is one of the main issues for many real applications including weather forecasting and for developing irrigation models. The integrity of meteorological data may be compromised for several reasons including the presence of corrupted and missing data which can be added due to interference and equipment malfunctioning....
High-sensitivity C-reactive protein (hs-CRP) is a biomarker of inflammation predicting the incidence of different health pathologies. In this study, we aimed to evaluate the association of hematological and demographic factors with hs-CRP levels using decision tree (DT) and linear regression (LR) modeling. This study was conducted on a populat...
The exponential growth of Big Data has revolutionized numerous industries by enabling the extraction of valuable insights from vast and diverse datasets. However, this advancement is accompanied by significant privacy and security challenges that impede the full potential of data analytics. Privacy-Preserving Data Mining (PPDM) emerges as a critica...
This paper studies whether the risk factors disclosed in financial reports are as informative as the SEC requires. The subprime crisis is used as a typical real-risk event to test their informativeness by text mining methods. By analyzing the textual attributes and specific contents of the risk disclosures in 14,089 Form 10-K statements of 1,685 financi...
Data mining is vital for smart grids because it enhances overall grid efficiency, enabling the analysis of large volumes of data, the optimization of energy distribution, the identification of patterns, and demand forecasting. Several performance metrics, such as the MAPE and RMSE, have been created to assess these forecasts. This paper presents ne...
Analysis of Socioeconomic and Cultural Factors in the Formation and Dissolution of Marriages in Guatemala: A Data Mining Approach. Abstract: This study explores the relationship between socioeconomic and cultural factors in the dynamics of marriages and divorces in Guatemala using advanced data mining techniques, such as algorithms A prio...
Polyamorphism, the existence of multiple amorphous states in a single material, has been observed in the glass-forming system GeO2. This study investigates the intermediate range order, two-state model and polyamorphism in the GeO2 system using molecular dynamics (MD) simulation and MD data mining analytics. Analysis of the Ge-Ge distance distribution...
Accurate and fast disturbance localisation is critical for taking timely control actions to prevent power system instability. With the increased complexity of systems, physical model‐based disturbance localisation struggles to achieve good performance due to model deficiency. Phasor measurement unit (PMU)‐based approaches are developed but their...
Purpose
Sodium-glucose cotransporter 2 (SGLT2) inhibitors have been reported to exhibit antiarrhythmic effects. However, there is conflicting evidence regarding the association between SGLT2 inhibitors and ventricular arrhythmias or sudden cardiac death (SCD). We utilized the US FDA Adverse Event Reporting System (FAERS) database to investigate the...
Design of dashboards is strategic for the Customer Relationship Management (CRM) associated with Health and Wellness Tourism, considering all the knowledge acquired about Health and Wellness Tourism, the Machine Learning algorithms identified, and the characteristics to be considered in the development of the reports. It is intended to identify the...
High-pressure die casting (HPDC) of aluminum alloys is one of the most efficient manufacturing methods, offering high repeatability and the ability to produce highly complex castings. The cast parts are characterized by good surface quality, high dimensional accuracy, and high tensile strength. Continuous technological advancements are driving the...
Research conducted by Fama and French in 1996 showed that there were factors other than the beta that were significantly able to predict stock returns. In other words, the Fama and French Three Factors Model (TFMFF) is better than the Capital Asset Pricing Model (CAPM). However, several subsequent studies showed inconsistent results. The discrepanc...
Clustering plays a crucial role in data mining and pattern recognition, but the interpretation of clustering results is often challenging. Existing interpretation methods usually lack an intuitive and accurate description of irregular shapes and high-dimensional data. This paper proposes a novel clustering explanation method based on a Multi-Hyper...
The purpose of this study is to analyze, from a data mining perspective, the information from the Integrated Report (IR) in some Brazilian Accounting Units (UPCs) using the Orange Data Mining (ODM) tool. To this end, a qualitative, documentary and exploratory study was carried out through textual analysis practices of financial and non-financial da...
Cake sales are a continually growing business in which understanding consumer purchasing patterns can provide a significant competitive advantage. In this study, a data mining method is applied with the aim of deriving association rules using the Apriori algorithm in order to increase cake sales. The data used...
Abstract: The food and beverage industry in Indonesia shows significant growth, creating tight competition in the market. To improve competitiveness, analyzing data on consumer purchasing patterns is a strategic solution. This study applies the Apriori algorithm to 3,500 cake sales transactions to uncover purchasing patterns and relationships...
This study establishes a deep learning model for personalized travel recommendations based on factors that affect tourists’ purchases to provide users with more accurate and personalized travel recommendations. Firstly, Natural Language Processing (NLP) technology is used to process tourism review information and analyze its sentiment, dividing it in...
This study aims to identify consumer purchasing patterns in sales transactions of cake products using the Apriori algorithm as the data mining method. By analyzing 3,500 transactions from the period January 2012 to April 2021, the study found significant patterns in the relationships among products. The research stages include pre...
Abstract: The Apriori algorithm is one of the methods in data science used to find association patterns in a dataset. In this study, the Apriori algorithm is applied to the "Cake.xls" dataset, which contains information on the sales transactions of each cake. The main objective of this study is to identify association patterns among cakes...
Neutrosophic theory and its applications have been expanding in all directions at an astonishing rate, especially after the introduction of the journal entitled “Neutrosophic Sets and Systems”. New theories, techniques, and algorithms have been rapidly developed. One of the most striking trends in the neutrosophic theory is the hybridization of neutroso...
There are now ever-increasing amounts of digital data stored daily. Processing such big data requires new sciences such as data mining. Data mining aims to find the information and knowledge hidden in these massive amounts of data. Data mining has various branches and applications, one of which is clustering. In clustering, data are classified into...
Accurately extracting biological echoes is a fundamental prerequisite for weather radar aeroecology monitoring. However, the concurrent presence of meteorological echoes and biological echoes greatly restricts the extraction accuracy. Traditional neural network-based echo extraction algorithms rely on the spatial continuity feature of the echoes. B...
Advances in artificial intelligence (AI) technologies have driven the development of generative AI, which is capable of producing a wide range of content. However, training these systems raises important questions about the safeguarding and exercise of copyright in the content used for these purposes...
High Utility Itemset Mining (HUIM) is an important research area in the field of data mining and knowledge discovery. HUIM aims to discover the high utility patterns from a given database, based on a utility threshold value, where the utility is a user-defined objective function. The existing HUIM algorithms fail to consider the actual behaviou...
Resolving material and processing issues is essential to improving formulations and attaining accurate color matching in blending. Two transparent polycarbonate resins made up the blend: PC1, which made up 33% of the mixture with a Melt Flow Index (MFI) of 25 g/min, and PC2, which made up 67% of the mixture with an MFI of 65 g/min. This study e...
Attribute reduction is a significant challenge in fields like data mining and pattern recognition. Various models have been introduced to enhance the performance of attribute reduction algorithms, such as the fuzzy rough sets model. However, the common greedy-based reduction algorithm frameworks shared by these models often struggle to efficiently...
Outlier detection is a crucial research problem in data mining, aiming to identify data objects that significantly deviate from the distribution of other data. To address the low-density pattern and low local density problems in nearest neighbor-based outlier detection methods, this paper proposes an outlier detection algorithm based on th...
Online reviews are effective information-sharing tools due to their word-of-mouth characteristics. The extant literature has considered reviews as independent variables that influence business performance, while the environmental factors shaping these reviews remain under-explored. We examine the impact of COVID-19-related environmental uncertainti...
Traffic forecasting, a core technology within Intelligent Transportation Systems, holds broad application prospects due to its ability to accurately predict future traffic states through the modeling and analysis of complex spatio-temporal traffic data. Nevertheless, due to the complex temporal and spatial heterogeneity of traffic sequences, existi...
Machine learning algorithms have been widely applied in the field of personalized learning within educational information technology. By leveraging big data analysis and data mining techniques, machine learning can help identify patterns and trends in students' learning behaviors, preferences, and performance. This information can then be used to...
Concept impulsion (more commonly called concept drift) is a vital problem for any data analysis scenario involving temporally ordered data. In predictive analytics and machine learning, concept impulsion means that the statistical properties of the target variable, which the model is trying to predict, change over time in unforeseen ways....
The dynamic and complex nature of the construction industry leads to increased project uncertainty, exposing construction projects to various risks and hazards. Poor risk management can hinder project objectives. Therefore, implementing effective risk management strategies can enhance project quality and safety and ensure on-time, under-budget comple...
The COVID-19 pandemic significantly impacted higher education, forcing a rapid transition from the face-to-face modality to the emerging online learning modality. This transition exposed several challenges, including inequitable access to technology, the global health and economic crisis, and specific difficulties faced by both teachers and students in Ec...
Distance metric learning (DML) aims to find an appropriate way to reveal the underlying data relationship. It is critical in many machine learning, pattern recognition and data mining algorithms, and usually requires a large amount of label information (such as class labels or pair/triplet constraints) to achieve satisfactory performance. However, the...
The instability of the global food system has attracted growing attention worldwide. Although there is sufficient food produced to feed every person, many people in the world are suffering from hunger. Moreover, the current food system is harmful to the environment. As our global population continues to rise, the ability to produce more food while sust...
The pursuit of effective models with high detection accuracy has sparked great interest in anomaly detection of internet traffic. The issue still lies in creating a trustworthy and effective anomaly detection system that can handle massive data volumes and patterns that change in real-time. The detection techniques used, especially the feature sele...
Current technological developments make it possible to collect sales information that can be used to determine customer purchasing patterns and sales deviations, and to further develop promotional strategies. Medicine transactions are one of the business areas most strongly affected by advances in data technology. Medicine sales can benefit greatly...
Finding sarcastic statements in social media has recently drawn a lot of interest, mainly because sarcastic tweets may include favorable phrases that fill in unattractive or undesirable attributes. As the internet becomes increasingly ingrained in our daily lives, much multimedia information is being produced online. Much of the information record...
Time series analysis is a fundamental data mining task on which supervised training methods based on empirical risk minimization have proven effective for specific tasks and datasets. However, the acquisition of well-annotated data is costly and a large amount of unlabeled series data is under-utilized. Due to distributional shifts across vari...
Projections of the demographic indicators at the national and state level have been made for many years, but for decentralized planning and better monitoring of already existing policies, projections are needed for smaller subsections of the population. In this paper, an attempt has been made to present a simple nonparametric technique to extrapolate...
A chest X-ray can convey a lot about a patient's condition. However, it requires a specialized and skilled doctor to determine the type of lung disease with high accuracy. This is where deep learning (DL) techniques and artificial intelligence (AI) come in, accelerating the process of detecting lung diseases and classifying them with high precis...
The “ego network of words” model captures structural properties in language production associated with cognitive constraints. While previous research focused on the layer-based structure and its semantic properties, this article argues that an essential element, the concept of an active network, is missing. The active part of the ego network o...
In the current era, the relentless advancement of information technology necessitates efficient information acquisition, which relies on proper data processing. To address the challenges in data organization, data mining emerges as a pivotal solution. This study aims to delve into various methodologies for data grouping. Employing a survey approach...
Can simulating the public transit experience of individuals with mild visual impairments promote empathy in design college students, thereby facilitating their design decisions? The purpose of this study is to explore the impact of a mixed empathy intervention (role-playing and experiential prototyping) on improving design students’ empathic abilit...
Discover the exciting side of archaeology and history, beyond academic boredom! In this gripping, newly revised second volume, you will immerse yourself in fascinating stories and legends that are based on real facts. Join the legendary treasure hunt on Oak Island, where for 200 years countless A...
This study presents a novel multimodal medical image zero-shot segmentation algorithm named the text-visual-prompt segment anything model (TV-SAM) without any manual annotations. The TV-SAM incorporates and integrates the large language model GPT-4, the vision language model GLIP, and the SAM to autonomously generate descriptive text prompts and vi...
The rapid growth of big data analytics has heightened concerns about data privacy, necessitating the development of advanced privacy-preserving techniques. This research addresses the challenge of optimizing privacy-preserving data mining (PPDM) for big data analytics through the innovative application of deep reinforcement learning (DRL). We propo...
Ocean temperature prediction is significant in climate change research and marine ecosystem management. However, relevant statistical and physical methods focus on assuming relationships between variables and simulating complex physical processes of ocean temperature changes, facing challenges such as high data dependence and insufficient processin...
The consumption of tobacco and other substances harmful to health has led to significant dependence among smokers, which over time causes illnesses that may result in the addict's death. This research uses data mining to address that issue. Its goal is to apply a data mining study whose findings showed that the confidence...
Background: The penetration of drugs through the blood–brain barrier is one of the key pharmacokinetic aspects of centrally acting active substances and other drugs in terms of the occurrence of side effects on the central nervous system. In our research, several regression models were constructed in order to observe the connections between the act...
The aim of the article is a critical reflection on the relationship between qualitative thematic analysis and topic modeling, one of the more popular varieties of automated text processing. Based on the results of a qualitative and quantitative analysis of documents of the Polish Episcopal Conference (Konferencja Episkopatu Polski), the authors show the drawbacks and advantages of topic modeling...
Suitable nutritional diets have been widely recognized as important measures to prevent and control non-communicable diseases (NCDs). However, there is currently little research on the nutritional ingredients in food that are beneficial to the rehabilitation of NCDs. In this project, we profoundly analyzed the relationship between nutritional ingredients a...
Clustering: Organizing items into groups based on their properties such that the items in the same group are similar and those in other groups are distinct is known as clustering and is one method of unsupervised learning. The primary benefit of clustering is that, with little or no prior information, fascinating patterns and structures can be disco...
Remote health care is needed when direct medical monitoring is unavailable. Modern technology provides several ways to make patient access simpler. In particular, cloud, IoT, and data mining technologies have been successful in health care and medicine. This method identifies illnesses using data collected from two groups of patients, a male group...
This study explores community perspectives on Yogyakarta, a culturally rich region in Indonesia known as "Jogja Istimewa," "Student City," and "City of Tourism." Given the potential challenges faced by the region, the research employs the K-Means Algorithm to analyze opinions gathered from Twitter, offering a novel alternative to traditional survey...
Unsupervised feature selection (UFS) has gained increasing attention and research interest in various domains, such as machine learning and data mining. Recently, numerous matrix factorization-based methods have been widely adopted for UFS. However, the following issues still exist. First, most methods based on matrix factorization use the squared...