
Data Mining - Science topic

Questions related to Data Mining
  • asked a question related to Data Mining
Question
4 answers
Hi,
Thank you for your help.
How can I turn the scheduling process in CloudSim into an environment for my reinforcement learning model?
Relevant answer
Answer
Thank you for sharing the links and papers, I will use them to learn.
I appreciate your time and efforts
Best Regards,
Bashar
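The thread does not show how CloudSim (a Java toolkit) was connected to an RL agent. As a minimal sketch of the interface such a wrapper usually exposes (`reset()` and `step()` returning state, reward, done), the toy class below simulates the scheduling decision directly in Python. The class name, VM speeds, task lengths, state layout, and the negative-makespan reward are all illustrative assumptions, not CloudSim APIs; in a real setup `step()` would delegate to the simulator.

```python
import random


class SchedulingEnv:
    """Toy task-to-VM scheduling environment with a reset/step interface.

    Hypothetical stand-in for a CloudSim-backed simulation: VM speeds and task
    lengths are illustrative numbers, and no CloudSim classes are used.
    """

    def __init__(self, vm_speeds, task_lengths):
        self.vm_speeds = vm_speeds          # processing rate of each VM (assumed units)
        self.task_lengths = task_lengths    # work required by each task, in arrival order
        self.reset()

    def reset(self):
        self.vm_busy = [0.0] * len(self.vm_speeds)  # time at which each VM becomes free
        self.next_task = 0
        return self._state()

    def _state(self):
        # Observation: each VM's finish time plus the length of the next task (0.0 when done).
        if self.next_task < len(self.task_lengths):
            pending = self.task_lengths[self.next_task]
        else:
            pending = 0.0
        return tuple(self.vm_busy) + (pending,)

    def step(self, action):
        """Assign the next task to VM `action`; reward is the negative increase in makespan."""
        old_makespan = max(self.vm_busy)
        self.vm_busy[action] += self.task_lengths[self.next_task] / self.vm_speeds[action]
        self.next_task += 1
        reward = -(max(self.vm_busy) - old_makespan)
        done = self.next_task == len(self.task_lengths)
        return self._state(), reward, done


if __name__ == "__main__":
    # Usage: a random policy; a trained agent would choose the actions instead.
    env = SchedulingEnv(vm_speeds=[1.0, 2.0], task_lengths=[4.0, 2.0, 6.0])
    state, done = env.reset(), False
    while not done:
        state, reward, done = env.step(random.randrange(len(env.vm_speeds)))
        print(state, reward)
```

The makespan-based reward is only one design choice; cost, energy, or waiting time could be used instead, depending on the scheduling objective.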
  • asked a question related to Data Mining
Question
9 answers
I am looking for a justification for associating data mining with big data analytics. Many researchers have observed that, in addition to the characteristics of the data, there is a line of thought that raises a question of taxonomy: data mining is a step in big data analytics. Can I think of it this way, or is there something I'm not considering?
Relevant answer
Answer
Big data analytics and data mining are not the same. Both involve the use of large data sets and handle the collection or reporting of data, mostly for businesses. However, they are used for two different operations. Big data analytics is the process of analyzing larger data sets with the aim of uncovering useful information.
  • asked a question related to Data Mining
Question
7 answers
Modern politics is characterized by many aspects that were not associated with traditional politics. Big data is one of them. Political parties are doing data mining, seeking help from data scientists to find patterns that identify voter behavior. The question is: in what ways is big data being used by modern political parties and leaders?
Relevant answer
Answer
Big Data platforms allow government agencies to access large volumes of information that are essential for their daily operations. With real-time access, governments can identify areas that require attention, make better and more timely judgments about how to proceed, and enact the necessary changes.
  • asked a question related to Data Mining
Question
9 answers
Data Mining and Machine Learning look similar to me. Can you elaborate on the difference between the two? As per my understanding:
Data Mining is about finding useful information and using that information in decision making. That means using known properties of the data to find unknown properties of the data, e.g. studying sales of computers in different regions and supplying them accordingly.
On the other hand, ML is about predicting results. It uses known properties of the data to find other known properties for new data instances, e.g. predicting house prices five years from now from existing house-sales data.
  • asked a question related to Data Mining
Question
3 answers
I need some suggestions and a health insurance dataset on which text mining is possible. Any recent papers addressing such a dataset would be helpful.
Relevant answer
Answer
Dear Anuradha,
Please check the following link:
  • asked a question related to Data Mining
Question
7 answers
I have a data set with a text field for more than 3,000 records, all of which contain notes from the doctor. I need to extract specific information from all of them, for example the doctor's final decision and the classification of the patient. What is the most appropriate way to analyze these texts? Should I use information retrieval or information extraction, or would a Q&A system be fine?
Relevant answer
Answer
Dear Matiam Essa
This text mining technique (information extraction) focuses on identifying and extracting entities, attributes, and their relationships from semi-structured or unstructured texts. The extracted information is then stored in a database for future access and retrieval. The well-known techniques are:
Information Extraction (IE)
Information Retrieval (IR)
Natural Language Processing
Clustering
Categorization
Visualization
With the increasing amount of text data, effective techniques are needed to examine the data and extract relevant information from it. Various text mining techniques are used to efficiently extract interesting information from multiple sources of textual data, and they continue to be used to improve the text mining process.
GOOD LUCK
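For the doctor's-notes question, a rule-based information extraction pass is often the simplest first step before trying IR or a Q&A system. The sketch below assumes, hypothetically, that notes contain phrases such as "Decision: ..." and "Classification: ..."; the field names and regular expressions are illustrative and would have to be adapted to the real notes (or replaced by a trained NER model if the wording varies a lot).

```python
import re

# Hypothetical note format: these patterns assume notes contain phrases like
# "Decision: ..." and "Classification: ..."; adapt them to the real notes.
PATTERNS = {
    "decision": re.compile(r"decision\s*[:\-]\s*(?P<value>[^.\n]+)", re.IGNORECASE),
    "classification": re.compile(r"classif\w*\s*[:\-]\s*(?P<value>[^.\n]+)", re.IGNORECASE),
}


def extract_fields(note: str) -> dict:
    """Return the first match for each field, or None when the note lacks it."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(note)
        result[name] = match.group("value").strip() if match else None
    return result


if __name__ == "__main__":
    notes = [
        "Patient stable. Decision: discharge home. Classification: low risk.",
        "Follow-up in two weeks.",
    ]
    for note in notes:
        print(extract_fields(note))
```

Running it over all 3,000 records and counting how many fields come back as None is a quick way to see how much of the text the rules actually cover.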
  • asked a question related to Data Mining
Question
35 answers
The current technological revolution, known as Industry 4.0, is determined by the development of the following technologies of advanced information processing:
Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
Which of these technologies are applicable, or will be used in the future, in the education process?
Please reply
Best wishes
Relevant answer
Answer
In this increasingly flexible and complex environment, collaboration takes on a more valuable role in HE, creating hybrid communities, doing hybrid work, using hybrid practices. We propose that HE institutions should not focus on academic disciplines alone. In fact, working collaboratively between academics and professional members of staff enhances and positively impacts organisational culture. By professional members of staff also engaging in research and education within their own job roles, while collaborating with academics to enhance the student experience, a more consistent approach can be achieved. What is more, this way of working and being helps cohesiveness and collaboration while empowering individuals as important members of the team....
  • asked a question related to Data Mining
Question
7 answers
Which tools, other than Python-based ones, solve prediction problems effectively?
Relevant answer
Answer
Overall, we can't say which tool is best, since it depends on the data type and on each person. Here you can find some of them:
I prefer Orange Data Mining. It is a free and open-source data visualization, machine learning, and data mining toolkit.
  • asked a question related to Data Mining
Question
3 answers
Problem statement: Google Trend Analysis and Paradigm Shift of Online Education Platforms during the COVID-19 Pandemic
I would like to know which methodologies, data preprocessing techniques, data mining methods, and metrics are used for this analysis.
Relevant answer
Good morning
I invite you to look at the SCOPUS and Web of Science databases.
Best regards
Ph.D., MBA Ingrid del Valle García Carreno
  • asked a question related to Data Mining
Question
4 answers
How do data mining researchers test or evaluate their data mining model's EFFICIENCY?
Or through an ISO certification evaluation?
The model I created is an output of a hypothesis and theory I want to test, so I would rather not have other people evaluate the model as they would a system.
Data mining evaluation metrics alone cannot be used to support the study.
I am searching for studies or research on ways I can back up the efficacy of the model I created.
Feel free to educate me. I would love to hear your thoughts.
Relevant answer
Answer
  • asked a question related to Data Mining
Question
13 answers
Hi everybody,
I would like to do part of speech tagging in an unsupervised manner, what are the potential solutions?
Relevant answer
Answer
  • asked a question related to Data Mining
Question
3 answers
Please suggest R packages and code for text mining (or any other programming language) to search the PubMed database.
Relevant answer
Answer
Ajit Kumar Singh, enter a free-text search into the PubReMiner tool, and it will search PubMed for results. The program analyzes the results and generates tables that rank the frequency of terms in the articles' titles and abstracts, as well as related MeSH categories.
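The frequency tables described above can be approximated once titles or abstracts have been downloaded. The sketch below is not PubReMiner's implementation; it simply counts terms across a list of texts and ranks them. The tokenizer and the tiny stopword list are illustrative assumptions, and fetching the records from PubMed (for example through NCBI's E-utilities) is not shown.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def rank_terms(texts, top_n=5):
    """Count lowercase alphanumeric terms across texts and return the most frequent."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    titles = [
        "Text mining of PubMed abstracts",
        "Mining PubMed for gene associations",
        "Gene expression in cancer",
    ]
    print(rank_terms(titles, top_n=3))
```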
  • asked a question related to Data Mining
Question
2 answers
Dear Madam, please advise me about postdoc supervisors at the university in the field of educational data mining and learning analytics for strengthening university decision making. I would be grateful.
  • asked a question related to Data Mining
Question
11 answers
How can I measure classification errors using Weka? Can we use the RMSE value, or a similar measure, to obtain the classification rate?
Relevant answer
Answer
I have been teaching myself how to use RWeka, specifically so that I can implement the M5P model. I have been able to apply it to my data, but I do not understand what the percentage represents. For example, the beginning of the sample output from RWeka's manual is:
M5 pruned model tree:
(using smoothed linear models)
CHMIN <= 7.5 : LM1 (165/12.903%)
The other LMs have similar "scores", such as (6/18.551%) and (23/48.302%). What exactly do these percentages and numbers represent?
  • asked a question related to Data Mining
Question
13 answers
I'm researching autoencoders and their application to machine learning problems, but I have a fundamental question.
As we all know, there are various types of autoencoders, such as Stacked Autoencoder, Sparse Autoencoder, Denoising Autoencoder, Adversarial Autoencoder, Convolutional Autoencoder, Semi-Autoencoder, Dual Autoencoder, Contractive Autoencoder, and others that improve on earlier versions. Autoencoders are also used in Graph Networks (GN), Recommender Systems (RS), Natural Language Processing (NLP), and Computer Vision (CV). This is my main concern:
Because the input and structure of each of these machine learning problems are different, which version of autoencoder is appropriate for which problem?
Relevant answer
Answer
Look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Data Mining
Question
1 answer
I am very new to these forecasting methods. Can someone help me with how to forecast the next period using these methods?
I have weekly demand data that I classified into lumpy, erratic, and smooth demand. As Croston's method is best suited for smooth demand and the SBA method for lumpy demand, I need their forecasting procedures to plan demand for the next weekly period.
Are there any other methods to forecast lumpy and smooth demand?
Thank you in advance
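Croston's method smooths two quantities separately, the size of non-zero demands and the interval between them, and forecasts their ratio as the per-period demand rate; the Syntetos-Boylan Approximation (SBA) multiplies that ratio by (1 - alpha/2) to reduce Croston's known bias. A minimal sketch follows. The initialization (first non-zero demand and its position) is one common convention among several, and the default alpha is an assumption, not a recommendation.

```python
def croston(demand, alpha=0.1, sba=False):
    """One-step-ahead demand rate for intermittent demand using Croston's method.

    demand: sequence of non-negative periodic demands (zeros allowed).
    alpha: smoothing constant shared by size and interval (assumed value).
    sba: if True, apply the Syntetos-Boylan Approximation factor (1 - alpha/2).
    """
    nonzero = [i for i, d in enumerate(demand) if d > 0]
    if not nonzero:
        return 0.0                      # no demand observed yet
    first = nonzero[0]
    size = float(demand[first])         # smoothed non-zero demand size (z)
    interval = float(first + 1)         # smoothed interval between demands (p)
    periods_since = 0                   # periods elapsed since the last demand
    for d in demand[first + 1:]:
        periods_since += 1
        if d > 0:                       # estimates are updated only when demand occurs
            size += alpha * (d - size)
            interval += alpha * (periods_since - interval)
            periods_since = 0
    forecast = size / interval
    return forecast * (1 - alpha / 2) if sba else forecast


if __name__ == "__main__":
    weekly = [0, 3, 0, 0, 3, 0, 5, 0, 0, 0, 2]
    print("Croston:", croston(weekly, alpha=0.2))
    print("SBA:", croston(weekly, alpha=0.2, sba=True))
```

Both methods return a flat per-period rate, so the same value serves as the forecast for each week of the next planning period.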
  • asked a question related to Data Mining
Question
4 answers
I am looking for a free-of-charge international conference on metaheuristic algorithms or data mining. Can anyone help me?
Relevant answer
Sikirat Aina, thanks.
  • asked a question related to Data Mining
Question
5 answers
I have the past 4 years of weekly demand data for various products. I am trying to forecast the weekly values for the next year. The data doesn't follow any trend and looks random. There are also many weeks with zero demand.
I am very new to time series analysis. Can someone help me in suggesting an appropriate method?
Relevant answer
Answer
First of all, the fundamental assumption of any forecasting technique (implicit or explicit) is that the time series represents a stable pattern that can be identified and then extended into the future. If the pattern of past data points is not statistically stable, or is random (as in your case), then no meaningful forecast is possible, regardless of the sophistication of the forecasting technique.
Because your time series data points are random with no trend, your best bet is to generate new random points from your current data distribution and treat them as your forecast. You could build a histogram of your existing data points and generate new random points from this histogram.
  • asked a question related to Data Mining
Question
7 answers
I have some Key Informant Interview (KII) data, and I want to apply Natural Language Processing (NLP) to identify patterns in it. Can applying NLP to analyze KII data be described as a data analytics tool in the report/paper? TIA
Relevant answer
Answer
Of course, it is interesting work. For example: (1) use NER (Named Entity Recognition) and RE (Relation Extraction) to construct a Knowledge Graph, then analyze the relations between the interviewees or the knowledge structure of an interviewee; (2) use EE (Event Extraction) to identify event correlations between the questions and answers; (3) use SA (Sentiment Analysis) to analyze attitudes toward the interviewer or the company; (4) use topic models to analyze the topics of the interview and find out which topic the interviewers are most interested in; etc.
There are many interesting things you can do with NLP analysis. I hope you finish an interesting paper soon.
  • asked a question related to Data Mining
Question
4 answers
I have compiled a collection of lecture notes, examples, and notes from Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems). The attached PDF is the first iteration of the text; at this point it is just a manuscript. I would appreciate feedback on how to organize and structure the text so that it could be presented to a publisher.
Relevant answer
Answer
Przemysław Dolata Thank you for your advice. There are many PhD programs available. I am currently applying to several. I would love to get advice on your strategy for reading and analyzing texts.
  • asked a question related to Data Mining
Question
9 answers
What will be the future applications of analytics of large data sets conducted in the computing cloud on computerized Business Intelligence analytical platforms in Big Data database systems in enterprise logistics management?
Analytics conducted on computerized Business Intelligence platforms is one of the key advanced information technologies of the fourth technological revolution, known as Industry 4.0. The current technological revolution, described as Industry 4.0, is determined by the development of the following advanced information-processing technologies: Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
The analytics conducted on computerized Business Intelligence platforms currently supports business management processes, including logistics management.
In my opinion, the use of analytics of large data sets conducted in the computing cloud on computerized Business Intelligence analytical platforms in Big Data database systems in enterprise logistics management, including supply logistics, production logistics, provision of services and distribution of manufactured products and services, is currently growing.
Analytics conducted on large data sets in the computing cloud, on Business Intelligence platforms in Big Data database systems, makes it particularly easy to identify opportunities and threats to business development, and allows analytical reports on selected aspects of the business entity's economic and financial situation to be generated quickly. In this way, the generated reports can be helpful in enterprise logistics management, including supply logistics, production logistics, provision of services and distribution of manufactured products and services.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
What will be the future applications of analytics of large data sets conducted in the computing cloud on computerized Business Intelligence analytical platforms in Big Data database systems in enterprise logistics management?
Please reply
I invite you to the discussion
The use of information contained in Big Data database systems for conducting Business Intelligence analyses is described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
It is a rising field, since intelligence, and artificial intelligence in general, is becoming the dominant technology of the current era.
  • asked a question related to Data Mining
Question
12 answers
I am doing a project on automated classification of software requirements using NLP and a machine learning approach, i.e. Naive Bayes. For this I need a dataset of classified software requirements. I have searched the PROMISE data repository but did not find a dataset that meets my needs. It would be highly appreciated if someone could tell me where I can find and download such a dataset.
Relevant answer
Answer
The PROMISE dataset is here: https://doi.org/10.5281/zenodo.268542
The PURE dataset is here: https://doi.org/10.5281/zenodo.1414117
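Once a labelled dataset such as PROMISE is available, the Naive Bayes approach from the question can be sketched in a few lines. The version below is a plain multinomial Naive Bayes with add-one (Laplace) smoothing, written without libraries so every step is visible; the tokenizer, the class labels ("F" for functional, "NF" for non-functional), and the example requirements are illustrative assumptions, not taken from either dataset.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphabetic tokens (digits and punctuation are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)         # documents per class, for priors
        self.word_counts = defaultdict(Counter)     # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)   # log prior
            for word in tokenize(text):
                if word in self.vocab:               # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    texts = [
        "The system shall allow users to log in",
        "The system shall export reports as PDF",
        "The system shall respond within 2 seconds",
        "The system shall be available 99 percent of the time",
    ]
    labels = ["F", "F", "NF", "NF"]
    model = NaiveBayesText().fit(texts, labels)
    print(model.predict("The system shall respond within 5 seconds"))
```

Summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow on long requirement texts.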
  • asked a question related to Data Mining
Question
6 answers
Dear community, I need your help extracting data from the Binance platform to use in a forecasting problem. For example, we extract data about a certain cryptocurrency, clean it and make it ready for use, then forecast whether we should buy it or not, adding an alarm when the time is right, using Python, machine learning, and statistics.
  • asked a question related to Data Mining
Question
23 answers
Hi everyone
I'm looking for a quick and reliable way to estimate my missing climatological data. My data is daily and covers more than 40 years. It includes minimum and maximum temperature, precipitation, sunshine hours, relative humidity and wind speed. My main problem is the sunshine-hours data, which has many gaps. These gaps are scattered through the time series; sometimes they span several months or even a few years. I am working with 18 stations. Because my data is daily, the amount of missing data is high, so I need to estimate it before starting work. Your comments and experiences would be very helpful.
Thank you so much for advising me.
Relevant answer
Answer
It is in French
  • asked a question related to Data Mining
Question
14 answers
Cluster analysis, classification, Data Mining
Relevant answer
Answer
Grouping related data according to categories or themes. These groupings are based on inter-relations between variables that can influence each other in the respective setting.
  • asked a question related to Data Mining
Question
3 answers
Hi Fellows,
The matrix is here at the bottom: https://statweb.stanford.edu/~jtaylo/courses/stats202/visualization.html. A similar version appears in the book Introduction to Data Mining. It's clear that colours toward the red end indicate stronger correlation, but which attributes or variables are actually correlated, as shown? For example, along the main diagonal, cases of the same species show mostly perfect correlation, with a few near-perfect occurrences. Normally, a correlation is calculated between two columns of values, not between two single cases.
Thanks
RP
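One reading consistent with the asker's observation (an exact diagonal, high values among same-species cases) is that each cell correlates two cases (rows) across their attribute values, for example the four measurements of two flowers, rather than two attribute columns. Under that assumption, the matrix can be sketched as below; the sample rows are illustrative, in the style of iris measurements, and not copied from the linked page.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def case_correlation_matrix(rows):
    """Correlate every pair of cases (rows) across their attribute values."""
    return [[pearson(r1, r2) for r2 in rows] for r1 in rows]


if __name__ == "__main__":
    flowers = [
        [5.1, 3.5, 1.4, 0.2],   # illustrative values for two similar flowers
        [4.9, 3.0, 1.4, 0.2],
        [6.3, 3.3, 6.0, 2.5],   # and one with a different shape profile
    ]
    for row in case_correlation_matrix(flowers):
        print([round(v, 3) for v in row])
```

Sorting the rows by species before plotting is what makes same-species blocks appear along the diagonal.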
  • asked a question related to Data Mining
Question
4 answers
I am working on a data mining project and would like to show the correlation between healthcare expenditure by country and the population's life expectancy/general health, but I am having trouble finding sizeable data sets.
Relevant answer
Answer
Healthcare expenditures: http://wdi.worldbank.org/table/2.12
Here's the full list of indicators: http://wdi.worldbank.org/table
  • asked a question related to Data Mining
Question
4 answers
Hi
How can new data mining methods be used to assess the ecological potential of the land?
Relevant answer
Answer
Data mining holds great potential to improve health systems. It uses data and analytics to identify best practices that improve care and reduce costs. Researchers use data mining approaches like multi-dimensional databases, machine learning, soft computing, data visualization and statistics.
Nevertheless, it's easy to understand how coal mining is bad for the environment. Excavating large chunks of land and moving soil and rocks to extract resources from beneath the surface disrupts ecosystems and unearths harmful pollutants that contaminate the air and groundwater.
Kind Regards
Qamar Ul Islam
  • asked a question related to Data Mining
Question
10 answers
I am passionate about working on medical data, but unfortunately I couldn't find data in my home country for the disease I want to work on. Is anyone from medical informatics and health data mining able to collaborate with me?
Relevant answer
Answer
Please have a look at our (Eminent Biosciences (EMBS)) collaborations and let me know if you are interested in associating with us.
Our recent publications in collaboration with industry and academia in India and worldwide:
EMBS publication In association with Universidad Tecnológica Metropolitana, Santiago, Chile. Publication Link: https://pubmed.ncbi.nlm.nih.gov/33397265/
EMBS publication In association with Moscow State University , Russia. Publication Link: https://pubmed.ncbi.nlm.nih.gov/32967475/
EMBS publication In association with Icahn Institute of Genomics and Multiscale Biology, Mount Sinai Health System, Manhattan, NY, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
EMBS publication In association with University of Missouri, St. Louis, MO, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30457050
EMBS publication In association with Virginia Commonwealth University, Richmond, Virginia, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
EMBS publication In association with ICMR- NIN(National Institute of Nutrition), Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
EMBS publication In association with University of Minnesota Duluth, Duluth MN 55811 USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
EMBS publication In association with University of Yaounde I, PO Box 812, Yaoundé, Cameroon. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
EMBS publication In association with Federal University of Paraíba, João Pessoa, PB, Brazil. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30693065
Eminent Biosciences(EMBS) and University of Yaoundé I, Yaoundé, Cameroon. Publication Link: https://pubmed.ncbi.nlm.nih.gov/31210847/
Eminent Biosciences(EMBS) and University of the Basque Country UPV/EHU, 48080, Leioa, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852204
Eminent Biosciences(EMBS) and King Saud University, Riyadh, Saudi Arabia. Publication Link: http://www.eurekaselect.com/135585
Eminent Biosciences(EMBS) and NIPER , Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Eminent Biosciences(EMBS) and Alagappa University, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Eminent Biosciences(EMBS) and Jawaharlal Nehru Technological University, Hyderabad , India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Eminent Biosciences(EMBS) and C.S.I.R – CRISAT, Karaikudi, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237676
Eminent Biosciences(EMBS) and Karpagam academy of higher education, Eachinary, Coimbatore , Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Eminent Biosciences(EMBS) and Ballets Olaeta Kalea, 4, 48014 Bilbao, Bizkaia, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Eminent Biosciences(EMBS) and Hospital for Genetic Diseases, Osmania University, Hyderabad - 500 016, Telangana, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Eminent Biosciences(EMBS) and School of Ocean Science and Technology, Kerala University of Fisheries and Ocean Studies, Panangad-682 506, Cochin, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27964704
Eminent Biosciences(EMBS) and CODEWEL Nireekshana-ACET, Hyderabad, Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26770024
Eminent Biosciences(EMBS) and Bharathiyar University, Coimbatore-641046, Tamilnadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27919211
Eminent Biosciences(EMBS) and LPU University, Phagwara, Punjab, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/31030499
Eminent Biosciences(EMBS) and Department of Bioinformatics, Kerala University, Kerala. Publication Link: http://www.eurekaselect.com/135585
Eminent Biosciences(EMBS) and Gandhi Medical College and Osmania Medical College, Hyderabad 500 038, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27450915
Eminent Biosciences(EMBS) and National College (Affiliated to Bharathidasan University), Tiruchirapalli, 620 001 Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27266485
Eminent Biosciences(EMBS) and University of Calicut - 673635, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Eminent Biosciences(EMBS) and NIPER, Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Eminent Biosciences(EMBS) and King George's Medical University, (Erstwhile C.S.M. Medical University), Lucknow-226 003, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579575
Eminent Biosciences(EMBS) and School of Chemical & Biotechnology, SASTRA University, Thanjavur, India Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579569
Eminent Biosciences(EMBS) and Safi center for scientific research, Malappuram, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Eminent Biosciences(EMBS) and Dept of Genetics, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25248957
EMBS publication In association with Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26229292
Sincerely,
Dr. Anuraj Nayarisseri
Principal Scientist & Director,
Eminent Biosciences.
Mob :+91 97522 95342
  • asked a question related to Data Mining
Question
11 answers
Hi There!
My data has a number of features (containing continuous data) and a response feature (class label) that is categorical (binary). My intention is to study the variation of the response feature (class) due to all the other features using a variety of feature selection techniques. Kindly help me identify the right techniques for this purpose. The data looks like this:
------------------------------------------------------------------
f1 f2 f3 f4 ... fn class
------------------------------------------------------------------
0.2 0.3 0.87 0.6 ... 0.7 0
0.2 0.3 0.87 0.6 ... 0.7 1
0.2 0.3 0.87 0.6 ... 0.7 0
0.2 0.3 0.87 0.6 ... 0.7 1
-------------------------------------------------------------------
Relevant answer
Answer
You can select the best algorithm from a number of data mining algorithms based on a performance measure. A list of popular ones can be found here: https://www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html
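For the specific setup in the question (continuous features, binary class), a simple filter-style feature ranking is a common starting point before trying wrapper or embedded methods. The sketch below scores each feature by its point-biserial correlation with the class (the Pearson correlation between the feature and the 0/1 label) and ranks by absolute value. It is one filter among several (mutual information and t-tests are alternatives), it assumes both classes are present, and the column names and values are illustrative.

```python
import math
from statistics import mean, pstdev


def point_biserial(feature, labels):
    """Pearson correlation between a continuous feature and a 0/1 label."""
    ones = [f for f, y in zip(feature, labels) if y == 1]
    zeros = [f for f, y in zip(feature, labels) if y == 0]
    p = len(ones) / len(labels)             # share of class 1 (both classes assumed present)
    s = pstdev(feature)
    if s == 0:
        return 0.0                          # a constant feature carries no information
    return (mean(ones) - mean(zeros)) / s * math.sqrt(p * (1 - p))


def rank_features(columns, labels):
    """Return (name, score) pairs sorted by absolute point-biserial correlation."""
    scores = {name: point_biserial(values, labels) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    columns = {
        "f1": [0.1, 0.2, 0.9, 1.0],
        "f2": [0.5, 0.4, 0.5, 0.4],
        "f3": [0.3, 0.3, 0.3, 0.3],
    }
    print(rank_features(columns, [0, 0, 1, 1]))
```

A filter like this looks at one feature at a time, so it can miss features that only matter in combination; that is where wrapper methods or model-based importances help.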
  • asked a question related to Data Mining
Question
4 answers
I think Generative Adversarial Networks can be used as a means of data farming. What do you know about such an approach? Can you give other examples of means for data farming?
Relevant answer
Answer
Other approaches exist, for instance (mostly application-based):
Extreme Data Mining.
A strategy to apply machine learning to small datasets in materials science.
Machine learning on small size samples: A synthetic knowledge synthesis.
  • asked a question related to Data Mining
Question
4 answers
Why Particle Swarm Optimization works better for this classification problem?
Can anyone give me any strong reasons behind it?
Thanks in advance.
Relevant answer
Answer
Arash Mazidi PSO is also used in various classification problems. I particularly use it for phishing website datasets.
  • asked a question related to Data Mining
Question
4 answers
How many respondents are really enough?
There are two schools of thought about sample size; one holds that a relatively small sample size is adequate. Perhaps 300-500 respondents can work?
Relevant answer
Answer
What is the best number of respondents when conducting research?
There are two schools of thought about sample size. One is that as long as a survey is representative, a relatively small sample size is adequate; perhaps 300-500 respondents can work. The other point of view is that while maintaining a representative sample is essential, the more respondents you have, the better.
Regards,
Shafagat
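The 300-500 range can be checked against Cochran's formula for estimating a single proportion under simple random sampling, n0 = z^2 p(1 - p) / e^2. With 95% confidence (z = 1.96), the most conservative p = 0.5, and a 5% margin of error, it gives about 385 respondents, and the finite population correction lowers that for small populations. This covers only that specific design; stratified samples, subgroup analyses, or statistical power for hypothesis tests need other calculations.

```python
import math


def sample_size(z=1.96, p=0.5, margin=0.05, population=None):
    """Cochran's sample size for estimating a proportion.

    p=0.5 is the most conservative choice; the optional finite population
    correction shrinks n when the population is small.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


if __name__ == "__main__":
    print(sample_size())                  # large population, 95% confidence, +/-5%
    print(sample_size(population=1000))   # with finite population correction
```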
  • asked a question related to Data Mining
Question
7 answers
Please share papers and shed some light on text mining and meta-analysis.
Relevant answer
Answer
Here's a link to my meta-analysis paper
  • asked a question related to Data Mining
Question
9 answers
Let's consider a sales invoice like this:
Gender | Age | Street | Item 1 | Count 1 | Item 2 | Count 2 | ... | Item N | Count N | Total Price (Label)
Male | 22 | S1 | Milk | 2 | Bread | 5 | ... | - | - | 10 $
Female | 10 | S2 | Coffee | 1 | - | - | ... | - | - | 1 $
....
We want to predict the total price of an invoice based on the buyer's demographic information (such as gender, age, job) and the purchased items and their counts. Note that we assume we don't know each item's price, and that prices change over time (so our dataset will also include a date).
The main question is how to use this dataset, which contains transactional data (items) whose order is not important. For example, someone who buys item1 and item2 is equivalent to someone who buys item2 and item1. So the item columns should not depend on the order in which the items appear.
This dataset contains both multivariate and transactional data. My question is how can we predict the label more accurately?
Relevant answer
Answer
Hi Dr Behzad Soleimani Neysiani . I agree with Dr Qamar Ul Islam .
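One common way to remove the order dependence described in the question is to replace the Item 1..Item N / Count 1..Count N columns with a fixed-length "bag of items" vector: one column per distinct item, holding the purchased count. The sketch below assumes empty slots ("-") have been converted to None and that the vocabulary is the sorted list of all items in the training data; both are illustrative choices. The resulting vector can then be concatenated with the demographic and date features and fed to any regressor.

```python
from collections import Counter


def basket_features(items_and_counts, vocabulary):
    """Turn (item, count) pairs into a fixed-length count vector.

    Counts are summed per item name, so the order in which items appear
    in the Item 1..Item N columns no longer affects the features.
    """
    basket = Counter()
    for item, count in items_and_counts:
        if item is not None:            # empty slots ("-") are represented as None
            basket[item] += count
    return [basket[name] for name in vocabulary]


if __name__ == "__main__":
    vocab = ["Bread", "Coffee", "Milk"]
    print(basket_features([("Milk", 2), ("Bread", 5)], vocab))
    print(basket_features([("Bread", 5), ("Milk", 2), (None, 0)], vocab))
```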
  • asked a question related to Data Mining
Question
4 answers
For example, k-nearest neighbors needs to find the smallest of the distances between a query and a large number of data points.
But k-means clustering finds the smallest of the distances between each data point and a few cluster centers.
Like k-nearest neighbors, which techniques require computing the maximum or minimum value over a large number of data points?
Relevant answer
Answer
I recommend reading the following paper, as it contains useful information to answer your question:
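The two access patterns contrasted in the question can be sketched side by side: k-nearest neighbors keeps the k smallest distances from one query to many points, while the k-means assignment step takes a single minimum over a handful of centers. `heapq.nsmallest` scans all n points once while keeping a heap of size k, which avoids sorting every distance. Euclidean distance and the function names are illustrative choices.

```python
import heapq
import math


def k_nearest(query, points, k):
    """Return the k points closest to `query` (Euclidean), nearest first.

    One pass over all n points with a bounded heap: O(n log k) comparisons.
    """
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


def nearest_center(point, centers):
    """k-means assignment step: index of the closest of a few centers."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


if __name__ == "__main__":
    data = [(0, 0), (1, 1), (5, 5), (2, 2)]
    print(k_nearest((0, 0), data, 2))
    print(nearest_center((4, 4), [(0, 0), (5, 5)]))
```

For very large data sets, spatial indexes such as k-d trees or approximate nearest-neighbor structures avoid scanning every point for each query.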
  • asked a question related to Data Mining
Question
3 answers
I want to understand the C5.0 algorithm for data classification. Does anyone have the steps for it, or the original paper in which this algorithm was presented?
Relevant answer
  • asked a question related to Data Mining
Question
6 answers
What is the best algorithm to complement a cluster analysis (k-means) and define the ideal cluster number? I am testing the Weka data mining application, which incorporates clustering algorithms that do not require prior selection of the number of clusters. Has anyone tried it?
  • asked a question related to Data Mining
Question
12 answers
Hello everybody
I am solving a Social Network Analysis problem. I have 9 centrality measures in my problem and I am trying to combine them for creating a new centrality measure.
I have chosen TOPSIS as a combining method. Now I am looking for an easy method to assign appropriate weights to my criteria.
If you think you can help me, or can even introduce me to a better solution than TOPSIS, I would be glad if you shared it with me.
Best Regards
Relevant answer
Answer
I suggest using entropy-derived weights, which are objective.
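Combining the suggestion above with the question, the entropy weight method can supply objective weights for the 9 centrality measures, and TOPSIS can then rank nodes by their relative closeness to the ideal solution. The sketch below treats every centrality as a benefit criterion (higher is better) and assumes non-negative values that are not identical across all nodes; those assumptions, the function names, and the example matrix are illustrative.

```python
import math


def entropy_weights(matrix):
    """Objective criterion weights via the entropy weight method.

    matrix[i][j] = value of alternative i (node) on criterion j (centrality);
    values are assumed non-negative, with at least one positive entry per column.
    """
    m, n = len(matrix), len(matrix[0])
    k = 1.0 / math.log(m)
    divergences = []
    for j in range(n):
        col = [row[j] for row in matrix]
        total = sum(col)
        # Entropy of the column's share distribution; 0*log(0) is taken as 0.
        e = -k * sum((x / total) * math.log(x / total) for x in col if x > 0)
        divergences.append(1.0 - e)     # more dispersion -> lower entropy -> more weight
    s = sum(divergences)
    return [d / s for d in divergences]


def topsis(matrix, weights):
    """Closeness score in [0, 1] per alternative; all criteria treated as benefits."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    best = [max(col) for col in zip(*weighted)]     # ideal solution
    worst = [min(col) for col in zip(*weighted)]    # anti-ideal solution
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


if __name__ == "__main__":
    # Rows: nodes; columns: three illustrative centrality measures.
    centralities = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3], [0.5, 0.5, 0.5]]
    w = entropy_weights(centralities)
    print("weights:", w)
    print("scores:", topsis(centralities, w))
```

The TOPSIS score itself can serve as the new combined centrality measure.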
  • asked a question related to Data Mining
Question
5 answers
I have seen City Pulse (see link), and they have the type of data I'm looking for, but not in large enough quantity. Ideally, the data will have recording intervals shorter than 1 hour (the more frequent, the better) and a total duration of at least a month.
Relevant answer
Answer
I need a monthly dataset for water usage...
  • asked a question related to Data Mining
Question
4 answers
I would like to carry out a study (socio-economic categorization) on multiple datasets (text data from ISPs, hospitals, and government records agencies) using a suitable data mining technique. I read that WEKA can do the job. I am still a newbie when it comes to data mining analysis and WEKA. Kindly advise on how best I can do this.
  • asked a question related to Data Mining
Question
4 answers
What procedure and data should I use?
How should I structure the empirical study?
Relevant answer
Answer
You may find this paper useful:
Stagnaro, M. N., Arechar, A. A., & Rand, D. G. (2017). From good institutions to generous citizens: Top-down incentives to cooperate promote subsequent prosociality but not norm enforcement. Cognition, 167, 212–254.
  • asked a question related to Data Mining
Question
7 answers
I would like to dive into the research domain of explainable AI. What are some of the recent trending methodologies in this domain? What can be a good start to dive into this field?
Relevant answer
Answer
Go to Google Scholar and search for papers you already know about, for example on Shapley values:
Review the papers that cite the "The many Shapley values for model explanation" paper.
  • asked a question related to Data Mining
Question
3 answers
I usually use Latent Dirichlet Allocation to cluster texts. What do you use? Can someone give a comparison between different text clustering algorithms?
Relevant answer
Answer
I have typically used the k-means clustering algorithm, which is very popular and is based on partitioning. Similarly, you can use density-based or hierarchical clustering methods.
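Partitioning, density-based, and hierarchical methods all need the texts as numeric vectors first. A minimal pure-Python sketch of one common representation, TF-IDF vectors compared by cosine similarity, is shown below. The whitespace tokenizer, the "+1" idf variant, and the sample documents are assumptions for illustration; the vectors could then be handed to any of the clustering methods mentioned above. Empty documents are not handled.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Unit-length TF-IDF vectors over a shared, sorted vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    n = len(docs)
    # idf variant with +1 so that terms present in every document keep a nonzero weight.
    idf = {t: math.log(n / df[t]) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] / len(toks) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity; equals the dot product because the vectors are unit length."""
    return sum(a * b for a, b in zip(u, v))

docs = ["stock market prices fall", "market prices rise",
        "football match score", "match score goal"]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Unlike LDA, which represents each text as a mixture of topics, this gives a plain bag-of-words geometry, so documents that share no terms have similarity 0.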
  • asked a question related to Data Mining
Question
4 answers
Can you suggest a topic related to Big Data + Data Mining + Association Rule Mining + Predicting Consumer Behavior?
Relevant answer
Answer
I found several hits when I typed this into search engines; perhaps you found that none of them covered all the keywords at once?
I can suggest one article to consider, if only for your literature review, as it covers a lot:
Strang, K. D., & Sun, Z. (2017). Scholarly big data body of knowledge: What is the status of privacy and security? Annals of Data Science, 4(1), 1-17. http://dx.doi.org/10.1007/s40745-40016-40096-40746.
  • asked a question related to Data Mining
Question
9 answers
What query-based (top-k frequent pattern mining) techniques are being used, and for what purposes? I would also like to know about new research trends in data mining.
Relevant answer
Answer
Causal inference will be the next frontier (within 4 to 10 years) in AI, machine learning and modeling.
  • asked a question related to Data Mining
Question
8 answers
I'm looking for a way to find frequent itemsets in sequences, where the order in which items appear matters. Consider the following example:
1,2,3
1,3,2
3,1,2
Assuming that the order of items matters, if we set min_support = 3, then {1,2} is frequent, because support({1,2}) = 3 and every time we see {1,2} in this dataset, 2 comes after 1.
Now consider {1,3}: this itemset appears 3 times in our dataset, but it is not frequent, because 3 comes after 1 in only 2 transactions.
I'm looking for an algorithm that can do this. I found algorithms like GSP that do something similar, but not exactly what I want. Can you please recommend an algorithm that can find such frequent itemsets?
Thanks in advance
Relevant answer
Answer
Philippe Fournier-Viger, I read your surveys when I was working on my thesis; they helped me a lot and guided me in completing my research. By the way, SPMF is amazing! I worked with it during my research.
I think the problem I have described is a little different from the known sequential pattern mining algorithms. That's why I decided to ask it here.
Thanks in advance for your answer.
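The support definition in the question can be checked directly by brute force: a pattern such as (1, 2) counts for a sequence when its items appear in that order, not necessarily adjacently. The sketch below is a minimal illustration of that definition, not a scalable algorithm and not a specific published method. It assumes patterns contain distinct items, and the function names are hypothetical. Because a pattern's support can never exceed that of its sub-patterns, longer patterns could be grown Apriori-style from frequent shorter ones instead of enumerating all permutations.

```python
from itertools import permutations

def ordered_support(sequences, pattern):
    """Number of sequences in which the items of `pattern` occur in that order
    (gaps allowed), i.e. `pattern` is a subsequence of the sequence."""
    count = 0
    for seq in sequences:
        it = iter(seq)
        # `item in it` consumes the iterator up to the match, so order is enforced.
        if all(item in it for item in pattern):
            count += 1
    return count

def frequent_ordered_patterns(sequences, length, min_support):
    """Brute force: every ordered pattern of distinct items meeting min_support."""
    items = sorted({x for seq in sequences for x in seq})
    result = {}
    for pattern in permutations(items, length):
        s = ordered_support(sequences, pattern)
        if s >= min_support:
            result[pattern] = s
    return result

data = [[1, 2, 3], [1, 3, 2], [3, 1, 2]]
print(frequent_ordered_patterns(data, 2, 3))  # prints {(1, 2): 3}
```

On the example above, (1, 3) has support 2, matching the reasoning in the question.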
  • asked a question related to Data Mining
Question
11 answers
Data Mining (DM) is a process of extracting and discovering patterns in large data sets including methods of Machine Learning (including Deep Learning and Statistical Learning), Statistics, and Database Systems.
Machine Learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data.
It would seem simplistic to consider ML merely a part of the larger field of DM.
From a very rough and general point of view, DM and ML are part of mathematics.
From another point of view, more precise but more outdated, they are both seen as part of Artificial Intelligence.
I would like to propose considering both disciplines as overlapping in most of their methods.
Do you have at least 3 differences between DM and ML to report?
Relevant answer
Answer
As you say, many of the methods and technologies are identical. Three differences:
1. Data mining seeks patterns in existing and historic data, whereas ML tries to predict outcomes from future data.
2. DM has more human interaction in terms of domain expertise, with human judgement used for interventions and decision making. ML can often be applied with very limited domain expertise.
3. DM has knowledge discovery as a core objective (the aim is to learn more about the domain under investigation), whereas ML is about making good predictions (even if based on black-box methods that tell us nothing about why they are good predictions).
There is, of course, a spectrum with many studies being both ML and DM to some extent.
  • asked a question related to Data Mining
Question
5 answers
Can anyone help me find a tool that allows me to download a user's old tweets? I need to study the content of tweets from 2011 by a group of users who used a particular hashtag.
Relevant answer
Answer
Hi Oscar,
You can use Trackmyhashtag to download old Tweets. It is a Twitter analytics tool that can track any hashtags or topic and provides useful detailed analytics in real-time as well as historical.
It can help you download any user's old tweets going back to 2006. You will get complete tweet details, including total tweets, tweet content, details of the contributors to that specific tweet, and many other useful metrics.
On the site you will find a "Request data" form, which you fill in with your search term and dates and then submit. After a while, you'll receive your tweet data along with other details.
Thank you:)
  • asked a question related to Data Mining
Question
9 answers
Is there a Python or R package for analyzing spreader nodes and community detection in the multilayer network?
Relevant answer
Answer
I worked on this specific topic in my dissertation. I tried many tools, but eventually my research team and I decided to design a simulator for static, dynamic, and multilayer networks using NetLogo multi-agent programming. Below is the link to the simulator; it might be useful in your work.
Note that you just need to configure the simulator based on your needs.
Good Luck
  • asked a question related to Data Mining
Question
3 answers
Hello,
Does anyone know how to extract all twitter images under a specific hashtag using python or R? Any relevant packages?
Thank you,
Ioanna
PS I am not searching how to extract all images uploaded by a user.
Relevant answer
Answer
You can use the "Tweepy library" to extract Twitter data.
To learn how to use Tweepy library in python, check out the following link:
  • asked a question related to Data Mining
Question
3 answers
I'd like to use it in a classification task.
Relevant answer
  • asked a question related to Data Mining
Question
3 answers
The NFL theorem holds for algorithms trained on a fixed training set. However, the general behavior of algorithms on expanding or open datasets has not yet been proved. Could you share your opinions about this or suggest some related papers?
Relevant answer
Answer
I think it is very complicated to draw such general statements about all algorithms. I would suggest that, even with changes to the data characteristics, phenomena such as concept drift still have an impact, so these properties might change in the future. Since ML models are mostly applied to predict the future, it should be taken into account that, theoretically, this infinite space will never be available, because the future and its implications for algorithmic properties remain hidden.
Some further readings and discussions on the NFL with regard to ML:
  • asked a question related to Data Mining
Question
16 answers
My master's research is in info retrieval and text mining. I would be grateful if you could help me to select a good topic for my phd research proposal.
Relevant answer
Answer
There is growing interest in applying machine learning to the physical sciences, e.g., data-driven turbulence closure modeling, sparsity-based discovery of governing equations, and deep learning-based predictive modeling of chaos.
  • asked a question related to Data Mining
Question
3 answers
I am trying to classify and analyze the results of an SDS-PAGE based array for bacterial detection using machine learning, but I have trouble finding the best way to represent the results with proper features. Five bacteria were tested in 43 experiments, and for each we have 5 gel columns corresponding to different time points. I would be grateful if fellow researchers could guide me on how to represent the data effectively.
Relevant answer
Answer
Hello,
You can be inspired by this study :
  • asked a question related to Data Mining
Question
9 answers
I am looking for free software for text mining and sentiment analysis for my research on customer review mining (it involves calculating the polarity of attributes, opinion-oriented information extraction, etc.).
Can somebody tell me whether this can be done with NVivo, and is it free?
I would also welcome any other suggestions.
Relevant answer
Answer
Agreed. For example, https://realpython.com/sentiment-analysis-python/ gives a nice overview of all the steps you need to explore. Good luck!
  • asked a question related to Data Mining
Question
4 answers
I am trying to apply machine learning approaches such as support vector machines, random forests, and artificial neural networks to a microRNA dataset (NGS) and to an mRNA dataset (NGS). I used sequencing data for training and to make classifications based on some features, and then tested on an independent dataset to see the prediction accuracy.
To train support vector machines (SVM) I used the R e1071 package, but I am looking for other tools.
Can anybody share or suggest R/Python code or packages?
Relevant answer
Answer
LIBSVM -- A Library for Support Vector Machines https://www.csie.ntu.edu.tw/~cjlin/libsvm/
  • asked a question related to Data Mining
Question
12 answers
Hi
I am trying to segment a Sentinel-2 image.
At this stage, I want to run a binary classifier that labels each pixel as either farm or non-farm. For this purpose, I have four 10 m bands (R/G/B/NIR). I have also generated an NDVI raster for each month (8 months in total) with values ranging from -1 to 1 (they can be normalized to 0 to 255).
I am looking for a classifier that can accurately classify the pixels using NDVI and/or any combination of my 4 10m bands.
Thanks in advance.
Relevant answer
Answer
Convolutional Neural Networks (CNNs) are the most popular neural network model used for image classification problems. The big idea behind CNNs is that a local understanding of an image is good enough.
Top 5 Classification Algorithms in Machine Learning
  • Logistic Regression.
  • Naive Bayes Classifier.
  • K-Nearest Neighbors.
  • Decision Tree and Random Forest.
  • Support Vector Machines.
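Whichever pixel classifier is chosen from the list above, the question's inputs can be turned into one feature vector per pixel. The sketch below shows the standard NDVI formula, NDVI = (NIR - Red) / (NIR + Red), the linear rescaling from [-1, 1] to 0 to 255 mentioned in the question, and a hypothetical per-pixel feature layout (4 bands plus 8 monthly NDVI values). Using the monthly NDVI series as features is an assumption: seasonal vegetation profiles may help separate cropland from other cover, but that is not established in the thread.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); in [-1, 1] for non-negative reflectances."""
    denom = nir + red
    return 0.0 if denom == 0 else (nir - red) / denom  # guard against 0/0 (e.g. no-data)

def ndvi_to_byte(value):
    """Linearly rescale NDVI from [-1, 1] to an integer in [0, 255]."""
    value = max(-1.0, min(1.0, value))  # clip out-of-range values
    return round((value + 1) / 2 * 255)

def pixel_features(r, g, b, nir, monthly_ndvi):
    """Hypothetical feature vector for one pixel: four 10 m bands + NDVI time series."""
    return [r, g, b, nir, *monthly_ndvi]

# One pixel with made-up reflectances and a made-up 8-month NDVI profile.
x = pixel_features(0.08, 0.10, 0.06, 0.45,
                   [0.2, 0.3, 0.5, 0.7, 0.8, 0.6, 0.4, 0.3])
print(len(x), ndvi(0.45, 0.08), ndvi_to_byte(ndvi(0.45, 0.08)))
```

Rows built this way (one per labeled pixel) are the usual input format for tabular classifiers such as random forests or SVMs; a CNN would instead take the bands as image channels.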
  • asked a question related to Data Mining
Question
8 answers
Need data to demonstrate various applications of data mining.
Relevant answer
Answer
I used kaggle.com; it is useful.
  • asked a question related to Data Mining
Question
2 answers
I am having trouble finding materials, especially on biclustering (applying biclustering algorithms in manufacturing systems).
  • asked a question related to Data Mining
Question
5 answers
1. Can I use Orange for text mining of interview responses in a qualitative research publication?
2. Is it an acceptable methodology?
3. Can you please refer me to any reputable published work that used Orange?
Relevant answer
Answer
Dear Quazy, 1) You can use the software that best fits your requirements. It could be Orange, R, IBM Statistics, SAS, Python libraries, etc. You could even use statistical approaches, such as correspondence analysis; 2) software is not a methodology; take a look at traditional approaches before making your choice (e.g., CRISP-DM, SEMMA, the KDD process, etc.); 3) https://doi.org/10.1007/978-3-540-30116-5_58
I hope it is useful for you!
Regards,
  • asked a question related to Data Mining
Question
7 answers
I am involved in a project that adds value by visualizing misclassifications in the text mining domain. I am wondering whether anyone has experience in formally proving that the visualizations actually improve the overall outcome of a data science project.
Relevant answer
Answer
Visualization involves many aspects. In science, it mostly manifests as an interface that augments and strengthens human-data interaction. In the scientific field, visualization has a broad sense that is less dependent on perception; for instance, sonification and haptification are usually applicable with similar success. In art, visualization is mostly intended to augment human-to-human mental communication. However, this aspect is less developed and does not yet have enough tools to uncover the opportunities and prospects of novel methods or concepts. So far, it is highly dependent on perception and on the primitive tools available. Mental visualization, in a broad sense, is the next step forward in augmenting human thinking, whether in human-data interaction, human-human communication, or the perception of the world.
  • asked a question related to Data Mining
Question
22 answers
Hello. Do you think learning a programming language is necessary for data mining, or is learning RapidMiner enough? I am pursuing my master's degree in business administration and have recently become interested in learning data mining.
Relevant answer
Answer
Seyyed Masih Rajaei Almousavi, RapidMiner is a great platform and might solve most of your needs; however, to become an effective data miner it is quite important to know the following programming languages: SQL, Python, R and/or SAS (Statistical Analysis System). Python and R will also be useful later for AI solutions. Which path is better or more effective depends on your IT background and your motivation.
  • asked a question related to Data Mining
Question
3 answers
Related to data mining/data science, I recently discovered Orange (https://orange.biolab.si/), a high-level, very powerful, free toolkit. It is very simple to use, and you can test different models to find the best results. However, when writing a paper for a journal, is it acceptable to use it instead of RStudio (for instance) or other tools (PyTorch, Weka, TensorFlow)? Or is Orange best suited just for teaching?
Relevant answer
Answer
Dear Alessandro,
The terms machine learning, data mining and data science are used interchangeably, but there are differences between them. All of them revolve around dealing with rapidly growing amounts of data.
Data science derives understanding from structured and unstructured data. It is used for qualitative analysis and includes data visualization, data mining, language processing, etc. It is a wider area of research that uses many algorithms and operations to derive informative insights from both structured and unstructured information.
Data mining, by contrast, analyzes datasets created from structured data to find hidden correlations and patterns. It is a subset of data science used to extract data and generate prediction models. Data mining also incorporates data cleaning, pattern prediction, statistical analysis, data conversion, machine learning, and data visualization.
Both play key roles in helping organizations recognize opportunities and reach worthwhile conclusions, so that they can make better decisions to grow their businesses. The knowledge needed in the two fields also varies, which is reflected in differences in their approaches, the tools used, and the steps applied.
Data science tools include Python, Apache Spark, SAS, Tableau, R, TensorFlow, etc. Data mining tools include Weka, RapidMiner, KNIME, Oracle Data Mining, Apache Mahout, Teradata, and Orange. These tools are used not only for teaching and research but also by industry to carry out analyses as required.
  • asked a question related to Data Mining
Question
4 answers
Since bagging is based on randomly drawing samples and training a classifier on each, I wonder whether the following special case is possible:
The selected sample is still not separable.
If it is possible, is there any solution to this?
Thank you so much.
Relevant answer
Answer
It depends essentially on the ratio of the size n of the samples to the number N of features. If n < N+1, the examples of any sample will be linearly separable; otherwise, the probability that the examples of a sample are linearly separable depends on the n/N ratio. This is known as Cover's theorem. If you want to know more about this theorem, which is fundamental in machine learning, be aware that the "Cover's theorem" page of Wikipedia is completely misleading. It contains a sentence that is presented as a quote from Cover's paper; actually the sentence is neither present in that paper, nor in any other publication by Thomas Cover. Unfortunately, this sentence has been pasted into hundreds of Web pages and "tutorials", which shows that many machine learning "experts" do not check their sources. Look at the real paper: T. M. Cover, "Geometrical and Statistical Properties of Systems of Linear Inequalities with Applications in Pattern Recognition," in IEEE Transactions on Electronic Computers, vol. EC-14, no. 3, pp. 326-334, June 1965, doi: 10.1109/PGEC.1965.264137.
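The dependence on n/N described above can be computed from Cover's function counting theorem: for n points in general position in R^d, the number of dichotomies separable by a hyperplane through the origin is C(n, d) = 2 * sum_{k=0}^{d-1} binom(n - 1, k), and a hyperplane with a bias term in N features corresponds to d = N + 1. The sketch below turns that count into the fraction of all 2^n labelings that are linearly separable; the function name is hypothetical.

```python
from math import comb

def separable_fraction(n, num_features):
    """Fraction of the 2**n labelings of n points (in general position in
    R**num_features) that some hyperplane with a bias term separates.

    Cover (1965): C(n, d) = 2 * sum_{k=0}^{d-1} comb(n - 1, k) separable
    dichotomies for hyperplanes through the origin in R**d; the bias term
    is handled by taking d = num_features + 1. Assumes n >= 1.
    """
    d = num_features + 1
    separable = 2 * sum(comb(n - 1, k) for k in range(d))  # comb() is 0 when k > n - 1
    return separable / 2 ** n

print(separable_fraction(4, 2))  # prints 0.875: 14 of the 16 labelings of 4 planar points
```

For n <= N + 1 the function returns 1, and as n grows with N fixed the fraction falls, passing 1/2 at n = 2(N + 1). In a bootstrap sample drawn for bagging, repeated examples carry the same label and add no new points, so n would be the number of distinct examples in the sample.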
  • asked a question related to Data Mining
Question
4 answers
Can you help me with using clustering methods, one of the data mining techniques, in cellular manufacturing systems?
Relevant answer
Answer
I think this topic might help you
Good Luck!
  • asked a question related to Data Mining
Question
5 answers
There are six levels in Bloom's Taxonomy. I can identify the levels by the action verbs in a sentence. If there is no action verb in a sentence, how can I do that?
Relevant answer
Answer
Fortunately, there are “verb tables” to help identify which action verbs align with each level in Bloom’s Taxonomy.
You may notice that some of these verbs on the table are associated with multiple Bloom’s Taxonomy levels. These “multilevel verbs” are actions that could apply to different activities. For example, you could have an objective that states “At the end of this lesson, students will be able to explain the difference between H2O and OH-.” This would be an understanding-level objective. However, if you wanted the students to be able to “…explain the shift in the chemical structure of water throughout its various phases,” this would be an analyzing-level objective.
Adding to this confusion, you can locate Bloom’s verb charts that will list verbs at levels different from what we list below. Just keep in mind that it is the skill, action, or activity you will teach using that verb that determines Bloom’s Taxonomy level.
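A verb-table lookup of the kind described above can be sketched in a few lines. The verb lists below are a small illustrative sample chosen for this example, not an official Bloom's chart; "explain" is deliberately placed on two levels to mirror the multilevel-verb example. The function returns every matching level, so several results signal an ambiguous verb, and an empty list signals that, as in the question, there is no listed action verb and the level has to be judged from the skill or activity itself.

```python
# Illustrative verb table: a few common verbs per level, not an official list.
BLOOM_VERBS = {
    "remember": {"define", "list", "recall", "name"},
    "understand": {"explain", "summarize", "describe", "classify"},
    "apply": {"apply", "use", "solve", "demonstrate"},
    "analyze": {"analyze", "compare", "differentiate", "explain"},
    "evaluate": {"evaluate", "judge", "justify", "critique"},
    "create": {"design", "create", "construct", "formulate"},
}

def candidate_levels(objective):
    """All Bloom levels whose verb set shares a word with the objective, in level order."""
    words = {w.strip(".,;:!?\"'()").lower() for w in objective.split()}
    return [level for level, verbs in BLOOM_VERBS.items() if words & verbs]

print(candidate_levels("Students will be able to explain the difference between H2O and OH-."))
```

A real classifier would also need to handle inflected forms ("explains", "explained") and verbs used as nouns, which plain word matching does not.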
  • asked a question related to Data Mining
Question
24 answers
I wonder if anyone could kindly recommend free software for data mining and/or artificial intelligence analysis.
Relevant answer
Answer
Good day. RapidMiner and Weka are some of the software available as open-source or free software. Once again, it all depends on what data you have and what you are going to analyze with it, and whether you can use open-source software or need to build your own tool.
  • asked a question related to Data Mining
Question
8 answers
I have an input dataset as a 5x100 matrix, where 5 is the number of variables and 100 is the number of samples. I also have a target dataset as a 1x100 matrix of continuous values. I want to design a model from the input and target datasets using a deep learning method. How can I enter my data (input and target) in this toolbox? Is it similar to the Neural Fitting (nftool) toolbox?
Relevant answer
Answer
Deep Learning Toolbox™ provides a framework for designing and implementing deep neural networks with algorithms, pre-trained models, and apps. You can build network architectures with the Deep Network Designer app, you can design, analyze, and train networks graphically. The Experiment Manager app helps you manage multiple deep learning experiments, keep track of training parameters, analyze results, and compare code from different experiments. You can visualize layer activations and graphically monitor training progress.
You can get help from the internet and follow the link to build your own architecture according to your input.
  • asked a question related to Data Mining
Question
3 answers
I am looking for software and/or papers on automatic data insight generation. I think it is more nuanced than AutoML --- or I misunderstood it as an umbrella term.
By data insights, I loosely mean any result that is possibly interesting; examples include category outliers; correlation; anomalies, overall trends, and seasonality in time series; associations; predictability; and so on. I am interested in how these are mined automatically:
  1. Given the combinatorial explosion in the number of possible insights to be generated, how do we choose where to concentrate our efforts on?
  2. How do we evaluate the "interestingness" (or "confidence") of the data insights we have found?
The only software I know that generates such insights automatically is Microsoft Power BI[0]. Narrative Science is another, but it seems much more advanced --- generating natural language reports using ontologies --- than what I am interested in.
Relevant answer
Answer
Investigating Insight Generation and Decision Making with Visualizations in Real and Virtual Environments
  • Conference: the Technology, Mind, and Society
  • Devin Michael Gill
  • Ian T. Ruginski
  • Joshua Butner
  • Sarah H Creem-Regehr
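As a small illustration of the two questions above (where to look, and how to score "interestingness"), one simple approach for a single insight type, pairwise correlation, is to enumerate every column pair and rank by effect size. The sketch below uses |Pearson r| as the interestingness score; it is an assumption made for illustration, not how Power BI or Narrative Science work, and a practical system would also need to handle multiple comparisons, sample size, and other insight types.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def correlation_insights(columns, top=3):
    """Score every column pair by |r| and return the `top` strongest as insights."""
    scored = [(abs(pearson(columns[a], columns[b])), a, b)
              for a, b in combinations(sorted(columns), 2)]
    scored.sort(reverse=True)
    return [(a, b, round(score, 3)) for score, a, b in scored[:top]]

# Hypothetical table with three numeric columns.
cols = {"ads": [1, 2, 3, 4, 5], "sales": [2, 4, 6, 8, 10], "temp": [5, 3, 4, 1, 2]}
print(correlation_insights(cols))
```

The pairwise enumeration grows quadratically with the number of columns, which is one concrete face of the combinatorial explosion mentioned in the question; ranking and truncating with `top` is the simplest way to decide what to surface.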
  • asked a question related to Data Mining
Question
1 answer
I want to make an adjacency matrix with citations.
I want to make an index of 130 words and search 130 papers against the 130 words. Manually this is a long process. But I want to automate the searching.
Can anyone suggest if this can be done with text mining or any other ways?
Relevant answer
Answer
Yes, this is a classic NLP task.
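The task in the question, checking 130 papers against a 130-term index, reduces to building a paper-by-term occurrence matrix, from which a term-by-term co-occurrence (adjacency) matrix follows as M^T M. The sketch below assumes the papers are already available as plain text (extracting text from PDFs is not shown) and uses whole-word, case-insensitive matching with no stemming, so "networks" would not match "network". The function names and sample texts are hypothetical.

```python
import re

def term_paper_matrix(terms, papers):
    """Binary matrix: rows = papers, columns = index terms, 1 if the term occurs."""
    patterns = [re.compile(r"\b" + re.escape(t.lower()) + r"\b") for t in terms]
    return [[1 if p.search(text.lower()) else 0 for p in patterns] for text in papers]

def cooccurrence(matrix):
    """Term-by-term counts of papers mentioning both terms (M^T M); diagonal = term frequency."""
    n = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]

terms = ["data mining", "clustering", "neural network"]
papers = ["Data mining with clustering.",
          "A neural network approach.",
          "Clustering of data mining results."]
m = term_paper_matrix(terms, papers)
print(m, cooccurrence(m))
```

With 130 terms and 130 papers this is only about 17,000 regex searches, so the brute-force approach is fast enough; the co-occurrence matrix can be loaded directly into network analysis tools as a weighted adjacency matrix.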
  • asked a question related to Data Mining
Question
9 answers
I created my own large dataset from different sites and labeled it for an NLP task. How and where can I publish it as a paper or article?
Relevant answer
Answer
I think "Data in Brief" can be a good place to publish your own created datasets.
  • asked a question related to Data Mining
Question
12 answers
Hello
I have an income dataset and a house sales dataset with different numbers of records, and I need to integrate them using R. I thought of creating a new column in the income dataset that lists the available options for each record, based on employees' monthly income and household income per month (if the price is equal to or lower than the income, the house is an option). But I do not know whether this counts as integration, or whether it is a good solution. I would appreciate it if someone could explain it to me.
Relevant answer
Answer
Yes,
So, in this case, Shiekhah A. al Binali has two datasets, income and house sales.
I think it is possible to combine them if they share a related key.
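Both ideas in this thread can be sketched briefly: combining records through a shared key, as the answer suggests, and deriving an "options" list per income record from a pricing rule, as the question describes. Python is used only for illustration (in R, `merge()` plays the role of the first function). The column names `id`, `area`, `price`, `house_id`, `monthly_household_income` and the price-to-income ratio of 3 are hypothetical assumptions, not taken from the datasets in the question.

```python
def join_on_key(left, right, key):
    """Inner join of two lists of dicts on a shared column.
    On clashing column names, values from `right` win."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in left for b in index.get(a[key], [])]

def affordable_options(incomes, houses, price_to_income=3.0):
    """Hypothetical rule: a house is an option if its price is at most
    `price_to_income` times the household's annual income."""
    options = {}
    for person in incomes:
        budget = person["monthly_household_income"] * 12 * price_to_income
        options[person["id"]] = [h["house_id"] for h in houses if h["price"] <= budget]
    return options

incomes = [{"id": "p1", "area": "north", "monthly_household_income": 4000},
           {"id": "p2", "area": "south", "monthly_household_income": 2000}]
houses = [{"house_id": "h1", "area": "north", "price": 120000},
          {"house_id": "h2", "area": "north", "price": 200000},
          {"house_id": "h3", "area": "south", "price": 90000}]
print(join_on_key(incomes, houses, "area"))
print(affordable_options(incomes, houses))
```

The key-based join keeps every matching pair of records, while the rule-based version produces a new derived relationship; which one fits depends on whether the two datasets actually share a meaningful key.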
  • asked a question related to Data Mining
Question
12 answers
Please suggest a full form for SOM, in reference to data mining, machine learning and data science, other than self-organising map. I have used the abbreviation SOM in my research topic, where it stands for self-organising map. But now my work contains many other data mining algorithms as well, so I need a broader perspective. Moreover, I cannot change the title of the research.
Relevant answer
Answer
You should use it fully spelled out the first time used followed by the abbreviation in parentheses. I would use it sporadically in the full form and any time it appears as the first word in a sentence.
  • asked a question related to Data Mining
Question
6 answers
Data mining is a process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems.
Relevant answer
Answer
I agree with Marcos W. Rodrigues. If you want a specific list:
  1. Financial analysis
  2. Telecommunication industry
  3. Intrusion detection task
  4. Retail industry analysis
  5. Higher education mining
  6. Energy industry
  7. Spatial data mining
  8. Biological data analysis
  9. Healthcare data analysis
  10. Manufacturing engineering
  11. Pattern mining
  12. Lie detection
  13. Research analysis and more
I think that, as long as you have a dataset, you can apply several aspects of data mining. However, the gist is KDD (knowledge discovery in databases).