Data Mining - Science topic

Explore the latest questions and answers in Data Mining, and find Data Mining experts.
Questions related to Data Mining
  • asked a question related to Data Mining
Question
2 answers
Hello there, I am searching for datasets of software requirements and their use cases, hoping to gather use-case data for the requirements to train an ML model for research we are working on. Would anyone know of a source for such datasets?
Relevant answer
Answer
Najib Abusalbi Yes, I did. I searched dataset websites such as Hugging Face, Kaggle, and Google datasets, as well as the Google search engine, Google Scholar, journals, and many other websites. I did not manage to find any public repository except the one made by the National council of Italy. Even in published papers and articles, hardly anyone mentions where they got their datasets or where they are available; sadly, only a few do.
  • asked a question related to Data Mining
Question
3 answers
I want to analyze a research problem in educational data mining using machine learning algorithms. I want to build a model that suggests to school students which domain to select for higher education, by evaluating each student's school dataset together with the same student's higher education dataset.
Relevant answer
Answer
Impact of social media on students in post covid period.
Impact of mobile usage on students in higher education post covid period
  • asked a question related to Data Mining
Question
9 answers
What is spatial and temporal mining in data mining and what are spatial data structures in data mining?
Relevant answer
Answer
Dr Mohammad Imam thank you for your contribution to the discussion
  • asked a question related to Data Mining
Question
20 answers
I am interested in getting deeper into the connection between data analysis methods and the information visualizations that can be generated from them. For example, data clustering (in data mining) produces a certain kind of information. Which visualization method could best be used to visualize the produced information, and why?
I have found this http://www.visual-literacy.org/periodic_table/periodic_table.html, which is very good at depicting the different visualization methods but does not explain which data analysis method each of them is connected to.
Any recommended good source?
Thanks
Relevant answer
Answer
Just a quick answer on data visualization. I highly recommend learning Python and using matplotlib to visualize data. There are already many existing libraries focusing on data mining, AI, and machine learning.
There are many online courses, books, and Python manuals available.
Learning Python to work with data is really worth it. For starters, the best is to find a YouTube video on the topic you want to solve or some close one.
References:
[1] Joakim Sundnes: "Introduction to Scientific Programming with Python", Springer, Simula SpringerBriefs on Computing (2020), ISBN 978-3-030-50355-0 (Open Access: https://doi.org/10.1007/978-3-030-50356-7)
[2] H.P. Langtangen: A Primer on Scientific Programming with Python, Texts in Computational Science and Engineering 6, Springer, DOI 10.1007/978-3-662-49887-3
  • asked a question related to Data Mining
Question
3 answers
Please suggest a new research topic in computer science on data mining using machine learning.
Relevant answer
Answer
Dear Pushpraj Singh,
I give my proposal for a research topic, research thesis, thesis concept in the area of your interest:
Research Context: In recent years, the scale of various economic, financial, social, health, food, energy, nature, climate, etc. crises has been increasing. As a result, the importance of improving crisis management techniques and using new ICT information technologies and Industry 4.0 for this purpose is growing. The importance of improving risk management processes using new Industry 4.0 technologies, including but not limited to Big Data Analytics and Artificial Intelligence, is also growing.
Accordingly, the research topic may address the following issue: The application of selected ICT information technologies, Industry 4.0, the technologies of the current fourth technological revolution, including Big Data Analytics, machine learning, deep learning, artificial intelligence to improve risk management systems, early warning systems within the framework of crisis management, and the improvement of forecasting models used to predict abnormal situations, events of special risk increases, emergencies, specific types of disasters, etc.
I would like to invite you to join me for scientific cooperation on this issue,
Kind regards,
Dariusz Prokopowicz
  • asked a question related to Data Mining
Question
2 answers
Compared to the old-fashioned but still used emulsion-type explosives, filling the tunnel face with explosive by bulk charging provides better and higher-quality vibration values if you are drilling the tunnel face with a jumbo that has MWD (measurement while drilling) capability. With an MWD-capable machine, drilling is heterogeneous in a formation whose face surface is uneven, and the drilling lengths differ. A homogeneous charge in a heterogeneous face is therefore difficult with an emulsion-type explosive of constant weight in kilograms. I therefore think that more stable vibration data will be obtained with bulk charging. What is your opinion?
Relevant answer
Answer
I obtained an empirical formula with 95% accuracy rate with emulsion type explosive. Thank you very much for your esteemed reply. I think I can get more accurate results with bulk charging. Thank you very much for your interest, Mr. Signh.
  • asked a question related to Data Mining
Question
10 answers
I created my own large dataset from different sites and labeled it for an NLP task. How can I publish it in the form of a paper or article, and where?
Relevant answer
Answer
Publishing your own created labeled corpus can be done through various avenues depending on your goals and the field you're working in. If you wish to contribute to the academic community and share your research findings, publishing it in the form of an article or paper in relevant journals or conference proceedings would be appropriate. This allows you to provide a detailed description of your corpus creation process, its applications, and potential insights derived from it. Alternatively, you could explore open-access platforms or repositories specific to linguistic resources, such as the Linguistic Data Consortium (LDC), where researchers can deposit and share their corpora. Additionally, if your corpus is of significant value and relevance, you may consider reaching out to organizations or institutions involved in language processing or research, as they may be interested in hosting and making it accessible to others in the field.
  • asked a question related to Data Mining
Question
5 answers
Hello everyone,
I want to find emerging patterns of blockchain applications in cybersecurity. I have collected and filtered my dataset, which now consists of 1183 research items indexed in WoS and Scopus. Which text mining algorithms can fulfill this purpose?
I found burst detection and LDA suitable, but as a tourism student I want to know about other possibilities and the suggestions of professionals.
Best wishes.
Relevant answer
Answer
One text mining algorithm that can fulfill the purpose of identifying emerging patterns of blockchain applications in cybersecurity from your dataset of 1183 research items indexed in WoS and Scopus is topic modeling using Latent Dirichlet Allocation (LDA). LDA is a probabilistic model that can discover hidden topics within a collection of documents by assigning probability distributions to words and topics. By applying LDA to your dataset, you can uncover the underlying themes and topics related to blockchain applications in cybersecurity. This algorithm can help identify patterns, common trends, and relationships among the research items, enabling you to gain insights into the emerging patterns in this domain.
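LDA itself needs a library such as gensim or scikit-learn. As a lighter first pass before topic modeling, and in the spirit of the burst detection mentioned in the question, one can rank terms by how much their share of documents grows between an early and a late period. The sketch below is only an illustration under stated assumptions: the `docs` list, the stopword set, and the smoothing constant are all hypothetical, and the score is a simple growth ratio, not LDA and not a formal burst detection algorithm.

```python
import re
from collections import Counter, defaultdict

# Hypothetical input: (publication_year, abstract_text) pairs.
docs = [
    (2018, "blockchain ledger for secure data sharing"),
    (2018, "intrusion detection with blockchain ledger"),
    (2021, "blockchain smart contract vulnerability detection"),
    (2021, "smart contract audit for blockchain security"),
    (2021, "federated learning and smart contract privacy"),
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"for", "with", "and", "the", "of", "in", "a"}


def doc_frequency_by_year(docs):
    """Count, per year, how many documents mention each term at least once."""
    counts = defaultdict(Counter)
    totals = Counter()
    for year, text in docs:
        terms = set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS
        counts[year].update(terms)
        totals[year] += 1
    return counts, totals


def emerging_terms(docs, early, late, smoothing=1.0):
    """Rank terms by the ratio of late-period to early-period document share."""
    counts, totals = doc_frequency_by_year(docs)
    vocab = set(counts[early]) | set(counts[late])
    scores = {}
    for term in vocab:
        early_share = (counts[early][term] + smoothing) / (totals[early] + smoothing)
        late_share = (counts[late][term] + smoothing) / (totals[late] + smoothing)
        scores[term] = late_share / early_share
    # Highest growth first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = emerging_terms(docs, early=2018, late=2021)
for term, score in ranking[:3]:
    print(term, round(score, 2))
```

On this toy input, "contract" and "smart" rise to the top because they appear only in the later period. On a real corpus of 1183 items, the same idea would be applied per year or per multi-year window, and the top terms could then be compared against the topics LDA produces.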
  • asked a question related to Data Mining
Question
4 answers
Hello everyone, I'm currently working on my master's thesis, in which I want to find current and future application patterns of a technology in an industry, based on previous research on the topic, by analyzing the title, abstract, conclusion, and implications of these articles, if that is even possible. I'm not sure which data mining method and algorithm I should use to get the best possible results. It would be great if you could give me advice and feedback.
Best regards.
Relevant answer
Answer
Choosing the right data mining method and algorithm depends on your use case. There are many different data mining methods and algorithms available, each with its own strengths and weaknesses. Some of the most popular data mining methods include clustering, classification, regression, and association rule mining. To determine which method is best for your use case, you should consider factors such as the size of your dataset, the type of data you are working with, and the specific problem you are trying to solve.
  • asked a question related to Data Mining
Question
7 answers
..
Relevant answer
Answer
Data Processing and Data Mining are both essential components of the data analysis process, but they have distinct purposes and methods. Here's a breakdown of the key differences between the two:
Data Processing: Data processing refers to the manipulation and transformation of raw data into a more meaningful and organized format. It involves various operations that cleanse, validate, integrate, and format data to make it suitable for further analysis. The primary goal of data processing is to ensure data quality, consistency, and reliability. It typically includes tasks such as data cleaning, data transformation, data aggregation, and data summarization. Data processing focuses on preparing data for efficient storage, retrieval, and analysis.
Data Mining: Data mining, on the other hand, is a specific technique or process within data analysis that involves discovering patterns, relationships, and insights from a large volume of data. It employs statistical and mathematical algorithms, machine learning techniques, and data visualization tools to extract knowledge and actionable information from the data. Data mining aims to uncover hidden patterns, trends, correlations, or anomalies that are not readily apparent. It can be used to solve specific business problems, predict future outcomes, identify market trends, or support decision-making processes.
In summary, data processing is the broader concept that encompasses the overall handling and preparation of data, ensuring its quality and consistency. Data mining, on the other hand, is a focused analysis technique that aims to extract valuable insights and knowledge from processed data by applying various statistical and machine-learning algorithms.
  • asked a question related to Data Mining
Question
1 answer
Resea
Relevant answer
Answer
Dear Nimota Jabaar Biobaku,
attached is a short bibliography where you can find some information about the relationship between Data Mining and SDN.
Best regards and much success
Anatol Badach
Kyriakos Sideris, Reza Nejabati, Dimitra Simeonidou: "Seer: Empowering Software Defined Networking with Data Analytics"; 15th International Conference on Ubiquitous Computing and Communications and 2016 International Symposium on Cyberspace and Security (IUCC-CSS), Dec 2016
Albert Mestres et al.: "Knowledge-Defined Networking"; ACM SIGCOMM Computer Communication Review, Vol. 47, Issue 3, Jul 2017
Haojun Huang et al.: "Data-Driven Information Plane in Software-Defined Networking"; IEEE Communications Magazine, Vol. 55, Issue 6, Jun 2017
Tam Nguyen: "The Challenges in SDN/ML Based Network Security: A Survey"; arXiv:1804.03539v2 [cs.CR], Apr 2018
Juliana Arevalo Herrera, Jorge E. Camargo: "A Survey on Machine Learning Applications for Software Defined Network Security"; International Conference on Applied Cryptography and Network Security (ACNS), Aug 2019
Yuhong Li, Xiang Su, Aaron Yi Ding et al.: "Enhancing the Internet of Things with Knowledge-Driven Software-Defined Networking Technology: Future Perspectives"; Sensors (MDPI), Vol. 20, Jun 2020
  • asked a question related to Data Mining
Question
4 answers
My team and I are trying to open a dialogue about designing a Continuum of Realism for synthetic data. We want to develop a meaningful way to talk about data in terms of the degree of realism that is necessary for a particular task. We feel the way to do this is by defining a continuum that shows that as data becomes more realistic, the analytic value increases, but so does the cost and risk of disclosure. Everyone seems to be interested in generating the most realistic data, but let's be honest, sometimes that's not the level of realism that we actually need. It is expensive and carries a high reidentification risk when working with PII. Sometimes we just need data to test our code, and we can't justify using this level of realism when the risk is so high. Have you also encountered this issue? Are you interested in helping us fulfill our mission? Ultimately we are trying to save money and protect consumer privacy. We would love to hear your thoughts!
Relevant answer
Answer
Yes, there is a continuum of realism for synthetic data. At one end of the continuum, we have completely synthetic data that is generated based on mathematical models or simulations. This type of data can be useful for testing hypotheses, exploring different scenarios, and evaluating methods without the constraints and biases of real-world data. However, it may not reflect the complexity and diversity of real-world data, and may not be useful for certain applications, such as training machine learning models.
At the other end of the continuum, we have real-world data that is collected directly from sources such as surveys, medical records, or social media platforms. This type of data can provide a rich and diverse representation of the phenomena of interest but may be limited by factors such as sample size, data quality, and ethical considerations.
Between these two extremes, we have various levels of realism that can be achieved through the use of synthetic data. For example, data may be generated based on real-world data using methods such as data augmentation or data synthesis, which can create new data points that are similar to the real data but with some degree of randomness or variability. Alternatively, data may be generated based on simulations or generative models that incorporate known properties of the real-world data, such as distributional properties or relationships between variables.
As for your second question, as an AI language model, I am always ready to provide help and guidance on topics related to synthetic data and statistics. Please let me know if there is anything specific that I can assist you with.
  • asked a question related to Data Mining
Question
4 answers
It will be for a data mining study whose objective is to classify the best time of day for the operation of the wind farm.
Relevant answer
Answer
Laura Peçanha There are a number of wind datasets that include the factors you mentioned. The National Renewable Energy Laboratory (NREL), which maintains a comprehensive database of wind resource data for the United States, is one such source. The NREL wind resource database contains observations of wind speed, direction, and temperature at various heights above ground, as well as air density and turbulence strength. The data is delivered hourly and covers a variety of time periods based on the region.
The European Centre for Medium-Range Weather Forecasts (ECMWF) is another viable source of wind database, as it offers worldwide atmospheric reanalysis data that includes wind speed, direction, and temperature. The ECMWF data is accessible at several temporal resolutions, including hourly, and may be downloaded.
Other institutions and commercial enterprises that provide wind database services include AWS Truepower and Vaisala. These firms offer high-quality wind data and analytic tools that may be customized to meet unique research requirements.
In conclusion, various wind datasets are available that cover the variables you need for your research. Exploring numerous sources and evaluating data quality and relevance to your unique study objectives may be beneficial.
  • asked a question related to Data Mining
Question
3 answers
I would need a (tabular, i.e. not imaging or text) dataset with a hierarchically structured outcome to use as an example dataset in a new R package (but the dataset can be of any format, e.g. txt, csv or arff). It should be single-label and tree-structured, e.g. first level: classes 1, ..., 4, second level: 1.1, 1.2, 1.3, 2.1,2.2, third level: 1.1.1, 1.1.2, 1.2.1, 1.2.2, 1.2.3, 1.3.1, 1.3.2., ... .
Relevant answer
Answer
The labeling scheme you want to use is also popular for indexing semistructured documents (such as XML documents); for example, there is a labeling scheme called ORDPATH.
With this schema, you could take any real-world collection of XML-documents and turn it into a dataset consisting of labels.
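To make the dotted labeling from the question concrete, here is a minimal sketch that expands labels such as 1.2.3 into their ancestor chain and collects them into a tree. It is not ORDPATH itself, only the plain dotted scheme the question describes; the function names and the sample labels are hypothetical.

```python
def split_levels(label):
    """Return the ancestor chain of a dotted label, e.g. '1.2.3' -> ['1', '1.2', '1.2.3']."""
    parts = label.strip(".").split(".")
    return [".".join(parts[: i + 1]) for i in range(len(parts))]


def build_tree(labels):
    """Build a nested dict in which each node maps to a dict of its children."""
    tree = {}
    for label in labels:
        node = tree
        for prefix in split_levels(label):
            node = node.setdefault(prefix, {})
    return tree


# Hypothetical single-label outcomes, one per observation.
labels = ["1.1.1", "1.1.2", "1.2.1", "2.1", "2.2"]
tree = build_tree(labels)
print(split_levels("1.2.1"))
print(sorted(tree["1"]))
```

Storing the full chain per observation (level 1, level 2, level 3) gives one column per hierarchy level, which is a common way to hand a tree-structured outcome to a tabular tool.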
  • asked a question related to Data Mining
Question
3 answers
I want to develop a research project on higher school dropout and would like some help on this topic.
Relevant answer
Answer
Yes, there are several approaches to addressing the issue of high school dropout rates beyond data mining. Here are a few:
Implement targeted interventions: Work with schools and communities to identify students at risk of dropping out and provide targeted interventions such as mentoring, tutoring, and after-school programs to keep them engaged and help them succeed.
Address underlying social determinants of academic success: Identify and address non-academic factors that contribute to high dropout rates, such as poverty, lack of access to healthcare, housing instability, and discrimination.
Provide alternative pathways to success: Support alternative routes to obtaining a high school diploma, such as vocational training, apprenticeships, and alternative learning programs like online or blended learning.
Foster a positive school culture: Prioritize creating a positive school culture that values academic success, supports student engagement and wellbeing, and provides a safe and inclusive learning environment.
These approaches can all work together to tackle the complex and multifaceted issue of high school dropout rates. It is important to consider each in the context of the specific challenges and opportunities of the community being served.
  • asked a question related to Data Mining
Question
1 answer
yes. for further details contact now
Relevant answer
Answer
Yes. I suggest doing a search of ResearchGate using the terms "r package topic model" and following up on the top articles on topic modeling in R. There are other packages you can find by browsing the CRAN archives of R packages, but these articles are a good place to start.
I also recommend the book Text Analysis with R for Students of Literature by Matthew Jockers and Rosamond Thalken.
  • asked a question related to Data Mining
Question
2 answers
I am writing a PhD thesis on data mining. How can I write a good "thesis innovations" section? What are the key points?
Relevant answer
Answer
Ajit Singh Thanks for your valuable comment
  • asked a question related to Data Mining
Question
4 answers
The data obtained from the institution's database is for analyzing the GPA and CGPA of 1000 students. The attributes obtained are demographic; there are no behavioral, income, or similar attributes. What type of data mining technique can be used to analyze these attributes and obtain patterns from the analysis?
Please give references on how the techniques can be applied.
Thank you! Appreciate it.
Relevant answer
Answer
One educational data mining technique that can be used to analyze students' performance attributes via patterns is called "cluster analysis".
Cluster analysis is a statistical technique that involves grouping similar observations or data points together based on their attributes or characteristics. In the context of education, cluster analysis can be used to identify patterns in students' performance attributes, such as grades, test scores, attendance records, or behavior.
For example, a school may collect data on students' performance attributes over a period of time and use cluster analysis to group students who exhibit similar patterns of behavior or academic performance. This can help identify groups of students who may require additional support or resources to succeed, as well as inform instructional strategies and curriculum development.
Another educational data mining technique that can be used to analyze students' performance attributes via patterns is "association rule mining". Association rule mining involves identifying patterns and relationships between variables in large datasets. In the context of education, association rule mining can be used to identify correlations between students' performance attributes and other factors such as demographic information, socioeconomic status, or extracurricular activities. This can help schools and educators better understand the factors that influence student performance and make informed decisions about how to support students in their learning.
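As a small illustration of the cluster analysis described above, the following sketch runs plain k-means (Lloyd's algorithm) on GPA/CGPA pairs using only the standard library. The student values are invented for the example, and taking the first k points as initial centroids is an arbitrary choice made for reproducibility; real analyses usually try several initializations and values of k.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means; initial centroids are the first k points (arbitrary, deterministic)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical (GPA, CGPA) pairs, one per student.
students = [(3.8, 3.7), (2.1, 2.3), (3.6, 3.9), (2.4, 2.0), (3.9, 3.8), (1.9, 2.2)]
labels, centers = kmeans(students, k=2)
print(labels)
print(centers)
```

With demographic attributes added, categorical columns would first need an encoding (for example one-hot), and the numeric columns should be put on comparable scales before distances are computed.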
  • asked a question related to Data Mining
Question
4 answers
Hello dear researchers,
I have just been accepted into a doctoral program, with data consisting of approximately thousands of observations. I am planning to do data mining first to explore, classify, associate, and detect anomalies. I used to work with Stata and am wondering whether Stata can do such things. Do you have any suggestions for references that connect Stata and data mining?
Relevant answer
Answer
Dear university staff!
I inform you that my lecture on electronic medicine on the topic: "The use of automated system-cognitive analysis for the classification of human organ tumors" can be downloaded from the site: https://www.patreon.com/user?u=87599532
Lecture with sound in English. You can download it and listen to it at your convenience.
Sincerely,
Vladimir Ryabtsev, Doctor of Technical Science, Professor Information Technologies.
  • asked a question related to Data Mining
Question
7 answers
Hi,
Most researchers know the R Views website, which is:
Please, I am wondering if this website contains all R packages available for researchers.
Thanks & Best wishes
Osman
Relevant answer
Answer
no need to buy R
  • asked a question related to Data Mining
Question
4 answers
I am completely new to WEKA, and I am trying to load a file that I got from Kaggle into WEKA but am met with an error. How do I convert the file from .csv to ARFF format?
This is where I got the file, and I have cleaned the extra columns.
Thank you very much.
Relevant answer
Answer
In your file, some data types may be mismatched. Check the date and name columns. The Name column contains some bad characters.
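One way to sidestep such loading errors is to write the ARFF file yourself, so that names with apostrophes or spaces are quoted explicitly. The sketch below is a minimal converter under stated assumptions: the CSV is rectangular (every row has the same number of cells), columns are typed only as `numeric` or `string`, dates are left as strings, and the `sample` data and function names are hypothetical.

```python
import csv
import io


def is_number(value):
    """True if the cell parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def quote(value):
    """Quote a value for ARFF, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def csv_to_arff(csv_text, relation="data"):
    """Convert rectangular CSV text (first row = header) to ARFF text."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [f"@relation {relation}", ""]
    numeric = []
    for col, name in enumerate(header):
        values = [r[col] for r in body if r[col] != ""]
        # A column is numeric only if every non-empty cell parses as a number.
        col_is_numeric = bool(values) and all(is_number(v) for v in values)
        numeric.append(col_is_numeric)
        kind = "numeric" if col_is_numeric else "string"
        lines.append(f"@attribute {quote(name)} {kind}")
    lines += ["", "@data"]
    for r in body:
        cells = []
        for col, v in enumerate(r):
            if v == "":
                cells.append("?")  # ARFF marker for a missing value
            elif numeric[col]:
                cells.append(v)
            else:
                cells.append(quote(v))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


# Hypothetical input with an apostrophe, a space, and a missing value.
sample = "name,age,joined\nO'Brien,34,2020-01-05\nAnn Lee,,2021-07-19\n"
arff = csv_to_arff(sample)
print(arff)
```

If a column should be treated as categorical in WEKA, its `@attribute` line would need a nominal value list instead of `string`; that choice depends on the dataset and is left out here.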
  • asked a question related to Data Mining
Question
3 answers
My topic is "fraud detection in the banking sector using data mining techniques", so I am looking for a banking dataset and for guidance on how to use it.
Relevant answer
Answer
A machine learning dataset is a collection of data that is used to train the model. A dataset acts as an example to teach the machine learning algorithm how to make predictions. ... The common types of data include:
  1. Text data.
  2. Image data.
  3. Audio data.
  4. Video data.
  5. Numeric data.
  • asked a question related to Data Mining
Question
48 answers
The current technological revolution, known as Industry 4.0, is determined by the development of the following technologies of advanced information processing:
Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
Which of these technologies are applicable or will be used in the future in the education process?
Please reply
Best wishes
Relevant answer
Answer
Let’s be clear: the metaverse (however you define it) is decades away.
Which is not to say that it can be ignored in the meantime. Because while it may seem like science fiction or over-inflated hype at the moment, the fact remains that huge amounts of money and effort are being poured into making it happen – and educators need to at least be aware of its possible implications...
  • asked a question related to Data Mining
Question
6 answers
Hi everyone,
I am facing this problem in my MA thesis:
I have two time series datasets. The first dataset has numerical features and the second one has binary variables. I found in the literature these three methods that are able to determine the correlation between the two datasets:
- logistic regression
- point-biserial correlation
- Kruskal-Wallis H test
Unfortunately, I could not find out whether these methods are still applicable when the data are time series. I would appreciate some advice/explanations to figure this out :)
Another question: if I can use one of these methods, are there any limitations if my continuous variable is nonlinear?
Thanks in advance for your help :)
PS
both my datasets are stationary
# Data mining #correlation #time series analysis
Relevant answer
Answer
I ran the Dickey-Fuller test to check the stationarity of the features in my dataset. The test statistic was less than the critical value at 1%, and the p-value was far below 0.5%, which leads to rejecting the null hypothesis. As I understand from the literature, rejecting the null hypothesis means that the process has no unit root and, in turn, that the time series is stationary. Do you know a way to test the linearity of the features other than just fitting a linear regression and checking R²?
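For reference, the point-biserial correlation listed in the question is simply Pearson's r between a 0/1 variable and a continuous one, so it can be computed on stationary series as a descriptive measure. The sketch below uses invented data and a hypothetical function name. Two caveats apply: the usual significance tests assume independent observations, so autocorrelation in time series can make p-values look better than they are, and the coefficient only captures a difference in group means, not nonlinear dependence.

```python
import math


def point_biserial(binary, values):
    """Point-biserial r: Pearson correlation between a 0/1 variable and a continuous one."""
    n = len(values)
    group1 = [v for b, v in zip(binary, values) if b == 1]
    group0 = [v for b, v in zip(binary, values) if b == 0]
    n1, n0 = len(group1), len(group0)
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(values) / n
    # Population standard deviation of the continuous variable.
    sd = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (mean1 - mean0) / sd * math.sqrt(n1 * n0 / n**2)


# Hypothetical aligned observations: binary event flag and a numeric feature.
event = [0, 0, 0, 1, 1, 1]
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
r = point_biserial(event, feature)
print(round(r, 4))
```

Computing the same coefficient on lagged copies of one series is a simple way to look for delayed association between the two datasets.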
  • asked a question related to Data Mining
Question
5 answers
What do you consider are the implications of Big Data on urban planning practice?
Relevant answer
Glory be to Allah... As time progresses, new developments appear that help people to complete their needs with flexibility and ease.
  • asked a question related to Data Mining
Question
4 answers
Good evening dear researchers,
I have a data set from the KEGG database. It is in CSV format. I was trying to convert it into ARFF format using WEKA for further analysis.
It keeps giving me an error saying that it is not recognized by WEKA as a CSV file. I searched for the error and found that the file needs to be cleansed and put into a suitable data structure for it to be valid and ready to be analyzed.
Unfortunately, I do not have the ability or the knowledge to do that right now, and I need it as soon as possible.
Can anyone help with the problem? Thank you so much for your time.
kind regards.
attachments:
-data set csv file.
-error png clip.
Relevant answer
Answer
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within a dataset. When combining multiple data sources, there are many opportunities for data to be duplicated or mislabeled. If data is incorrect, outcomes and algorithms are unreliable, even though they may look correct. There is no one absolute way to prescribe the exact steps in the data cleaning process because the processes will vary from dataset to dataset. But it is crucial to establish a template for your data cleaning process so you know you are doing it the right way every time.
Regards,
Shafagat
  • asked a question related to Data Mining
Question
4 answers
We are looking at the application of data mining in water quality space. There are several articles to begin with and refer, and it is a bit confusing. Trying to narrow down the scope.
Relevant answer
Answer
The objectives in evaluating river profile in urban centers
  • asked a question related to Data Mining
Question
1 answer
Hi everyone, I am trying to apply spatial data mining to a set of vector and raster files, so I need a way to convert my raster files into CSV in order to run the mining.
A little bit of background: my thesis is about applying data mining in archaeology with the intention of modeling archaeological sites. Currently I am struggling to convert the rasters to CSV to run the data mining.
Relevant answer
Answer
The Export Raster pane allows you to export the entire raster dataset, mosaic dataset, image service or the portion in the display.
  1. In the Contents pane, right-click the raster layer you want to export, click Data, and click Export Raster. ...
  2. Choose the appropriate output as required in the Output Raster Dataset field.
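Once a raster has been exported, turning it into CSV rows is mostly a matter of computing cell coordinates. The sketch below assumes the raster was exported to the plain-text Esri ASCII grid format with an `xllcorner`/`yllcorner` header and one grid row per text line; the answer above does not say which output format to choose, so this is an assumption, and the sample grid and function names are hypothetical.

```python
import csv
import io


def ascii_grid_to_rows(text):
    """Convert an Esri ASCII grid to (x, y, value) tuples at cell centers, skipping NODATA cells."""
    lines = text.strip().splitlines()
    header = {}
    i = 0
    # Header lines start with a keyword; data lines start with a number.
    while i < len(lines) and lines[i].split()[0][0].isalpha():
        key, value = lines[i].split()
        header[key.lower()] = float(value)
        i += 1
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    size = header["cellsize"]
    nodata = header.get("nodata_value")
    rows = []
    for r, line in enumerate(lines[i : i + nrows]):
        for c, cell in enumerate(line.split()[:ncols]):
            value = float(cell)
            if nodata is not None and value == nodata:
                continue
            x = header["xllcorner"] + (c + 0.5) * size
            # The first data line is the northernmost row of the grid.
            y = header["yllcorner"] + (nrows - r - 0.5) * size
            rows.append((x, y, value))
    return rows


def rows_to_csv(rows):
    """Serialize (x, y, value) tuples as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["x", "y", "value"])
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical 2 x 2 grid with one NODATA cell.
grid = """ncols 2
nrows 2
xllcorner 100
yllcorner 200
cellsize 10
NODATA_value -9999
1 2
-9999 4
"""
rows = ascii_grid_to_rows(grid)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Several rasters on the same grid can be joined on the (x, y) columns to obtain one table with a column per layer, which is the shape most data mining tools expect.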
  • asked a question related to Data Mining
Question
1 answer
Could you please recommend to me a package or tool for the drop3 instance selection method?
Relevant answer
Answer
Hopefully this link will help:
  • asked a question related to Data Mining
Question
9 answers
Well,
I am a very curious person. During Covid-19 in 2020, working through coded data and taking only the last name, I noticed that in my country people with certain surnames were more likely to die than others (and this pattern has remained unchanged over time). Using ratios and proportions, inconsistencies were found after performing a "conversion" so that all surnames had the same weighting. The rest, a simple exercise in probability and statistics, revealed this controversial fact.
Of course, what I did was a shallow study, just a data mining exercise, but it has been something that caught my attention, even more so when talking to an Indian researcher who found similar patterns within his country about another disease.
In the context of pandemics (for the end of these and others that may come)
I think it would be interesting to have a line of research involving different professionals such as data scientists; statisticians/mathematicians; sociology and demographics; human sciences; biological sciences to compose a more refined study on this premise.
Some questions still remain:
What if we could have such answers? How should Research Ethics be handled? Could we warn people about care? How would people with certain last names considered at risk react? And the other way around? From a sociological point of view, could such a recommendation divide society into "superior" or "inferior" genes?
What do you think about it?
=================================
Note: Due to important personal matters I took a break and returned to my activities today, February 13, 2023. I am very happy to come across so much interesting feedback.
Relevant answer
Answer
It is just coincidental
  • asked a question related to Data Mining
Question
3 answers
Dear all,
Why is forward selection search so popular and widely used in feature selection (FS) based on mutual information, such as MRMR, JMI, CMIM, and JMIM? Why are other search approaches, such as beam search, not used? If there is a reason for that, kindly reply to me.
Relevant answer
Answer
There are three main types of feature selection: filter methods, wrapper methods, and embedded methods. Filter methods use criteria that are independent of the modeling process, such as mutual information, correlation, or the chi-square test, to score each feature or a selection of features against the target. Other filter methods include variance thresholding and ANOVA.
Wrapper methods use error rates from models trained iteratively on subsets of features to select the critical features. Subsets can be chosen by sequential forward selection, sequential backward selection, bidirectional selection, or randomly. Because they train a model for each candidate subset, they are more computationally expensive than filter methods. There are also non-exhaustive search approaches such as branch and bound. In some cases filter methods are applied before wrapper methods.
Embedded methods use models such as decision trees or random forests to extract feature importances and decide which features to select.
Overall, forward, backward, and bidirectional methods are stepwise methods for searching for crucial features. Beam search is more of a graph-based heuristic optimization method, similar to best-first search, that is seen in neural network or tree optimization rather than applied directly as a feature selection method.
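To show what forward selection with a mutual-information criterion looks like in practice, here is a minimal sketch of a greedy search with an mRMR-style score (relevance to the target minus mean redundancy with already selected features). It assumes discrete features, uses invented data, and is only one simple variant; it is not a reference implementation of MRMR, JMI, CMIM, or JMIM.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def mrmr_forward(features, target, k):
    """Greedy forward selection: at each step add the feature with the best mRMR-style score."""
    selected = []
    remaining = set(features)
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical discrete data: f1_dup duplicates f1, f2 is weaker but less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "f1": [0, 0, 0, 1, 1, 1, 1, 1],
    "f1_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "f2": [0, 0, 1, 0, 1, 1, 0, 1],
}
selected = mrmr_forward(features, target, k=2)
print(selected)
```

Each step scores every remaining feature once against the current subset, which is part of why the greedy search is cheap. A beam search would instead keep several partial subsets alive at every step and score extensions of each, multiplying the number of mutual-information evaluations by the beam width.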
  • asked a question related to Data Mining
Question
4 answers
Data mining has a broad discussion of how to manipulate data mining on other algorithms.
Relevant answer
Answer
Data mining is the process used to analyze data for relationships that have not previously been discovered, typically within existing large databases that work on mega data.
Moreover, data mining has four key properties:
I. Automatic Discovery of patterns
II. Prediction of likely outcomes
III. Creation of actionable information
IV. Focus on large data sets and databases
  • asked a question related to Data Mining
Question
3 answers
How can I distinctively differentiate between 'data mining', 'data analysis', and 'data analytics'?
Is there any example to add, towards proper understanding of the differences?
Thank you!
Relevant answer
Answer
Differences between data analytics and data mining (ironhack.com)
  • asked a question related to Data Mining
Question
8 answers
One of my master students is currently conducting a preliminary study to find out the maturity of the Cross Industry Standard Process for Big Data (CRISP4BigData) for use in Big Data projects. I would like to invite all scientists, Big Data experts, project managers, data engineers, data scientists from my network to participate in the following survey. Feel free to share!
Relevant answer
Answer
Done
  • asked a question related to Data Mining
Question
6 answers
I'm an undergraduate doing a Software Engineering degree. I'm looking for a research topic for my final year project. If anyone has any ideas or research topics or any advice on how or where to find one please post them.
Thanks in advance ✌
Relevant answer
Answer
Most of SE is based on design and cost functions. Concentrate on
  • asked a question related to Data Mining
Question
1 answer
Is there an updated list of ?
Relevant answer
Answer
I'm not sure what you mean by 'approved' databases. Approved by what or by whom?
  • asked a question related to Data Mining
Question
2 answers
Hi, could you please guide me on how to conduct Latent Semantic Analysis through text mining for my business research? Is there any website, book, or tutorial video I could use, so that I can apply this method in my research project? Thanks in advance. Kind regards, Bushra Aziz
Relevant answer
Answer
The Text Analytics Toolbox of MATLAB may be suitable for your task. In practice, it is friendlier to beginners than Python tools. Step-by-step tutorial materials are provided on the official website and in its help centre. You can also find some videos about it on YouTube.
  • asked a question related to Data Mining
Question
4 answers
Hi,
Thank you for your help.
How can I make the scheduling process in CloudSim an environment for my reinforcement learning model?
Relevant answer
Answer
Thank you for sharing the links and papers, I will use them to learn.
I appreciate your time and efforts
Best Regards,
Bashar
  • asked a question related to Data Mining
Question
9 answers
I am looking for a justification to associate data mining with big data analytics. However, many studies have observed that, in addition to the characteristics of the data, there is a line of thought that treats this as a question of taxonomy: data mining is a step in big data analytics. Can I think of it this way, or is there something I'm not considering?
Relevant answer
Answer
Please refer to the literature relevant to your study.
  • asked a question related to Data Mining
Question
7 answers
Modern politics is characterized by many aspects that were not associated with traditional politics. Big data is one of them. Political parties carry out data mining, seeking help from data scientists to find patterns that identify voter behavior. The question is: what are the various ways in which big data is being used by modern political parties and leaders?
Relevant answer
Answer
Big Data platforms allow government agencies to access large volumes of information that are essential for their daily operations. With real-time access, governments can identify areas that require attention, make better and more timely judgments about how to proceed, and enact the necessary changes.
  • asked a question related to Data Mining
Question
9 answers
Data Mining and Machine Learning look similar to me. Can you elaborate on the difference between the two? As per my understanding:
Data Mining is about finding useful information and using that information in decision making. That means we use the known properties of the data to find its unknown properties, e.g. studying sales of computers in different regions and supplying them accordingly.
On the other hand, ML is about predicting results. It uses known properties of the data to predict another known property for new data instances, e.g. predicting the house price after 5 years from existing house sales data.
  • asked a question related to Data Mining
Question
3 answers
I need some suggestions for a health insurance dataset on which text mining is possible. Any recent papers that describe such a dataset would be helpful.
Relevant answer
Answer
Dear Anuradha,
Please check the following link:
  • asked a question related to Data Mining
Question
7 answers
I have a data set that contains a text field for more than 3,000 records, all of which contain notes from the doctor. I need to extract specific information from all of them, for example the doctor's final decision and the classification of the patient. What is the most appropriate way to analyze these texts? Should I use information retrieval or information extraction, or would a Q&A system be fine?
Relevant answer
Answer
Dear Matiam Essa
Information extraction is a text mining technique that focuses on identifying and extracting entities, attributes, and their relationships from semi-structured or unstructured texts. The extracted information is then stored in a database for future access and retrieval. The well-known text mining techniques are:
Information Extraction (IE)
Information Retrieval (IR)
Natural Language Processing
Clustering
Categorization
Visualization
With the increasing amount of text data, effective techniques need to be employed to examine the data and to extract relevant information from it. Various text mining techniques are used to extract interesting information efficiently from multiple sources of textual data, and they are continually being refined to improve the text mining process.
GOOD LUCK
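As a rough illustration of the information extraction option for notes like those in the question, here is a minimal rule-based sketch. It assumes the notes contain labelled fields such as "Decision:" and "Classification:"; these labels and the sample notes are hypothetical, and real clinical notes would likely need more robust patterns or a trained model.

```python
import re

# Hypothetical note layout: the field labels "Decision" and "Classification"
# are assumptions for illustration, not taken from the data set in the question.
FIELD_PATTERN = re.compile(
    r"(?P<field>Decision|Classification)\s*:\s*(?P<value>[^.;\n]+)",
    re.IGNORECASE,
)


def extract_fields(note):
    """Return a dict of the labelled fields found in one free-text note."""
    return {
        m.group("field").lower(): m.group("value").strip()
        for m in FIELD_PATTERN.finditer(note)
    }


# Made-up example notes, not real patient data.
notes = [
    "Patient reports mild pain. Classification: low risk. Decision: discharge with follow-up.",
    "Fever for three days; classification: moderate risk; decision: admit for observation",
]
records = [extract_fields(note) for note in notes]
for record in records:
    print(record)
```

The resulting dictionaries can be written to a database table, which matches the "extract, then store for later retrieval" workflow described above.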
  • asked a question related to Data Mining
Question
7 answers
Which tools, other than Python-based ones, solve prediction problems effectively?
Relevant answer
Answer
We can't really say which tool is the best one, since it depends on the data type and on the person. Here you can find some of them:
I prefer Orange Data Mining. It is a free and open-source data visualization, machine learning, and data mining toolkit.
  • asked a question related to Data Mining
Question
3 answers
Problem statement: Google Trend Analysis and Paradigm Shift of Online Education Platforms during the COVID-19 Pandemic
I would like to know which methodologies, data preprocessing techniques, data mining methods, and metrics are used for this analysis.
Relevant answer
Good morning
I invite you to consult the Scopus and Web of Science databases.
Best regards
Ph.D., MBA Ingrid del Valle García Carreno
  • asked a question related to Data Mining
Question
4 answers
How do data mining researchers test or evaluate the EFFICIENCY of their data mining models?
Or do they use an ISO certification evaluation?
The model I created is the output of a hypothesis and theory that I want to test, so I would rather not have other people evaluate the model as if it were a system.
Data mining evaluation metrics alone cannot be used to support the study.
I am searching for a study or research approach that I can use to back up the efficacy of the model I created.
Feel free to educate me. I would love to hear your thoughts.
Relevant answer
Answer
  • asked a question related to Data Mining
Question
13 answers
Hi everybody,
I would like to do part of speech tagging in an unsupervised manner, what are the potential solutions?
Relevant answer
Answer
  • asked a question related to Data Mining
Question
3 answers
Please suggest R packages and code for text mining (or any other programming approach) to search the PubMed database.
Relevant answer
Answer
Ajit Kumar Singh Enter a free text search into the PubReMiner tool, and it will search PubMed for results. The program analyzes these data and generates tables that rank the frequency of terms in the articles' titles and abstracts, as well as related MeSH categories.
  • asked a question related to Data Mining
Question
11 answers
Data Mining (DM) is a process of extracting and discovering patterns in large data sets including methods of Machine Learning (including Deep Learning and Statistical Learning), Statistics, and Database Systems.
Machine Learning (ML) is the study of computer algorithms that improve automatically through experience and by the use of data.
It would seem very simplistic to consider ML only as a part of the larger field of DM.
From a very rough and general point of view, DM and ML are both part of mathematics.
From another point of view, more precise but more obsolete, they are both seen as a part of Artificial Intelligence.
I would like to propose to consider both disciplines as overlapping for most of their methods.
Do you have at least 3 differences between DM and ML to report?
Relevant answer
Answer
I think machine learning facilitates data mining. As such, we may say that ML algorithms are just tools for data mining.
  • asked a question related to Data Mining
Question
2 answers
Dear Madam, Please advise about post Doc supervisors in the university in the field of educational data mining and learning analytics for strengthening university decision making. I will be grateful
  • asked a question related to Data Mining
Question
11 answers
How can I measure classification errors using Weka? Can we use the value of RMSE or a similar measure to obtain the classification rate?
Relevant answer
Answer
I have been teaching myself how to use RWeka, specifically so that I can implement the M5P model. I have been able to apply it to my data, but I do not understand what the percentage represents. For example, the beginning of the sample output from RWeka's manual is:
M5 pruned model tree: (using smoothed linear models) CHMIN <= 7.5 : LM1 (165/12.903%)
The other LMs have other "scores" like this, like (6/18.551%) and (23/48.302%). What exactly do these percentages and numbers represent?
  • asked a question related to Data Mining
Question
13 answers
I'm searching about autoencoders and their application in machine learning issues. But I have a fundamental question.
As we all know, there are various types of autoencoders, such as Stack Autoencoder, Sparse Autoencoder, Denoising Autoencoder, Adversarial Autoencoder, Convolutional Autoencoder, Semi-Autoencoder, Dual Autoencoder, Contractive Autoencoder, and others that are better versions of what we had before. Autoencoders are also known to be used in Graph Networks (GN), Recommender Systems (RS), Natural Language Processing (NLP), and Computer Vision (CV). This is my main concern:
Because the input and structure of each of these machine learning problems are different, which version of autoencoder is appropriate for which machine learning problem?
Relevant answer
Answer
Take a look at the link; it may be useful.
Regards,
Shafagat
  • asked a question related to Data Mining
Question
1 answer
I am very new to these forecasting methods. Can someone help me with how to forecast the next period using these methods?
I have weekly demand data, which I have classified into lumpy, erratic, and smooth demand. Since Croston's forecasting method is best suited for smooth demand and the SBA method for lumpy demand, I need their forecasting procedures to plan the demand for the next weekly period.
Is there any other method to forecast lumpy and smooth demand besides these?
Thank you in advance
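For readers new to the two methods named in this question, here is a minimal sketch of Croston's method and its SBA (Syntetos-Boylan Approximation) variant. It assumes a single smoothing constant `alpha` for both demand sizes and inter-demand intervals and a simple initialisation from the first non-zero demand; both are illustrative choices, and the function name is hypothetical.

```python
def croston(demand, alpha=0.1, sba=False):
    """One-step-ahead forecast of demand per period using Croston's method.

    demand: list of non-negative numbers, one per period (zeros allowed).
    alpha: smoothing constant, applied to both sizes and intervals here.
    sba: if True, apply the Syntetos-Boylan correction factor (1 - alpha / 2).
    """
    size = None        # smoothed non-zero demand size
    interval = None    # smoothed number of periods between non-zero demands
    periods_since = 0  # periods elapsed since the last non-zero demand
    for d in demand:
        periods_since += 1
        if d > 0:
            if size is None:
                # Initialise with the first observed size and interval.
                size, interval = float(d), float(periods_since)
            else:
                # Both estimates are updated only when demand occurs.
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 0
    if size is None:
        return 0.0  # no demand observed at all
    forecast = size / interval
    return forecast * (1 - alpha / 2) if sba else forecast


weekly = [0, 0, 6, 0, 0, 6, 0, 0, 6]  # hypothetical weekly demand
print(croston(weekly))            # Croston forecast for the next week
print(croston(weekly, sba=True))  # SBA forecast for the next week
```

The returned value is a demand rate per period, so the same number is used as the forecast for each upcoming week until new data arrives.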
  • asked a question related to Data Mining
Question
4 answers
I am looking for a free-of-charge international conference on metaheuristic algorithms or data mining. Is there anyone who can help me?
Relevant answer
Sikirat Aina thanks.
  • asked a question related to Data Mining
Question
5 answers
I have weekly demand data for the past 4 years. There are various products with their demand values. I am trying to calculate the future weekly values for a year. The data doesn't follow any trend and is random. There are also many weeks with zero demand.
I am very new to time series analysis. Can someone help me in suggesting an appropriate method?
Relevant answer
Answer
First of all, the fundamental assumption of any forecasting technique (implicit or explicit) is that the time series represents a stable pattern that can be identified and then extended into the future. If the pattern of the past data points is not statistically stable, or is random (as in your case), then no meaningful future prediction (forecasting) is possible, regardless of the sophistication of the forecasting technique.
Because your time series data points are random with no trend, your best bet is to generate other random points from your current data distribution and treat these new random points as your forecast. You could build a histogram of your existing data points and generate new random points from this histogram.
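The resampling idea can be sketched in a few lines. This is a minimal illustration that draws future values from the empirical distribution of past values (equivalent to a histogram with one bin per distinct value); the function name and the demand history are hypothetical.

```python
import random


def bootstrap_forecast(history, horizon, seed=None):
    """Draw `horizon` future values by resampling past values with replacement."""
    rng = random.Random(seed)
    return [rng.choice(history) for _ in range(horizon)]


weekly_demand = [0, 0, 3, 0, 5, 0, 0, 2, 0, 4]  # hypothetical history
next_year = bootstrap_forecast(weekly_demand, horizon=52, seed=0)
print(next_year[:10])
```

Repeating the draw with different seeds gives a spread of possible futures, which may be more useful for planning than a single path.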
  • asked a question related to Data Mining
Question
7 answers
I have some Key Informant Interview (KII) data. I want to apply Natural Language Processing (NLP) to identify patterns in the data. Can applying NLP to analyze KII data be mentioned as a data analytics tool in the report/paper? TIA
Relevant answer
Answer
Of course, it is interesting work. For example: (1) using NER (Named Entity Recognition) and RE (Relation Extraction) to construct a knowledge graph, then analyzing the relations between the interviewees or the knowledge composition of an interviewee; (2) using EE (Event Extraction) to identify the event correlation between the questions and answers; (3) using SA (Sentiment Analysis) to analyze attitudes toward the interviewer, the company, etc.; (4) using topic models to analyze the topics of the interview and find out which topic the interviewers are most interested in.
There are many, many interesting things you can do with NLP analysis. I hope you finish an interesting paper soon.
  • asked a question related to Data Mining
Question
4 answers
I have compiled a list of lecture notes, examples, and notes from Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems). The attached PDF is the first iteration of the text; at this point it is just a manuscript. I would appreciate feedback on how to organize and structure the text so that it could be presented to a publisher.
Relevant answer
Answer
Przemysław Dolata Thank you for your advice. There are many PhD programs available. I am currently applying to several. I would love to get advice on your strategy for reading and analyzing texts.
  • asked a question related to Data Mining
Question
9 answers
What will be the future applications of analytics of large data sets conducted in the computing cloud on computerized Business Intelligence analytical platforms in Big Data database systems in enterprise logistics management?
The analytics conducted on computerized Business Intelligence platforms is one of the key advanced information technologies of the fourth technological revolution, known as Industry 4.0. The current technological revolution described as Industry 4.0 is driven by the development of the following advanced information processing technologies: Big Data database technologies, cloud computing, machine learning, Internet of Things, artificial intelligence, Business Intelligence and other advanced data mining technologies.
The analytics conducted on computerized Business Intelligence platforms currently supports business management processes, including logistics management.
In my opinion, the use of analytics of large data sets conducted in the computing cloud on computerized Business Intelligence analytical platforms in Big Data database systems in enterprise logistics management, including supply logistics, production logistics, provision of services and distribution of manufactured products and services, is currently growing.
Analytics of large data sets conducted in the cloud on computerized Business Intelligence platforms in Big Data database systems makes it particularly easy to identify opportunities and threats to business development, and allows analytical reports on selected aspects of the economic and financial situation of a business entity to be generated quickly. The reports generated in this way can be helpful in the processes of enterprise logistics management, including supply logistics, production logistics, provision of services and distribution of manufactured products and services.
Do you agree with my opinion on this matter?
In view of the above, I am asking you the following question:
What will be the future applications of analytics of large data sets conducted in the computing cloud on computerized Business Intelligence analytical platforms in Big Data database systems in enterprise logistics management?
Please reply
I invite you to the discussion
The use of information contained in Big Data database systems for Business Intelligence analyses is described in the publications:
I invite you to discussion and cooperation.
Best wishes
Relevant answer
Answer
It is a rising field, since intelligence, and artificial intelligence in general, is becoming the dominant technology of the current era.
  • asked a question related to Data Mining
Question
12 answers
I am doing a project on automated classification of software requirements using NLP and a machine learning approach, i.e. Naive Bayes. For this I require a dataset of classified software requirements. I have searched the PROMISE data repository, but did not find a dataset that fits my needs. Can someone help me? It would be highly appreciated if someone could tell me where I can find and download such a dataset.
Relevant answer
Answer
The PROMISE dataset is here: https://doi.org/10.5281/zenodo.268542
The PURE dataset is here: https://doi.org/10.5281/zenodo.1414117
  • asked a question related to Data Mining
Question
6 answers
Dear community, I need your help with extracting data from the Binance platform in order to use it for a forecasting problem. For example, we extract data about a certain cryptocurrency, clean it and make it ready for use, then forecast whether we should buy it or not, and add an alarm for when the time is right, using Python, machine learning, and statistics.
  • asked a question related to Data Mining
Question
23 answers
Hi everyone
I'm looking for a quick and reliable way to estimate my missing climatological data. My data is daily and covers more than 40 years. It includes minimum and maximum temperature, precipitation, sunshine hours, relative humidity, and wind speed. My main problem is the sunshine hours data, which has many gaps. These gaps are scattered throughout the time series; sometimes they span several months or even a few years. I work with 18 stations. Because my data is daily, the number of missing values is high, so I need to estimate the missing data before starting work. Your comments and experiences would be very helpful.
Thank you so much for advising me.
Relevant answer
Answer
It is in French
  • asked a question related to Data Mining
Question
14 answers
Cluster analysis, classification, Data Mining
Relevant answer
Answer
Grouping related data according to categories or themes. These are based on inter-relations between the variables which can influence each other in the respective setting.
  • asked a question related to Data Mining
Question
3 answers
Hi Fellows,
The matrix is at the bottom of this page: https://statweb.stanford.edu/~jtaylo/courses/stats202/visualization.html. A similar version appears in the book Introduction to Data Mining. It's clear that colours toward the red end indicate stronger correlation, but which attributes or variables are actually correlated in the figure? For example, along the main diagonal, cases of the same species show mostly perfect correlation, with a few near-perfect occurrences. Normally, a correlation is calculated between two columns of values, not between two single cases.
Thanks
RP
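One possible reading, offered as an assumption rather than a statement about how that figure was produced: each case (flower) is treated as a vector of its attribute values, and the correlation is computed between two such row vectors. A minimal sketch of that calculation, with three measurement rows used purely for illustration:

```python
import math


def pearson(u, v):
    """Pearson correlation between two equal-length numeric vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (norm_u * norm_v)


# Each row is one case: sepal length, sepal width, petal length, petal width.
setosa_a = [5.1, 3.5, 1.4, 0.2]
setosa_b = [4.9, 3.0, 1.4, 0.2]
virginica = [6.3, 3.3, 6.0, 2.5]

print(pearson(setosa_a, setosa_b))   # two cases of the same species
print(pearson(setosa_a, virginica))  # two cases of different species
```

Under this reading, a matrix of case-by-case correlations sorted by species would show blocks of high values for cases with similar attribute profiles, and the diagonal would be exactly 1 because each case is compared with itself.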
  • asked a question related to Data Mining
Question
4 answers
I am working on a data mining project and would like to portray the correlation between healthcare expenditure by country and the population's life expectancy/general health and am having trouble finding sizeable data sets.
Relevant answer
Answer
Healthcare expenditures: http://wdi.worldbank.org/table/2.12
Here's the full list of indicators: http://wdi.worldbank.org/table
  • asked a question related to Data Mining
Question
4 answers
Hi
How can new data mining methods be used to assess the ecological potential of the land?
Relevant answer
Answer
By using machine learning algorithms.
  • asked a question related to Data Mining
Question
10 answers
I am passionate about working on medical data, but unfortunately I couldn't find data in my home country on the disease I want to work on. Is anyone from medical informatics or health data mining available to collaborate with me?
Relevant answer
Answer
Please have a look at our (Eminent Biosciences (EMBS)) collaborations, and let me know if you are interested in associating with us.
Our recent publications in collaboration with industries and academia in India and worldwide:
EMBS publication In association with Universidad Tecnológica Metropolitana, Santiago, Chile. Publication Link: https://pubmed.ncbi.nlm.nih.gov/33397265/
EMBS publication In association with Moscow State University , Russia. Publication Link: https://pubmed.ncbi.nlm.nih.gov/32967475/
EMBS publication In association with Icahn Institute of Genomics and Multiscale Biology,, Mount Sinai Health System, Manhattan, NY, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
EMBS publication In association with University of Missouri, St. Louis, MO, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30457050
EMBS publication In association with Virginia Commonwealth University, Richmond, Virginia, USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
EMBS publication In association with ICMR- NIN(National Institute of Nutrition), Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
EMBS publication In association with University of Minnesota Duluth, Duluth MN 55811 USA. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852211
EMBS publication In association with University of Yaounde I, PO Box 812, Yaoundé, Cameroon. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
EMBS publication In association with Federal University of Paraíba, João Pessoa, PB, Brazil. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30693065
Eminent Biosciences(EMBS) and University of Yaoundé I, Yaoundé, Cameroon. Publication Link: https://pubmed.ncbi.nlm.nih.gov/31210847/
Eminent Biosciences(EMBS) and University of the Basque Country UPV/EHU, 48080, Leioa, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27852204
Eminent Biosciences(EMBS) and King Saud University, Riyadh, Saudi Arabia. Publication Link: http://www.eurekaselect.com/135585
Eminent Biosciences(EMBS) and NIPER , Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Eminent Biosciences(EMBS) and Alagappa University, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30950335
Eminent Biosciences(EMBS) and Jawaharlal Nehru Technological University, Hyderabad , India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Eminent Biosciences(EMBS) and C.S.I.R – CRISAT, Karaikudi, Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237676
Eminent Biosciences(EMBS) and Karpagam academy of higher education, Eachinary, Coimbatore , Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Eminent Biosciences(EMBS) and Ballets Olaeta Kalea, 4, 48014 Bilbao, Bizkaia, Spain. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29199918
Eminent Biosciences(EMBS) and Hospital for Genetic Diseases, Osmania University, Hyderabad - 500 016, Telangana, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/28472910
Eminent Biosciences(EMBS) and School of Ocean Science and Technology, Kerala University of Fisheries and Ocean Studies, Panangad-682 506, Cochin, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27964704
Eminent Biosciences(EMBS) and CODEWEL Nireekshana-ACET, Hyderabad, Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26770024
Eminent Biosciences(EMBS) and Bharathiyar University, Coimbatore-641046, Tamilnadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27919211
Eminent Biosciences(EMBS) and LPU University, Phagwara, Punjab, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/31030499
Eminent Biosciences(EMBS) and Department of Bioinformatics, Kerala University, Kerala. Publication Link: http://www.eurekaselect.com/135585
Eminent Biosciences(EMBS) and Gandhi Medical College and Osmania Medical College, Hyderabad 500 038, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27450915
Eminent Biosciences(EMBS) and National College (Affiliated to Bharathidasan University), Tiruchirapalli, 620 001 Tamil Nadu, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/27266485
Eminent Biosciences(EMBS) and University of Calicut - 673635, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/23030611
Eminent Biosciences(EMBS) and NIPER, Hyderabad, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/29053759
Eminent Biosciences(EMBS) and King George's Medical University, (Erstwhile C.S.M. Medical University), Lucknow-226 003, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579575
Eminent Biosciences(EMBS) and School of Chemical & Biotechnology, SASTRA University, Thanjavur, India Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25579569
Eminent Biosciences(EMBS) and Safi center for scientific research, Malappuram, Kerala, India. Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/30237672
Eminent Biosciences(EMBS) and Dept of Genetics, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/25248957
EMBS publication In association with Institute of Genetics and Hospital for Genetic Diseases, Osmania University, Hyderabad Publication Link: https://www.ncbi.nlm.nih.gov/pubmed/26229292
Sincerely,
Dr. Anuraj Nayarisseri
Principal Scientist & Director,
Eminent Biosciences.
Mob :+91 97522 95342
  • asked a question related to Data Mining
Question
11 answers
Hi There!
My data has a number of features (which contain continuous data) and a response feature (class label) that is categorical (binary). My intention is to study how the response feature (class) varies with all the other features, using a variety of feature selection techniques. Kindly help by pointing out the right techniques for the purpose. The data looks like this:
------------------------------------------------------------------
f1 f2 f3 f4 ... fn class
------------------------------------------------------------------
0.2 0.3 0.87 0.6 ... 0.7 0
0.2 0.3 0.87 0.6 ... 0.7 1
0.2 0.3 0.87 0.6 ... 0.7 0
0.2 0.3 0.87 0.6 ... 0.7 1
-------------------------------------------------------------------
Relevant answer
Answer
You can select the best algorithm, based on a performance measure, from a number of data mining algorithms. An extensive list can be found at: https://www.kdnuggets.com/2015/05/top-10-data-mining-algorithms-explained.html
  • asked a question related to Data Mining
Question
4 answers
I think that Generative Adversarial Networks can be used as Data Farming Means. What do you know about such an approach? Can you give another example of means for Data Farming?
Relevant answer
Answer
Other approaches exist. For instance (and mostly application based)
Extreme Data Mining.
A strategy to apply machine learning to small datasets in materials science.
Machine learning on small size samples: A synthetic knowledge synthesis.
  • asked a question related to Data Mining
Question
4 answers
Why does Particle Swarm Optimization work better for this classification problem?
Can anyone give me any strong reasons behind it?
Thanks in advance.
Relevant answer
Answer
Arash Mazidi PSO is also used in various classification problems. I particularly use it for phishing website datasets.
  • asked a question related to Data Mining
Question
4 answers
How many respondents are really enough?
There are two schools of thought about sample size; one is that a relatively small sample size is adequate. Perhaps 300-500 respondents can work?
Relevant answer
Answer
What is the best number of respondents when conducting a research?
There are two schools of thought about sample size: one is that as long as a survey is representative, a relatively small sample size is adequate. Perhaps 300-500 respondents can work. The other point of view is that while maintaining a representative sample is essential, the more respondents you have the better.
Regards,
Shafagat
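As a rough check on the 300-500 figure, here is a minimal sketch of the standard sample size calculation for estimating a proportion, n = z^2 * p * (1 - p) / e^2, with an optional finite population correction. The defaults (95% confidence, so z = 1.96, and the most conservative proportion p = 0.5) are common conventions assumed here, not values taken from the answer above.

```python
import math


def sample_size(margin, z=1.96, p=0.5, population=None):
    """Respondents needed to estimate a proportion within +/- margin.

    margin: desired margin of error as a fraction (0.05 means 5 points).
    z: z-score for the confidence level (1.96 for about 95%).
    p: expected proportion; 0.5 gives the largest (safest) sample size.
    population: if given, apply the finite population correction.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.05))                   # large population, 5% margin
print(sample_size(0.05, population=1000))  # population of 1,000, 5% margin
```

With these assumptions a 5% margin of error needs a few hundred respondents, which is consistent with the range quoted above; halving the margin roughly quadruples the requirement.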
  • asked a question related to Data Mining
Question
7 answers
Please share papers and shed some light on text mining and meta-analysis.
Relevant answer
Answer
Here's a link to my meta-analysis paper
  • asked a question related to Data Mining
Question
9 answers
Let us consider a sales invoice ("factor") like this:
Gender | Age | Street | Item 1 | Count 1 | Item 2 | Count 2 | ... | Item N | Count N | Total Price (Label)
Male | 22 | S1 | Milk | 2 | Bread | 5 | ... | - | - | 10 $
Female | 10 | S2 | Coffee | 1 | - | - | ... | - | - | 1 $
....
We want to predict the total price of an invoice (factor) based on the buyer's demographic information (such as gender, age, and job) and on the items bought and their counts. Note that we assume we do not know each item's price and that prices change over time (so we will also have a date in our dataset).
The main question is how to use this dataset, which contains transactional data (items) whose order is not important. For example, somebody who buys item1 and item2 is equivalent to somebody who buys item2 and item1. So the order in which values appear in the item columns should make no difference.
This dataset contains both multivariate and transactional data. My question is: how can we predict the label more accurately?
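One common way to make the item columns order-independent, sketched here under the assumption that the set of possible items is known in advance, is to turn each invoice into a fixed-length vector of counts with one position per item. The function name and the vocabulary are hypothetical.

```python
from collections import Counter


def basket_to_vector(item_count_pairs, vocabulary):
    """Encode (item, count) pairs as counts in a fixed vocabulary order.

    The result does not depend on the order in which items were listed.
    """
    totals = Counter()
    for item, count in item_count_pairs:
        totals[item] += count
    return [totals.get(item, 0) for item in vocabulary]


vocabulary = ["Bread", "Coffee", "Milk"]  # assumed list of all known items
a = basket_to_vector([("Milk", 2), ("Bread", 5)], vocabulary)
b = basket_to_vector([("Bread", 5), ("Milk", 2)], vocabulary)
print(a, b)  # both orderings give the same vector
```

The resulting vector can be concatenated with the demographic and date features and passed to any regression model.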
Relevant answer
Answer
Hi Dr Behzad Soleimani Neysiani . I agree with Dr Qamar Ul Islam .
  • asked a question related to Data Mining
Question
4 answers
For example, k-nearest neighbor needs to find the smallest of the distances between a query and a large number of data points.
In contrast, k-means clustering finds the smallest of the distances between each data point and a few cluster centers.
Which other techniques, like k-nearest neighbor, require computing the maximum or minimum value over a large number of data points?
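The contrast described above can be sketched as follows: the nearest-neighbor search takes a minimum over all n data points, while the k-means assignment step takes a minimum over only the k centers. This is a brute-force illustration with hypothetical function names and toy points, not an optimized implementation.

```python
import heapq
import math


def k_nearest(query, points, k):
    """Return the k points closest to query; scans all n points."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(query, p))


def assign_to_center(point, centers):
    """Return the index of the closest center; scans only the few centers."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


points = [(0, 0), (1, 1), (5, 5), (6, 5), (0, 1)]  # toy data
centers = [(0.5, 0.5), (5.5, 5.0)]                 # toy cluster centers

neighbors = k_nearest((0.2, 0.1), points, k=2)
labels = [assign_to_center(p, centers) for p in points]
print(neighbors, labels)
```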
Relevant answer
Answer
I recommend reading the following paper as it contains useful information to answer your question: