
Distributed Data Mining - Science topic

Questions related to Distributed Data Mining
  • asked a question related to Distributed Data Mining
Question
4 answers
I think that Generative Adversarial Networks can be used as a means of data farming. What do you know about such an approach? Can you give other examples of means for data farming?
Relevant answer
Answer
I think it's a recent technology, and I recently read an article about this topic. In this approach, two neural networks compete with each other: one (the generator) produces synthetic samples, while the other (the discriminator) learns to distinguish them from real data, and through this competition both become more accurate.
  • asked a question related to Distributed Data Mining
Question
6 answers
Hello everyone,
My issue concerns a water distribution system. I am working on a zone of the system for which no plan of the pipe layout exists. However, we know the locations of devices such as different types of valves, pressure relief valves, pressure meters, flow meters, and tanks. We also know the locations of the demand points that suffer from pressure loss. The question is how we can perform pressure management using the actuators to maximize water pressure for all demands in the zone, based on previously recorded data, while also minimizing water loss and pipe damage. Please let me know if you have any ideas or know of any suitable papers on this issue.
Thanks.
Relevant answer
Answer
It is an interesting project. The first step should be collecting data: open taps and valves more or less systematically and record the valve angles, numbers of rotations, pressures, and flows. With that data you can train a neural network, which in the end is a model of your pipe system. You can then run simulations by varying the inputs and try to achieve the desired outputs.
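The collect-fit-simulate loop described above can be sketched as follows. This is a minimal sketch under loud assumptions: three valves and two demand nodes, synthetic stand-in "measurements" (TRUE_A, TRUE_B, and MAX_PRESSURE are hypothetical), and a linear least-squares surrogate substituted for the neural network to keep the example short.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" system standing in for field measurements:
# 3 valve openings (0..1) -> pressures at 2 demand nodes.
TRUE_A = np.array([[2.0, 0.5, 1.0], [0.3, 1.5, 0.8]])
TRUE_B = np.array([10.0, 12.0])

def measure(openings):
    """Simulated noisy pressure reading for one valve setting."""
    return TRUE_A @ openings + TRUE_B + rng.normal(0, 0.05, size=2)

# Step 1: record data by varying valve settings.
X = rng.uniform(0, 1, size=(200, 3))
Y = np.array([measure(x) for x in X])

# Step 2: fit a surrogate model Y ~ [X, 1] @ W (linear stand-in for a neural net).
X1 = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)

def predict(openings):
    return np.append(openings, 1.0) @ W

# Step 3: simulate on the surrogate. Maximize the worst-served demand's
# pressure while keeping every pressure below a hypothetical damage limit.
MAX_PRESSURE = 14.0
grid = np.linspace(0, 1, 11)
best, best_score = None, -np.inf
for setting in itertools.product(grid, repeat=3):
    p = predict(np.array(setting))
    if p.max() <= MAX_PRESSURE and p.min() > best_score:
        best, best_score = setting, p.min()
print(best, best_score)
```

With real data, the surrogate would be trained on the recorded valve/pressure logs, and the exhaustive grid search would likely be replaced by an optimizer once there are more than a handful of valves.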
In hydrology, fluorescein sodium is used as a tracer to find the actual waterways. I would be surprised if there were no algorithms in microbiology for inferring pathways. Network optimization is often done with the Ford-Fulkerson method.
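For the network-optimization remark, here is a minimal pure-Python sketch of the Edmonds-Karp variant of Ford-Fulkerson, which finds augmenting paths by breadth-first search. The dict-of-dicts graph format and node names are illustrative assumptions, and a plain max-flow model ignores the hydraulic (pressure/head-loss) constraints of a real pipe network.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp (Ford-Fulkerson with BFS augmenting paths).
    `capacity` maps node -> {neighbour: capacity}."""
    # Residual graph, including zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for the shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow is maximal
        # Bottleneck capacity along the path.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push flow: shrink forward residuals, grow reverse ones.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

network = {
    "s": {"v1": 16, "v2": 13},
    "v1": {"v3": 12},
    "v2": {"v1": 4, "v4": 14},
    "v3": {"v2": 9, "t": 20},
    "v4": {"v3": 7, "t": 4},
}
print(max_flow(network, "s", "t"))  # 23
```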
Regards,
Joachim
  • asked a question related to Distributed Data Mining
Question
23 answers
I need to compare a new rule of combination under Dempster-Shafer theory with other methods. I would like to use the same data used in 'Combining Multiple Hypotheses for Identifying Human Activities' by Young-Woo Seo and Katia Sycara. Unfortunately, the data are no longer available at http://www.cs.utexas.edu/users/sherstov/pdmc/. This dataset was originally released for the Physiological Data Modeling Contest (PDMC) at the site cited above. Can anyone provide me with the data or point me to a site where I can get it?
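As a baseline for such comparisons, Dempster's original rule of combination can be sketched as follows. The representation (a dict mapping frozensets of hypotheses to masses) is an illustrative choice, not taken from the cited paper.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass that would fall on the empty set (K)
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    # Normalize by 1 - K, redistributing the conflicting mass.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

A, B = frozenset({"a"}), frozenset({"b"})
THETA = A | B  # frame of discernment
m1 = {A: 0.6, THETA: 0.4}
m2 = {B: 0.5, THETA: 0.5}
print(dempster_combine(m1, m2))
```

Alternative rules (for example, ones that assign the conflict K to the whole frame instead of normalizing it away) differ mainly in the last step, which is where a new rule would be compared.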
Relevant answer
Answer
Dear Ibrahim Musa, I'm interested in using the PDMC dataset for my thesis. Could I get a copy too? Thank you!
  • asked a question related to Distributed Data Mining
Question
2 answers
Are you aware of any simulator that supports distributed data mining? I would like to find an IoT-based dataset and a simulator that allows me to perform distributed data mining.
Relevant answer
Answer
RapidMiner
  • asked a question related to Distributed Data Mining
Question
13 answers
I have data in which the change in the weight of materials is recorded over time. Unfortunately, because of special conditions, I cannot record the weight during the first 75 seconds.
- Is there any way to predict the missing initial data (i.e., the change in weight during the first 75 seconds)?
- How can I find the equation of the curve that fits the data points?
Any solution using MATLAB, SPSS, or Excel is appreciated.
Relevant answer
Answer
Hello,
Please try statistical forecasting techniques and check which one suits your data.
Good luck!
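A minimal sketch of both steps in Python with NumPy, assuming a quadratic model and synthetic stand-in measurements (the data and the model form are hypothetical; MATLAB's `polyfit`/`polyval` work the same way). Back-extrapolating into the unrecorded 0 to 75 s window is only as trustworthy as the assumed model form, so a physically motivated model is preferable if one is known.

```python
import numpy as np

# Hypothetical measurements: weight recorded only from t = 75 s onward.
t = np.arange(75, 601, 15, dtype=float)
rng = np.random.default_rng(1)
weight = 50.0 - 0.04 * t + 2e-5 * t**2 + rng.normal(0, 0.05, t.size)

# Fit w(t) = c2*t^2 + c1*t + c0 (the quadratic form is an assumption).
coeffs = np.polyfit(t, weight, deg=2)
model = np.poly1d(coeffs)
print("fitted equation:\n", model)

# Extrapolate backwards into the unrecorded window 0..75 s.
t_missing = np.arange(0, 75, 15, dtype=float)
print(np.round(model(t_missing), 2))
```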
  • asked a question related to Distributed Data Mining
Question
7 answers
Could you please share some current research trends/topics/techniques in Data Mining and Knowledge Discovery?
Relevant answer
Answer
Dear Imran,
You need to specify the domain so that the trends can be searched. Meanwhile, you may find the following important papers useful.
1. Bakhshinategh, B., Zaiane, O. R., ElAtia, S., & Ipperciel, D. (2018). Educational data mining applications and tasks: A survey of the last 10 years. Education and Information Technologies, 23(1), 537-553.
2. Bandaru, S., Ng, A. H., & Deb, K. (2017). Data mining methods for knowledge discovery in multi-objective optimization: Part A-Survey. Expert Systems with Applications, 70, 139-159.
Thanks,
Sobhan
  • asked a question related to Distributed Data Mining
Question
3 answers
I want to implement the Gauss-Seidel method to solve a system of linear equations with a sparse matrix, but I am stuck on the dependency between updates in every iteration and cannot get a solution. Please suggest some resources that would help me implement it.
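A minimal Gauss-Seidel sketch in Python, assuming the sparse matrix is stored row by row as dicts of nonzero entries (a storage choice made here for illustration). The comment marks the dependency the question mentions: each update reads values already updated earlier in the same sweep.

```python
def gauss_seidel(rows, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for sparse A stored as rows[i] = {j: a_ij}.
    Converges e.g. for strictly diagonally dominant or SPD matrices."""
    n = len(b)
    x = [0.0] * n
    for _ in range(max_iter):
        max_change = 0.0
        for i in range(n):
            diag = rows[i][i]
            # x[j] for j < i already holds this sweep's new value; this in-place
            # reuse is the dependency that makes Gauss-Seidel sequential.
            s = sum(a * x[j] for j, a in rows[i].items() if j != i)
            new_xi = (b[i] - s) / diag
            max_change = max(max_change, abs(new_xi - x[i]))
            x[i] = new_xi
        if max_change < tol:
            return x
    raise RuntimeError("Gauss-Seidel did not converge")

# Tridiagonal, diagonally dominant example with exact solution [1, 2, 3].
rows = [{0: 4.0, 1: -1.0}, {0: -1.0, 1: 4.0, 2: -1.0}, {1: -1.0, 2: 4.0}]
b = [2.0, 4.0, 10.0]
print(gauss_seidel(rows, b))  # approximately [1.0, 2.0, 3.0]
```

If the goal is to parallelize, a common remedy is a multicolour (for example red-black) ordering, in which unknowns of the same colour do not depend on each other; whether it applies depends on the sparsity pattern.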
Relevant answer
Answer
I hope this work could help you:
  • asked a question related to Distributed Data Mining
Question
5 answers
Hi all,
I'm currently working with Hadoop using the Hadoop 2.3.2 Hortonworks Sandbox running on VMware. I wish to load a dataset by following the "Hello World" tutorial provided on the Hortonworks webpage, and I followed the steps in that tutorial exactly. According to the tutorial, to load a dataset I need to create a temporary data directory by clicking the New Directory button. However, the New Directory button is disabled in the Ambari admin dashboard. Does anyone here have any suggestions, or can anyone recommend a better, easier Hadoop installer?
Relevant answer
Answer
What OS are you using as the guest? If it is Ubuntu 14.04/14.10, first start the Hadoop cluster, then create a directory in HDFS with the following command:
# hadoop fs -mkdir <directory>
Then copy your data into it with:
# hadoop fs -put <local source> <HDFS directory>
  • asked a question related to Distributed Data Mining
Question
1 answer
I was trying to implement the USD algorithm (paper title: Discretization oriented to Decision Rules). However, I have some doubts:
1. Line 20 of the algorithm states that I_i has the same majority class as I_{i+1}. What does this mean?
2. Line 20 of the algorithm also refers to a tie in I_i or I_{i+1}. A tie of what?
3. What is the purpose of lines 14 and 15 when lines 11 and 12 already cover all consecutive intervals?
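One plausible reading of questions 1 and 2: the majority class of an interval is its most frequent class label, and a tie means two or more classes share that top count. The sketch below illustrates that reading by merging adjacent intervals that share an unambiguous majority class; the function names and merge rule are illustrative assumptions, not a reimplementation of the USD pseudocode.

```python
from collections import Counter

def majority_class(labels):
    """Return (most frequent class, is_tie) for one interval's class labels."""
    counts = Counter(labels).most_common()
    top_class, top_count = counts[0]
    is_tie = len(counts) > 1 and counts[1][1] == top_count
    return top_class, is_tie

def merge_same_majority(intervals):
    """Merge adjacent intervals (lists of class labels) whose majority class
    is the same and unambiguous (no tie in either interval)."""
    merged = [list(intervals[0])]
    for nxt in intervals[1:]:
        cls_a, tie_a = majority_class(merged[-1])
        cls_b, tie_b = majority_class(nxt)
        if not tie_a and not tie_b and cls_a == cls_b:
            merged[-1].extend(nxt)
        else:
            merged.append(list(nxt))
    return merged

intervals = [["a", "a", "b"], ["a"], ["b", "b"], ["a", "b"], ["b"]]
print(merge_same_majority(intervals))
# [['a', 'a', 'b', 'a'], ['b', 'b'], ['a', 'b'], ['b']]
```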
Relevant answer
Loop testing algorithms are suitable.
  • asked a question related to Distributed Data Mining
Question
3 answers
Dear ResearchGater,
I'm trying to identify associations between keywords in PubMed. A prototypical search could be: list kinases (ranked by number of publications) that are associated with the keyword cancer or inflammation. Does anybody know of an easily accessible tool that can perform such a search? Thank you for your help.
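If a little scripting is acceptable, co-occurrence counts can be obtained from NCBI's E-utilities ESearch endpoint, which reports how many PubMed records match a query. A minimal sketch: the gene list and the "GENE AND keyword" query form are illustrative assumptions, a real kinase list would have to come from elsewhere, and NCBI asks clients to limit their request rate.

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count(term):
    """Number of PubMed records matching `term` (E-utilities ESearch, JSON output)."""
    query = urllib.parse.urlencode({"db": "pubmed", "term": term,
                                    "retmode": "json", "retmax": 0})
    with urllib.request.urlopen(f"{ESEARCH}?{query}") as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"])

def rank_by_cooccurrence(genes, keyword, count=pubmed_count):
    """Rank genes by how many PubMed records match both the gene and the keyword."""
    counts = {g: count(f"{g} AND {keyword}") for g in genes}
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)

# Example (needs network access; the gene list is illustrative):
# print(rank_by_cooccurrence(["EGFR", "BRAF", "JAK2"], "cancer"))
```

The `count` parameter makes the ranking testable offline with a stub in place of the web call.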
Relevant answer
Answer
Thank you, Laurent, for the information. I will take a look at it. I hope it is user-friendly, because my programming knowledge is limited. Regards
  • asked a question related to Distributed Data Mining
Question
2 answers
Hello,
I want to find out how many modes are present in a data distribution. In my search I found many methods for testing whether a distribution is unimodal or multimodal, but I am interested in finding the number of modes. Can anyone suggest how to estimate this?
Relevant answer
Answer
By counting the peaks in the histogram of your data.
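Peak counting is sensitive to bin width, so a smoother variant is to count local maxima of a Gaussian kernel density estimate (KDE). A minimal NumPy sketch; the choice of Silverman's rule of thumb for the bandwidth is an assumption, and the mode count depends strongly on it.

```python
import numpy as np

def count_modes(data, bandwidth=None, grid_size=501):
    """Count local maxima of a Gaussian kernel density estimate of `data`."""
    x = np.asarray(data, dtype=float)
    n = x.size
    if bandwidth is None:
        # Silverman's rule of thumb; smaller bandwidths reveal more (possibly spurious) modes.
        bandwidth = 1.06 * x.std(ddof=1) * n ** (-1 / 5)
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_size)
    # Unnormalized KDE: sum of Gaussian kernels centred on each sample.
    z = (grid[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)
    # A mode is a grid point strictly higher than both neighbours.
    inner = density[1:-1]
    peaks = (inner > density[:-2]) & (inner > density[2:])
    return int(peaks.sum())

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
print(count_modes(sample))
```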
  • asked a question related to Distributed Data Mining
Question
3 answers
Can anyone tell me about real-life scenarios where distributed data mining has actually been applied to wireless sensor networks to aid decision making?
Relevant answer
Answer
Thank you Christian Fischbach.
  • asked a question related to Distributed Data Mining
Question
1 answer
I want to implement the ECC algorithm in the Cooja simulator and compare the performance of RSA and ECC on IoT (Internet of Things) nodes. I am using Contiki and Cooja.
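The operation whose cost such a comparison measures is elliptic-curve scalar multiplication (ECC) versus modular exponentiation (RSA, e.g. Python's built-in `pow(m, e, n)`). A toy Python sketch of double-and-add over the textbook curve y^2 = x^3 + 2x + 2 mod 17; the parameters are deliberately tiny and insecure, and this is not Contiki C code, only an illustration of the algorithm to port.

```python
# Toy curve: y^2 = x^3 + 2x + 2 over GF(17), generator G = (5, 1). Insecure by design.
P_MOD, A = 17, 2
G = (5, 1)

def ec_add(p, q):
    """Add two curve points; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) = point at infinity
    if p == q:
        # Tangent slope for point doubling; inverse via Fermat's little theorem.
        lam = (3 * x1 * x1 + A) * pow(2 * y1, P_MOD - 2, P_MOD)
    else:
        lam = (y2 - y1) * pow(x2 - x1, P_MOD - 2, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def scalar_mult(k, point):
    """Compute k * point with left-to-right double-and-add."""
    result = None
    for bit in bin(k)[2:]:
        result = ec_add(result, result)
        if bit == "1":
            result = ec_add(result, point)
    return result

print(scalar_mult(2, G))  # (6, 3)
```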
Relevant answer
Answer
  • asked a question related to Distributed Data Mining
Question
5 answers
I want to find out how possible it is to seamlessly integrate these technologies.
Thank you 
Relevant answer
Answer
You can use Apache Flume to extract Twitter data, because it integrates with Hadoop and MapReduce. Once the data is retrieved, you can export it to R for analysis.
  • asked a question related to Distributed Data Mining
Question
1 answer
Hotelling's T² assumes the data follow a multivariate normal distribution, whereas SVDD is not strict about the data distribution. Therefore, SVDD should be used for monitoring non-Gaussian data. However, T² monitoring performs better than SVDD. For what kind of data is SVDD more suitable for monitoring?
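For reference, Hotelling's T² for a new observation x is (x - mean)^T S^-1 (x - mean), with the mean and covariance S estimated from in-control data; it is the usual F-distribution control limit that relies on multivariate normality. A minimal NumPy sketch, assuming synthetic in-control data and, as a simplification, an empirical 99th-percentile control limit in place of the F-based one.

```python
import numpy as np

def fit_t2(train):
    """Estimate the in-control mean and inverse covariance from training data."""
    mean = train.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(train, rowvar=False))
    return mean, cov_inv

def t2_stat(x, mean, cov_inv):
    """Hotelling's T^2 = (x - mean)^T S^-1 (x - mean) for each row of x."""
    d = np.atleast_2d(x) - mean
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 3))  # hypothetical in-control data
mean, cov_inv = fit_t2(train)
# Simplification: empirical 99th percentile instead of the F-distribution limit.
limit = np.percentile(t2_stat(train, mean, cov_inv), 99)
new = np.array([[0.1, -0.2, 0.0], [4.0, 4.0, 4.0]])
print(t2_stat(new, mean, cov_inv) > limit)
```

An empirical limit like this does not assume normality, which is one way to check whether T²'s advantage on a given dataset survives when the Gaussian assumption is dropped.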
Relevant answer
Answer
Dear Zhao
SVDD is a one-class classifier (OCC) that is better suited to semi-supervised settings.
In particular, Density Weighted SVDD (DW-SVDD) is an improved version of SVDD that enriches it by taking the data distribution into account. For more information, refer to the link below.
Best regards
  • asked a question related to Distributed Data Mining
Question
4 answers
In distributed data mining, how can we extract knowledge using association rules when the data at each site grows frequently?
Distributed Association Rule Mining
Relevant answer
Answer
There is a 1995 VLDB paper (www.vldb.org/conf/1995/P432.PDF). It is one of the early papers on AR mining for large datasets. See if it helps.
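A property that partition-based and distributed approaches rely on: an itemset that is frequent globally (support as a fraction) must be locally frequent at one or more sites, so the union of locally frequent itemsets is a complete candidate set, and only the candidates' counts need to be exchanged. A minimal two-phase sketch; the brute-force local miner (all itemsets up to `max_size`) is a simplification standing in for Apriori or similar, and the site data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def local_frequent(transactions, min_support, max_size=3):
    """Itemsets (frozensets) whose support within one site is >= min_support (a fraction)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    threshold = min_support * len(transactions)
    return {s for s, c in counts.items() if c >= threshold}

def distributed_frequent(sites, min_support, max_size=3):
    # Phase 1: each site mines locally; the union is a complete candidate set.
    candidates = set().union(*(local_frequent(s, min_support, max_size) for s in sites))
    # Phase 2: each site counts only the candidates; the counts are summed.
    total = sum(len(s) for s in sites)
    global_counts = Counter()
    for site in sites:
        for t in site:
            t = set(t)
            for c in candidates:
                if c <= t:
                    global_counts[c] += 1
    return {c: global_counts[c] / total for c in candidates
            if global_counts[c] >= min_support * total}

sites = [[["a", "b"], ["a", "c"], ["a", "b"]],
         [["b", "c"], ["a", "b"], ["b"]]]
print(distributed_frequent(sites, 0.5))
```

When data keep arriving at each site, phase 2 counts can be updated with just the new transactions, although new candidates may then also have to be checked.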
  • asked a question related to Distributed Data Mining
Question
3 answers
There are numerous hypothetical examples of "Privacy Preservation in Distributed Data Mining" in the literature. However, can anyone give me scenarios where it has actually been applied in practice?
Relevant answer
Answer
One example is patient records. Pharmaceutical researchers need to examine the actual patient records to discover previously unknown side effects of the tested drug. If a published record does not correspond to an existing patient in real life, it is difficult to deploy data mining results in the real world. Randomized and synthetic data do not meet this requirement. Although an encrypted record corresponds to a real-life patient, the encryption hides the semantics required for acting on the patient represented.
  • asked a question related to Distributed Data Mining
Question
5 answers
In the following survey paper, I found a comparison of several horizontal-scaling and vertical-scaling platforms. Now I want to carry out a performance analysis of these platforms, but I don't know whether any testbed is available for such an analysis.
Relevant answer
Answer
Maybe this thesis gives some helpful information:
Regards,
Reiner Creutzburg
  • asked a question related to Distributed Data Mining
Question
12 answers
I need an answer for the R data mining tool: how large a dataset does it support?
Relevant answer
Answer
R does have limitations. Currently the compilation uses libraries that are constrained to 32-bit integers. This means that some indices and vectors are limited by the 32-bit signed integer maximum (2^31 - 1, about 2.1 billion). It is possible to find that some object (data frame) "runs out of space" even when running R on a powerful large-memory computer.
There are ways around this, as well as packages that create only meta-objects in memory and use HDF5 or NetCDF file storage for very large objects (GenABEL and SNPRelate are examples). In addition, there are generic packages, bigmemory and ff, that can in some instances provide workarounds for the 32-bit integer limitation.
This is not to say that R isn't a wonderful system, just to be clear that there are limitations.
  • asked a question related to Distributed Data Mining
Question
5 answers
During any association mining process, removing uninteresting rules is a big challenge. We are interested in effective formal and experimental methods for measuring the interestingness of multilevel rules.
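As a starting point, the standard objective interestingness measures (support, confidence, and lift) can be computed per rule as below. These are single-level measures; applying them to multilevel rules, for example by first extending each transaction with the ancestors of its items in a hierarchy, is an assumption not worked out here.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    sets = [set(t) for t in transactions]
    a, c = set(antecedent), set(consequent)
    n_a = sum(a <= t for t in sets)         # transactions containing the antecedent
    n_c = sum(c <= t for t in sets)         # transactions containing the consequent
    n_ac = sum((a | c) <= t for t in sets)  # transactions containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift > 1: positive correlation; = 1: independence; < 1: negative correlation.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

baskets = [["milk", "bread"], ["milk"], ["bread"], ["milk", "bread"]]
print(rule_metrics(baskets, ["milk"], ["bread"]))
```

Here milk -> bread has confidence 2/3 but lift below 1, the kind of rule a confidence-only filter would keep even though the items are slightly negatively correlated.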
Relevant answer
Answer
You could read this paper:
Maybe it helps you.
  • asked a question related to Distributed Data Mining
Question
4 answers
If the source data has changed, do we have to follow the same steps to find/generate the rules, or should we change the method?
Relevant answer
Answer
It depends on the type of data. For sequential data, you might prefer sequential pattern mining algorithms. You might also transform your data, ignore the timestamps, and apply an itemset mining algorithm.
If your data is not static, there are several cases, including:
- incremental (moderate) updates: apply an incremental algorithm (for itemsets or sequential patterns, depending on your choice).
- streaming data (updates at a high rate): apply a streaming solution. It might be based on different models for the "observation window", such as sampling, batches, jumping windows, or sliding windows.
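The sliding-window model mentioned above can be sketched as follows: itemset counts are incremented when a transaction arrives and decremented when it leaves the window. A minimal sketch; enumerating every itemset up to `max_size` is a simplification that real streaming miners avoid.

```python
from collections import Counter, deque
from itertools import combinations

class SlidingWindowCounter:
    """Maintain itemset counts (up to max_size) over the last `window` transactions."""

    def __init__(self, window, max_size=2):
        self.window = window
        self.max_size = max_size
        self.buffer = deque()
        self.counts = Counter()

    def _itemsets(self, transaction):
        items = sorted(set(transaction))
        for k in range(1, self.max_size + 1):
            for combo in combinations(items, k):
                yield frozenset(combo)

    def add(self, transaction):
        self.buffer.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.buffer) > self.window:
            # Expire the oldest transaction so counts reflect only the window.
            self.counts.subtract(self._itemsets(self.buffer.popleft()))
            self.counts += Counter()  # in-place add drops zero and negative entries

    def frequent(self, min_support):
        threshold = min_support * len(self.buffer)
        return {s for s, c in self.counts.items() if c >= threshold}

w = SlidingWindowCounter(window=2)
for t in (["a", "b"], ["a"], ["c"]):
    w.add(t)
print(w.frequent(0.5))
```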
  • asked a question related to Distributed Data Mining
Question
3 answers
I want to cluster short tweets and predict users' sentiments in real time.
Relevant answer
Answer
Perhaps you should take a look at MOA (Massive Online Analysis) at http://moa.cms.waikato.ac.nz. There are some Twitter extensions, and I believe a sentiment analysis framework too!
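To illustrate the kind of incremental clustering that stream-mining tools perform, here is sequential (MacQueen-style) k-means, where each arriving point updates only its nearest centroid. This is a plain Python illustration, not MOA's (Java) API; tweets would first have to be turned into numeric vectors (for example TF-IDF or hashed bag-of-words features, an assumption here), and the 2-D points below are hypothetical.

```python
import math

class OnlineKMeans:
    """Sequential k-means: each incoming point moves its nearest centroid."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def update(self, point):
        # Assign the point to its nearest centroid (Euclidean distance).
        j = min(range(len(self.centroids)),
                key=lambda i: math.dist(point, self.centroids[i]))
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size that keeps the centroid a running mean
        self.centroids[j] = [c + eta * (p - c) for c, p in zip(self.centroids[j], point)]
        return j

km = OnlineKMeans([[0.0, 0.0], [10.0, 10.0]])
for p in [[1.0, 1.0], [9.0, 9.0], [3.0, 3.0], [11.0, 11.0]]:
    km.update(p)
print(km.centroids)  # [[2.0, 2.0], [10.0, 10.0]]
```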
  • asked a question related to Distributed Data Mining
Question
2 answers
Given an auto-scaling system, we face inputs with unpredictable patterns and volumes. Because resources are allocated per input, fluctuations in input volume cause significant resource overhead. Do you know of an algorithm that can help improve system performance?
Relevant answer
Answer
Dear Kareem
Salam,
Regarding the question above: I have an auto-scaling system with fluctuating input volume, and I want to allocate my resources in the best way.
However, the pattern of my input volume is unpredictable. Is there any algorithm that can help?
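One simple and widely used reactive heuristic is to smooth the observed load with an exponentially weighted moving average (EWMA) and size the replica count against a per-replica capacity, scaling up immediately on spikes and down only as the average decays. A minimal sketch, in which the class name, capacity, alpha, and headroom values are illustrative assumptions; predictive approaches would need some forecastable structure in the input, which the question says is absent.

```python
import math

class EwmaAutoscaler:
    """Reactive autoscaler: size replicas from an EWMA of load, reacting to spikes at once."""

    def __init__(self, capacity_per_replica, alpha=0.3, headroom=0.2,
                 min_replicas=1, max_replicas=100):
        self.capacity = capacity_per_replica
        self.alpha = alpha          # weight of the newest observation in the EWMA
        self.headroom = headroom    # spare capacity fraction kept for bursts
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.smoothed = None

    def observe(self, load):
        """Feed one load sample (e.g. requests/s) and return the desired replica count."""
        if self.smoothed is None:
            self.smoothed = load
        else:
            self.smoothed = self.alpha * load + (1 - self.alpha) * self.smoothed
        # Fast scale-up: plan for the larger of the raw spike and the smoothed load;
        # scale-down follows the EWMA, which damps oscillation from noisy input.
        target_load = max(load, self.smoothed) * (1 + self.headroom)
        replicas = math.ceil(target_load / self.capacity)
        return max(self.min_replicas, min(self.max_replicas, replicas))

scaler = EwmaAutoscaler(capacity_per_replica=100, alpha=0.5, headroom=0.0)
print([scaler.observe(x) for x in [100, 500, 100]])  # [1, 5, 2]
```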
  • asked a question related to Distributed Data Mining
Question
1 answer
In traditional parallel and distributed data mining algorithms, the main issues are data decomposition (data vs. task), data layout (horizontal vs. vertical), load balancing (static vs. dynamic), and memory (shared, distributed, or hybrid). If we design data mining algorithms on the MapReduce platform, what should the research issues be?
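To see where such issues arise on MapReduce, here is a minimal in-process simulation of item support counting as map, shuffle, and reduce phases. It is plain Python written for illustration, not the Hadoop API; on a real cluster, how transactions are split across mappers corresponds to data decomposition, and the shuffle is where communication cost appears.

```python
from collections import defaultdict
from itertools import chain

def map_phase(transaction):
    """Mapper: emit (item, 1) for each distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: total support count for one item."""
    return key, sum(values)

def mapreduce_item_counts(splits):
    # Each split would be handled by a separate mapper task on a cluster.
    intermediate = chain.from_iterable(map_phase(t) for split in splits for t in split)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())

splits = [[["a", "b"], ["a"]], [["b", "c"], ["a", "b"]]]
print(sorted(mapreduce_item_counts(splits).items()))  # [('a', 3), ('b', 3), ('c', 1)]
```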
Relevant answer
Answer
There has been significant work building algorithms on top of MapReduce. Some very good papers surveying this topic include:
  • asked a question related to Distributed Data Mining
Question
9 answers
I am working on distributed association rule mining. I need datasets on which to run simulations of my program.
Relevant answer
  • asked a question related to Distributed Data Mining
Question
6 answers
I want to implement distributed association rule mining algorithms on either or both, but I don't know much about programming in grid or cloud environments.
  • asked a question related to Distributed Data Mining
Question
87 answers
Can anybody please help me find good survey/review papers on parallel and distributed association rule mining, including grid-based and cloud-based association rule mining?
Relevant answer
Answer
Before you ask what sections should be contained in a survey paper, you should first understand what is a survey paper. What it is *not* is simply a core dump of a bunch of papers in a common area.
Think of a survey as a research paper whose data and results are taken from other papers. This means that you should have a point to make or some new conclusion to draw. And you'll do so by collecting data from a broad collection of previous works.
As a previous responder has said, you should have a thorough and deep knowledge of the field that you are surveying. This knowledge should be sufficient to make you completely aware of the main themes, directions, controversies, and results in that field.
The point you make will determine the organization of the survey paper. The structure of the main sections of the paper will reflect the structure of the field. Some possible example structures (which of course depend completely on the topic) might be:
1. Increasing complexity or scale: There may be a spectrum of solutions, and you might organize them by complexity or scale.
2. Static vs. dynamic: Many fields are organized into static techniques, dynamic techniques, and even hybrids.
3. Partitioning the design space: Many systems are made up of components, so for a compiler paper, for example, you could divide by the classic scanner, parser, symbol table, code generator, and optimizer.
4. Major techniques in a field: For example, in fault tolerance, you see fail-stop vs. fail-forward, or logging vs. hot backup. In concurrency control, there is a natural divide between optimistic and pessimistic techniques.
5. Historical: Sometimes the development of a field has a clearly linear nature and is intrinsically interesting in itself. This is an overused technique in many cases where it really doesn't add understanding.
There are lots of possibilities for a given topic and it is this organization that is the hardest part of writing a survey paper. (I'm sure that many of you can give good examples of organizations that have worked well for you.)
You'll have written a successful survey paper if you can communicate not just the list of results but, more importantly, your understanding of the structure of the field.
This is a high bar to set. And it is also why I never ask students in my graduate classes to write such papers; they just don't have the experience and perspective to write a good survey.
  • asked a question related to Distributed Data Mining
Question
5 answers
Where can I find these tools?
Relevant answer
Answer
For a practical introduction to Hadoop, you can refer to "Hadoop Operations" by Eric Sammer http://shop.oreilly.com/product/0636920025085.do
For a discussion of the MapReduce programming model, algorithm implementation, and its limitations, one of the best resources is "Data-Intensive Text Processing with MapReduce" by Jimmy Lin and Chris Dyer.
  • asked a question related to Distributed Data Mining
Question
6 answers
What are the best predictive methods to use for this kind of data?
Relevant answer
Answer
@Nentawe: Well, your log messages typically share common keywords. They need to be identified (manually). Every keyword will represent a component of a bit vector. If the keyword is present, then its component will be set to 1 and 0 otherwise. If you identify a good group of training instances, it should be possible to use classifiers, e.g. SVM, to learn a model.
With this technique every new log message can be first converted into a bit vector and eventually classified as error or normal behavior.
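The keyword bit-vector encoding described above can be sketched as follows. The keyword list and training messages are hypothetical, and a perceptron (another linear classifier) stands in for the SVM to keep the example dependency-free; with scikit-learn available, a linear SVM would take the same bit vectors as input.

```python
KEYWORDS = ["error", "timeout", "failed", "connected", "started"]  # hypothetical

def to_bit_vector(message, keywords=KEYWORDS):
    """One component per keyword: 1 if it occurs in the message, else 0."""
    words = set(message.lower().split())
    return [1 if k in words else 0 for k in keywords]

def train_perceptron(vectors, labels, epochs=20):
    """Linear classifier standing in for an SVM. labels: +1 error, -1 normal."""
    w = [0.0] * len(vectors[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            # Update only on misclassified (or on-boundary) examples.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def classify(message, w, b):
    x = to_bit_vector(message)
    return "error" if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else "normal"

logs = ["connection timeout after 30s", "job failed with error code 2",
        "worker started", "client connected"]
labels = [1, 1, -1, -1]
w, b = train_perceptron([to_bit_vector(m) for m in logs], labels)
print(classify("disk error detected", w, b))  # error
```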