Data Mining and Knowledge Discovery

  • Anteneh Ayanso added an answer:
    How do mathematics and statistics explain the logic behind association rules, which is essential for the decision-making process?

    There are many different forms of mathematics. Different forms satisfy different understandings of association rules. The effectiveness and efficiency of the same association rules differ across these mathematical explanations. So, how is it possible that association rules, which have many different mathematical explanations, can support a single decision? And how can I know which of these explanations are correct and which are wrong?

    Anteneh Ayanso · Brock University

    The key is to use interestingness measures that explain the association between items or events beyond random chance, given that these items or events co-occur at an acceptable level of frequency (a.k.a. support). Acceptable interestingness measures include confidence, lift, conviction, etc.
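
    As a rough illustration of the measures named above, the sketch below computes support, confidence, lift, and conviction for a single rule using their standard textbook definitions. The function name `rule_measures` and the toy shopping baskets are assumptions made up for this example; they do not come from the original answer.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    count_a = sum(1 for t in transactions if a <= set(t))
    count_c = sum(1 for t in transactions if c <= set(t))
    count_ac = sum(1 for t in transactions if (a | c) <= set(t))

    support = count_ac / n                      # P(A and C)
    confidence = count_ac / count_a if count_a else 0.0  # P(C | A)
    support_c = count_c / n                     # P(C)
    # Lift compares the observed confidence with what chance alone would give.
    lift = confidence / support_c if support_c else float("inf")
    # Conviction grows without bound as the rule approaches 100% confidence.
    conviction = (
        (1 - support_c) / (1 - confidence) if confidence < 1 else float("inf")
    )
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "conviction": conviction,
    }


# Hypothetical toy baskets, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
m = rule_measures(transactions, {"bread"}, {"milk"})
print(m)
```

    In this toy data the rule bread -> milk has high confidence but a lift below 1, which is the kind of case where confidence alone would overstate the association.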

  • Mohamed Mahmoud Hafez added an answer:
    What is the role of spatial data in mobile governance?
    I need good research articles on the role or use of spatial data in good/effective mobile governance?
    Mohamed Mahmoud Hafez · Cairo University
    I have already worked on a mobile application project that solves some local problems using spatial data about the locations of accidents, fire stations, ...
    You can access my LinkedIn profile and take a look at the project entitled "Problem Locator".
  • Hanumantharao Jalla added an answer:
    Transformation technique to convert original data into perturbed data
    I need to transform original data into perturbed data to protect individual privacy, and then apply data mining techniques to the perturbed data so that the knowledge obtained is equivalent to the knowledge in the original data.
    Hanumantharao Jalla · University of Hyderabad
    Does anybody know how to run LDA (Latent Dirichlet Allocation) code on the Reuters-21578 and 20 Newsgroups datasets?

    from this link I downloaded LDA code
  • Ana-Maria Ciobotaru added an answer:
    Can anyone help me to trace the role of data mining & machine learning in Intelligent Traveler Management Systems (ITIS)?
    ITIS provide the drivers with real time travel and traffic information, such as transit routes, schedules, navigation directions and information about the delays due to congestion etc. I would like to know what the role of data mining and machine learning is in this regard. Can you recommend any literature?
    Ana-Maria Ciobotaru · University of Bucharest
    Hi Budy Santoso, have a look at the link to the book you requested (Advanced Traveler Information System).
  • JC Ang added an answer:
    How does the Harmony search work for feature selection?
    I am a bit confused about Harmony search for feature selection. Hopefully the experts here can kindly help to answer my queries.
    Where should I fit the data set into the Harmony search? Can I say that in Harmony Memory (in step 2, initialise Harmony Memory) the features are randomly selected from a dataset? How exactly is Harmony Memory the complete dataset?
    My understanding is that each row in Harmony Memory = feature subset, hence each decision variable in Harmony Memory represents a feature. So for the case of multi-dimensional datasets that have a few samples with various features (for example, gene expression data), how can Harmony Memory be formed?
    JC Ang · Universiti Teknologi Malaysia
    Thanks, Bauckhage, for the reminder. This is a really interesting article. I somewhat agree with the author that some metaphors for metaheuristic methods are too complex and unnecessary. I will consider changing my research direction, then.
  • Emmanuel Detrinidad added an answer:
    How can small and medium enterprises benefit from Big Data and Data Science?
    Big Data and Data Science have continued to emerge among practitioners and researchers. But the foundation of these concepts involves large volumes and a variety of data created at high velocity. Hence, the focus has generally been on bigger organisations that generate such data. However, small and medium-sized organisations are also active adopters of ICT. Can Big Data and Data Science benefit small and medium enterprises as well, and how?
    Emmanuel Detrinidad · Instituto de Estudios Interdisciplinarios, Granada, Nicaragua
    I'm not sure about the use of "big data" in SMEs. I mean, if they (the SMEs) work independently, they will need data mining tools. I think that will be enough.
    But if you are thinking of coordinating a cluster of servers across a cluster of SMEs, that's another story.
  • Jonas Mellin added an answer:
    Do you know of any articles about grouping events into sequences?
    I have a collection of events that occur in a tested system. I keep them in an Oracle database. I am looking for algorithms that help me group those events into sequences and then use them to look for possible anomalies.
    Jonas Mellin · University of Skövde
    @Juggapong Natwichai, can the address method handle uncertainty?
  • Michael Niemann added an answer:
    How can I extract the adjective?
    The large photo album has extra charges on delivery.
    The adjective large may indicate an attribute size of the photo.
    Red car
    The adjective red may indicate an attribute colour of the photo.

    How can I write a Java program to do this? How do we know that the root attribute for "large" or "big" is size, and how do we know the root attribute for a colour?
    Michael Niemann · Monash University (Australia)
    This is some research you will have to do for yourself. Plenty of work has already been done with WordNet, so there should be plenty of articles and books out there that will explain things for you.
  • How do you determine which objective function to use in the harmony search algorithm for feature selection?
    I am studying the harmony search for feature selection. I read some papers and have some doubts regarding the harmony search.
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    Félix F. González-Navarro · Autonomous University of Baja California
    I would say cross-validation. There is also one thing you should keep in mind: harmony search is a stochastic algorithm, which means that the solution may differ from one run to another, so simply picking the best solution could give a biased result. How can this be addressed? With several runs of your harmony search experiment. For example, you could use bootstrap sampling and apply harmony search to each bootstrap sample; at the end, some variables or features will appear consistently in the final subsets. Those variables or features are the interesting ones. Hope this helps.
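
    The bootstrap idea in this answer can be sketched roughly as follows. This is only an illustration under stated assumptions: a plain random search (`stochastic_select`) stands in for harmony search, and the score is a simple correlation-minus-penalty filter rather than the cross-validated accuracy suggested above. All function names, the penalty value, and the synthetic data are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def stochastic_select(rows, labels, rng, trials=200, penalty=0.3):
    """Stand-in for harmony search: try random subsets, keep the best-scoring one."""
    d = len(rows[0])
    relevance = [abs(pearson([r[j] for r in rows], labels)) for j in range(d)]
    best, best_score = (), float("-inf")
    for _ in range(trials):
        subset = tuple(j for j in range(d) if rng.random() < 0.5)
        # Assumed objective: reward relevant features, penalise subset size.
        score = sum(relevance[j] - penalty for j in subset)
        if score > best_score:
            best, best_score = subset, score
    return best


def selection_frequency(rows, labels, runs=30, seed=0):
    """Run the stochastic selector on bootstrap samples and count each feature."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    counts = [0] * d
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        chosen = stochastic_select(
            [rows[i] for i in idx], [labels[i] for i in idx], rng
        )
        for j in chosen:
            counts[j] += 1
    return [c / runs for c in counts]


# Synthetic data: only features 0 and 1 influence the label.
data_rng = random.Random(42)
rows = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(100)]
labels = [r[0] + r[1] + 0.1 * data_rng.gauss(0, 1) for r in rows]
freq = selection_frequency(rows, labels)
print([round(f, 2) for f in freq])
```

    Features that are selected in most bootstrap runs are the stable ones; features that appear only occasionally are more likely artefacts of a single stochastic run.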
  • Uwizeye Clarisse added an answer:
    How to predict an evolution of a network?
    I'm working on a data mining topic. I have an attributed graph of the inflation network of developing countries. My vertex attributes include Population, Exchange rate to US $, Price level of consumption (PC), Ratio of GNP to GDP (%), ... Can the covariation among vertex descriptors tell me whether the vertex attributes are structurally correlated or not? Is there any possibility to predict the future speed of my graph after a given period? If yes, what do my vertices stand for?
    Uwizeye Clarisse · University Joseph Fourier - Grenoble 1
    OK, if I understood you correctly, I think I will get a kind of sequence of correlation patterns.
    And every annual inflation value is a function of its own history. So if I want to build a model that predicts what comes at t+1, I am unsure what would stand for time if I proceed that way.
  • Vangipuram Radhakrishna added an answer:
    What is a good way to reduce the dimension of my data?
    I have a data set of 500 records and 90 attributes. What are good ways to reduce the dimension without losing much information?
    I don't want to do something like PCA, because the interpretation of the model built on them would be hard. Do you suggest feature selection?
  • Madhavi Vaidya added an answer:
    Can I make use of R language for large data analysis?
    If yes, how is it different from MapReduce? What else can be used?
    Madhavi Vaidya · Vivekanand Education Society College of Arts, Science and Commerce
    Dear Alan, surely will check the link given. Thank you so much.
  • Nahid Khorashadizade added an answer:
    How are the most appropriate sub-features chosen?
    I recently read the article "A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease" and I am interested in its subject.
    I am working on this subject and am having problems reproducing the sub-feature sets obtained from the RS method, corresponding to Table 3. I found many reductions. How can I get the attributes listed in Table 3?
    How do you select the most appropriate sub-features (20 reduced sub-feature sets)?
    Can you please help me?

    I would appreciate your immediate attention on this matter.
    Nahid Khorashadizade · University of Sistan and Baluchestan
    Dear Mr Mahdavi,
    Thanks so much for your help.
    Good luck.
  • Mitali Sonar asked a question:
    What utility measures are available in utility-based privacy-preserving methods?
    What utility measures are available in utility-based privacy-preserving methods?
  • Ajay Parikh added an answer:
    Where can I get e-governance data for research purpose?
    I want to apply data mining techniques on e-governance data. Where can I get the sample data and related information from?
    Ajay Parikh · Gujarat Vidyapith - Ahmedabad
    What kind of (government) data do you want? Election data is available on election sites like ....
  • Juggapong Natwichai added an answer:
    Does anybody have experience with data mining over big data?
    I would like to work on optimizing classification, clustering, and logistic regression algorithms for data mining over big data.
    Juggapong Natwichai · Chiang Mai University
    I believe that array processing is a crucial part of data mining algorithms. So, check SciQL at

    It can bridge the gap in data storage. With big data, you probably can't store everything in memory, but storing the data in files could be really painful in production.
  • Shankar K (d) added an answer:
    Can anyone give a list of free computer science journals (names/links) that are indexed by Scopus/IEEE/ACM/Elsevier?
    This may help us to publish our work in high-impact-factor journals.
    Shankar K (d) · Alagappa University

    Annexure journals are very H-index and impact factor from multiple journals.
    Some journals are free, and the review process takes less than 3 months. My recommendation is to find journals in your area of specialization (e.g., if your research area is image processing, International Journal of Computer Vision). I hope this is useful to you.
  • Shankar K (d) added an answer:
    Is there any free tool to perform preprocessing of a server log file?
    I am working on web log mining processes. I need a tool which performs pre-processing (data cleaning, user identification, session identification) of a server log file.
    Shankar K (d) · Alagappa University

    Read this article... it is easy to understand and useful for what you need...
  • Mahmoud Omid added an answer:
    Are these feature selection methods the state of the art?
    There is no widely recommended technique for feature selection. Commonly used techniques applicable to image processing are PCA, correlation-based methods, GA, and sensitivity analysis. This is debatable. Are these feature selection methods the state of the art? Are any other commonly used feature selection methods left out?
    Mahmoud Omid · University of Tehran
    Dear @Muhammad
    Thank you for answering. The link you provided above does not work. Could you please send the title and author information? Thanks,
  • Ali Sajedi added an answer:
    Can anyone recommend a course specifically designed for going into the big data business?
    Many researchers come to big data from other areas but are not specifically "tagged" for big data jobs or research. Any angle on treating big data sciences in a higher educational institute would be appropriate.
    Ali Sajedi · University of Alberta
    Another Coursera MOOC that may be suitable for you:
    Big Data in Education
  • Olatundun Oyewumi added an answer:
    Can anyone suggest an AutoMap (CMU) "Scripting" and "Ontology Learning" tutorial?
    I am a beginner user of AutoMap/ORA, and I want to know how to use scripts with any text file. I'd be grateful if anyone could provide me with some tutorial links on how to use AutoMap scripts and Ontology Learning. I was unable to get enough from its User Guide.
    Olatundun Oyewumi · Ladoke Akintola University of Technology

    This is not my area

  • Michael Wendl added an answer:
    Which approaches are suitable for preserving the genomic privacy?
    Why are anonymization and de-identification models useless in genomic data privacy? Anonymization techniques are used to preserve individuals' privacy. In genomic data it is difficult to preserve privacy with anonymization techniques such as suppression, generalization, etc.
    Michael Wendl · Washington University in St. Louis
    Agreed. I would point to a paper from some years ago by Russ Altman and colleagues that shows that only a few dozen independent SNPs furnish enough information to identify an individual.
  • Sudhakar Singh added an answer:
    Where to find Big Relational or Transactional Data Sets?
    I am searching for sources that freely provide big data sets, especially transactional data sets.
    Sudhakar Singh · Banaras Hindu University
    Thanks Mr. RIOBT, can you please add the URLs here?
  • Ioannis T. Christou added an answer:
    What is the suggested method to predict the effect of price change on sales when the price-change is happening for the first time?
    Price-elasticity models do not fit here (since they need previous/historic data with price changes to 'learn' the effect).

    We have sales-data for different items (with no price-changes till now). Business owners are thinking of changing the prices for the first time, and would like to estimate the customer reaction. (Customers are not tracked, anonymous and opportunistic).

    Any pointers on the right approach (R package)?
    Ioannis T. Christou · Athens Information Technology
    Without any further information, there is no way to accurately forecast how the volume of sales will change when the price changes.
    So, there are a few things you can do:
    1. If you have substitute products whose elasticity you know, you can extrapolate the sales volume of the product in question based on the computed elasticity of the substitute product.
    2. If you don't have data for substitute products (or you don't think good substitutes exist in your market), then you can perform a small market research study, i.e. ask a number of customers if they would be willing to buy the product in question at a lower or higher price, and compute demand elasticity from this market research.
    3. Or, you may try limited-time-only incremental discounts, see how the market moves, and compute the elasticity of your product based on this information.
    4. In general, if your sales seem to be declining as time goes by, it's a good indication that your price is too high for the market segment you have targeted, and you need to lower prices. On the other hand, discounts, if done carelessly, will only "buy sales" and may very well cause you to actually lose money, maybe due to logistics and/or supply chain management issues that your sales campaign will bring forth, and so on. There are several studies that have documented this effect.

    None of the above is related to R, or MATLAB, or Java, or FORTRAN, as they are simply methods you have to resort to, to compute an approximation of the product's elasticity. Also you need to remember that elasticity is not a static concept, but varies with time, and many other parameters. What may seem as a good bargain price to your customers today, may seem way too much to them tomorrow, so this information needs to be constantly updated.
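
    As a small worked illustration of computing elasticity from a limited-time discount (option 3 above), the sketch below uses the standard midpoint (arc) elasticity formula. The function names and all prices and quantities are made-up assumptions for this example, and the projection is only a first-order approximation that is reasonable for small price changes.

```python
def arc_elasticity(p0, q0, p1, q1):
    """Midpoint (arc) price elasticity of demand between two observations."""
    if p0 == p1:
        raise ValueError("prices must differ to estimate elasticity")
    # Percentage changes are measured relative to the midpoint of each pair.
    pct_q = (q1 - q0) / ((q0 + q1) / 2)
    pct_p = (p1 - p0) / ((p0 + p1) / 2)
    return pct_q / pct_p


def projected_quantity(q0, p0, p_new, elasticity):
    """First-order projection: %change in quantity ~ elasticity * %change in price."""
    return q0 * (1 + elasticity * (p_new - p0) / p0)


# Hypothetical trial: weekly sales went from 200 to 230 units
# when the price was temporarily cut from 10.00 to 9.00.
e = arc_elasticity(10.0, 200, 9.0, 230)
q = projected_quantity(200, 10.0, 9.5, e)
print(f"estimated elasticity: {e:.3f}")
print(f"projected weekly units at 9.50: {q:.1f}")
```

    Because elasticity varies with time and other factors, as noted above, an estimate like this would need to be refreshed with new observations rather than treated as a fixed constant.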
  • Lars Taxén added an answer:
    Do you think cognitive states and cognitive inputs can be retrieved from information systems?
    Information and knowledge discovery systems often work with predefined algorithms and constructs on the data sets. Can we extract the historicity and context of cognition behind the data and the relationships?
    Lars Taxén · Linköping University
    To me, cognitive states are entirely confined to the inner workings of our brains. Thus, my answer is no, this is not possible to do.
  • Nazeeh A. Ghatasheh added an answer:
    What tests are required when preparing the data and testing the data using a clinical data set?
    Mostly we could do these tests at an initial stage, such as pre-processing or analysis of the data. Could you also suggest the tests required at the various stages, with an example, or give web references for understanding the importance of those tests?
    Nazeeh A. Ghatasheh · University of Jordan
    It depends on the type of analysis you intend to perform. But mainly you need to handle missing data (there are many options regarding missing data).

    Afterwards (if applicable), you may eliminate noise sources, for example irrelevant records.

    Data normalization or quantification could be required.

    It might be useful to analyze the data before conducting the tests, for example with correlation analysis, variability analysis, etc.

    Also (depending on the test you intend to conduct and the data), dimensionality reduction may improve the performance.

    This was a quick overview of the pre-processing phase.

    Try to search for data pre-processing approaches or data mining related materials.
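
    A few of the pre-processing steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, assuming mean imputation for missing values and min-max scaling for normalization (only one option among many for each step). The column names and values are invented toy data, not real clinical records.

```python
from statistics import mean


def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def min_max(column):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no spread
    return [(v - lo) / (hi - lo) for v in column]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Invented toy columns with missing entries marked as None.
age = [34, None, 51, 47, 29]
glucose = [5.1, 6.3, None, 7.0, 4.8]

age_clean = impute_mean(age)
glucose_clean = impute_mean(glucose)
age_scaled = min_max(age_clean)
glucose_scaled = min_max(glucose_clean)
r = pearson(age_scaled, glucose_scaled)

print(age_clean)
print([round(v, 3) for v in age_scaled])
print(round(r, 3))
```

    Mean imputation is simple but shrinks the variance of the column, so for a real clinical data set the choice of imputation method deserves the same care as the later analysis.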
  • Mahboobeh Parsapoor added an answer:
    What should be the minimum and maximum dataset size in the area of data mining research? What determines the size of the dataset?
    Please suggest sizes for M.Phil./Ph.D.-level work. Also suggest some tutorials for understanding the complete use of a data set in research.
  • Osman Ibrahim added an answer:
    What is a free alternative document collection to TREC?
    I know there are textual document collections for IR, but these are not free, such as the TREC datasets and others. I need a dataset (a collection of textual documents) that meets the requirements of Information Retrieval research and, at the same time, is large enough and includes most features of micro and macro variations.
    Osman Ibrahim · Minia University
    Thank you, but these are .csv files. I wonder if you know a source for text document data (a document corpus).

About Data Mining and Knowledge Discovery

It is the research project which is ongoing.