Data Mining and Knowledge Discovery

  • Afaq Ahmad added an answer:
    Are PhD/Masters Theses and Dissertations or Journals/Conference proceedings the best sources for a Literature Review? Why?
    When reviewing the literature for trends in technological advancement and future directions, we may use either PhD and Masters' theses and dissertations or journal and conference papers. Which will be more useful, or give the most rigorous review?
    Afaq Ahmad

    Dear John K. Marco Pima

    PhD and Masters theses and dissertations give quick access to methodologies and tools, along with the scope of future work. Because these documents are quickly available to research groups at the same institute, referring to them is essential if their research idea, methodology, tool, or comparison benchmark is used.

    Because theses and dissertations are not accessible or publicized immediately, they are not commonly cited by other researchers in the same field. Since most research papers are derived from thesis and dissertation work, citing the theses themselves is generally not a primary concern.

    Conference papers commonly discuss new ideas, tools, methodologies, technologies and comments, but not to the extent of completeness. Some good conferences adhere to a standard, such as those indexed in IEEE Xplore. Standard conference papers and/or conference papers from the same research groups or fields are justified for referencing.

    Standard journal papers are the best source of breakthroughs for your research. Make a practice of using them, as they are the best source of reference.

  • Sanjay Garg added an answer:
    Is there any effective algorithm known for applying rule hiding techniques on temporal multilevel association rules?

    We are working on privacy-preserving issues in temporal multilevel association mining and want to know which algorithm is currently the most effective for this purpose in practice, real deployment, or research.

    Sanjay Garg

    Mr. Robert,

    Thanks for giving this answer, your suggestion is really useful to me.

  • Prashant Kumar added an answer:
    Where can I find huge data sets for mining frequent item sets in data mining?
    I am working on Distributed Association Rule Mining. I need data sets to simulate my program on it.
    Prashant Kumar (UC Irvine Machine Learning Repository) (Case Western University)

  • Remy Fannader added an answer:
    Is it possible to explain the operational/tactical/strategic classification in terms of real-time KM?

    The basic idea is to use events and the processing of associated data to bridge the gap between operational and decision-making systems.

    Remy Fannader

    I will assume that internal events are meant to be managed, which means that risks are by definition external. Then I will use the classic distinction:

    Operational: full information on external state of affairs allows for immediate appraisal of prospective states.

    Tactical: a partially defined external state of affairs allows for periodic appraisal of prospective states in sync with production cycles.

    Strategic: an undefined external state of affairs doesn't allow for periodic appraisal of prospective states in sync with production cycles; its definition may also be affected through feedback.

  • Ibrahim Abubakar added an answer:
    How do you classify documents using WEKA?

    Please I need help on how to go about classifying documents using WEKA. I am doing classification of dissertations of my department. The assignment is to do an Ontology-based classification of 250 dissertations against Gartner's Hype cycle. The aim is to determine which field of IT each dissertation falls into.


    Ibrahim Abubakar

    Thank you. It really helps. I will try using some of the suggested classifiers.

    Thank you 

  • Marc Scheibel added an answer:
    Can anyone help to determine the time lag between flood gauges from upstream to downstream?
    Can anyone suggest how we should calculate time lags of historical flow data between upstream and downstream gauges for urban flood prediction (considering that, for various flood events, the time lags between upstream and downstream flow stations can change depending on precipitation characteristics and the hydrologic condition of catchments)?
    Marc Scheibel

    There are flood routing methods like Muskingum and Kalinin-Milijukov, in which you can estimate the translation and the retention effects in the river section.
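
    As a rough illustration of the Muskingum method mentioned above, here is a minimal sketch, assuming equal time steps and a steady initial state. The function names, the sample hydrograph, and the values of K (storage constant, roughly the travel time through the reach) and X (weighting factor) are illustrative assumptions; in practice K and X would be calibrated against observed events.

```python
def muskingum_route(inflow, K, X, dt):
    """Route an upstream hydrograph through a river reach (Muskingum method).

    inflow: upstream discharges at equal time steps
    K:      storage constant, same time unit as dt (assumed value)
    X:      weighting factor, typically between 0 and 0.5 (assumed value)
    dt:     time step
    """
    denom = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / denom
    c1 = (dt + 2.0 * K * X) / denom
    c2 = (2.0 * K * (1.0 - X) - dt) / denom  # c0 + c1 + c2 == 1
    outflow = [inflow[0]]  # assumption: outflow equals inflow at the start
    for i in range(1, len(inflow)):
        outflow.append(c0 * inflow[i] + c1 * inflow[i - 1] + c2 * outflow[-1])
    return outflow


def peak_lag(inflow, outflow, dt):
    """Time between the upstream peak and the downstream peak."""
    return (outflow.index(max(outflow)) - inflow.index(max(inflow))) * dt


# Hypothetical flood event (discharge per hour), not real gauge data.
inflow = [10, 20, 50, 100, 70, 40, 20, 10, 10, 10]
outflow = muskingum_route(inflow, K=2.0, X=0.2, dt=1.0)
print("peak lag (hours):", peak_lag(inflow, outflow, dt=1.0))
print("peak attenuation:", max(inflow) - max(outflow))
```

    Fitting K and X separately for each historical event is one way to see how the lag changes with precipitation and catchment conditions.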

  • Ian Kennedy added an answer:
    Domain specific search?
    I have been guiding research scholars in the area of Domain Specific Ontology Searching Techniques. If anyone would like to share their expertise or published papers, I would really appreciate it.
    Ian Kennedy

    We have a paper in press where we used an exemplary glossary as the basis for extracting the ontology of a domain. (We were then able to establish the hierarchy of dependencies.)

  • Mohamed Mohsen Gammoudi added an answer:
    How can one filter uninteresting rules in multilevel association mining processes?

    During any association mining process it is a big challenge to remove uninteresting rules. We are interested in effective formal and experimental methods for measuring the interestingness of multilevel rules.

    Mohamed Mohsen Gammoudi

    You could read this paper:

    Maybe it will help you.

  • Dr. Indrajit Mandal added an answer:
    Is there any way to do manual pruning over the result (decision trees) of a trained Random Forest Model?

    The idea is to do online pruning using a continuous timestamped dataset... and I wanted to train my model using some data, then improve it during the day with some other information that I may receive (i.e. active learning, weather condition, etc.). It could be by pruning (i.e. removing some tree branches) or by adding more branches to the current leaves. Are there any R packages that support such an implementation? Which would be the best way to do so? Thank you in advance.

    Dr. Indrajit Mandal

    Hello friend,

    Yes, it is possible if the dataset contains a small number of attributes; otherwise it may become a highly tedious job.

    You can see my publications for more details about the application of trees:

    Mandal, I., Sairam, N. New machine-learning algorithms for prediction of Parkinson's disease (2014) International Journal of Systems Science, 45 (3), pp. 647-666. DOI: 10.1080/00207721.2012.724114

    Mandal, I., Sairam, N. Accurate telemonitoring of Parkinson's disease diagnosis using robust inference system (2013) International Journal of Medical Informatics, 82 (5), pp. 359-377. DOI: 10.1016/j.ijmedinf.2012.10.006

    Mandal, I., Sairam, N. Accurate prediction of coronary artery disease using reliable diagnosis system (2012) Journal of Medical Systems, 36 (5), pp. 3353-3373. DOI: 10.1007/s10916-012-9828-0

    Mandal, I., Sairam, N. Enhanced classification performance using computational intelligence (2011) Communications in Computer and Information Science, 204 CCIS, pp. 384-391. DOI: 10.1007/978-3-642-24043-0_39

    Mandal, I. Software reliability assessment using artificial neural network (2010) ICWET 2010 - International Conference and Workshop on Emerging Trends in Technology 2010, Conference Proceedings, pp. 698-699. DOI: 10.1145/1741906.1742067

    Mandal, I. A low-power content-addressable memory (CAM) using pipelined search scheme (2010) ICWET 2010 - International Conference and Workshop on Emerging Trends in Technology 2010, Conference Proceedings, pp. 853-858. DOI: 10.1145/1741906.1742103

    Mandal, I. Developing new machine learning ensembles for quality spine diagnosis (2014) Knowledge-Based Systems, available online 19 October 2014, ISSN 0950-7051.

    Mandal, I. A novel approach for accurate identification of splice junctions based on hybrid algorithms (2014) Journal of Biomolecular Structure and Dynamics, pp. 1-10. DOI: 10.1080/07391102.2014.944218 PMID: 25203504

    Thank you.

    Best regards,

    Dr. Indrajit Mandal

  • Surya Narayan Panda added an answer:
    What kind of "Research Tools" have you used during your PhD journey?
    I have collected over 700 tools that can help researchers do their work efficiently. It is assembled as an interactive Web-based mind map, titled "Research Tools", which is updated periodically. I would like to know your personal experience on any kind of "Research Tools" that you are using for facilitating your research.
    Surya Narayan Panda

    I use Google Scholar, Google Patents, IEEE, Springer, etc.

  • Peter Kokol added an answer:
    Does anyone know where to find links to public nursing data sets?
    We are performing knowledge extraction from various health care data sets and have found just one nursing data set, but we need more.
    Peter Kokol

    Thank you so much, the links are very helpful.

  • Brian Prasad added an answer:
    What techniques do you consider the most effective for capturing heuristics rules and best-practices during product design & development and why?
    Engineers often like to use "best practices" (data, information, knowledge, wisdom) during product development. Some of the data/information come from their experiences working on the job. Others (best practices) are derived from analytical, functional, logical or physical phenomena.
    Brian Prasad


    Is your article listed on ResearchGate or on any web site? Can you add a link to the article you have referred to here?

  • Sanjay Garg added an answer:
    Is OOB error rate always random?

    Given a bootstrapped sample and a tree built from it, why is the OOB error estimate a random value?

    Sanjay Garg

    OOB predictions are based on a small subsample of the trees in the forest and are thus at a disadvantage relative to predictions that can be legitimately based on the entire forest.

    OOB predictions are not expected to exhibit better performance because their distribution is not the same as the distribution on which the individual trees are grown.

    The advantage of randomness is obviously that it gives a better ensemble. The following article may help you in this regard.
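
    The point that each OOB prediction draws on only part of the forest can be checked with a small simulation. This is a minimal sketch, assuming plain bootstrap sampling with replacement; no trees are actually trained, and the function name and parameters are illustrative. It measures the share of trees for which a sample is out-of-bag, which is roughly (1 - 1/n)^n, about 0.37 for large n. The share also shifts slightly with the random seed, which is one reason the OOB estimate is itself a random quantity.

```python
import random


def oob_fraction(n_samples, n_trees, seed):
    """Average share of trees for which a training sample is out-of-bag."""
    rng = random.Random(seed)
    oob_counts = [0] * n_samples
    for _ in range(n_trees):
        # One bootstrap sample: n_samples draws with replacement.
        in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
        for i in range(n_samples):
            if i not in in_bag:
                oob_counts[i] += 1
    return sum(oob_counts) / (n_samples * n_trees)


# Different seeds give slightly different values around 0.37.
for seed in (1, 2, 3):
    print(seed, round(oob_fraction(200, 100, seed), 4))
```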

  • Johan A.K. Suykens added an answer:
    How can I classify data with missing values using conventional classifiers?

    The tolerance rough set model is a perfect model that deals with missing values in data sets. But how can I use the tolerance rough set model to classify data using a conventional classifier such as KNN?

    Johan A.K. Suykens

    Dear all,

    In the context of support vector machines it is possible to also change the problem formulation in the case of missing values, see e.g.

    Pelckmans K., De Brabanter J., Suykens J.A.K., De Moor B., "Handling Missing Values in Support Vector Machine Classifiers", Neural Networks, vol. 18, 2005, pp. 684-692.

    Best regards,

    Johan Suykens


    Prof. Johan Suykens
    Katholieke Universiteit Leuven
    Departement Elektrotechniek - ESAT-STADIUS
    Kasteelpark Arenberg 10
    B-3001 Leuven (Heverlee)

  • Eldar Sultanow added an answer:
    Why is knowledge not highly valued as a decisive element in sustaining a competitive organizational position?
    I draw your attention to the 5 organizational elements presented by D. McFarland from Stanford University in his Organizational Analysis MOOC class: Technology, Participants, Goals, Social Structure, and Environment. Drucker considers knowledge the primary resource, and land, labor, and capital secondary resources. The general categorizations are physical, social and technical.
    Eldar Sultanow

    Our study has shown that at the management level of organizations, knowledge (especially epistemologically progressed knowledge that is decision-relevant) is valued very highly. This does not apply to the operational level of organizations.

    Another phenomenon is that the more knowledge is tacit and bound to persons, the more highly it is valued.

  • Tarik A. Rashid added an answer:
    What is the difference between global and local search in terms of machine learning and computer science?
    Global search vs local search.
    Tarik A. Rashid

    Ideally, a global search technique promises to find the best global configuration, but this mostly comes at the cost of a long search time. In practice, such algorithms run until a stopping criterion is met. Examples include particle swarm optimization, simulated annealing, and genetic algorithms.

    Local search algorithms, by contrast, do not search exhaustively; they attempt to move from the current configuration to an improving neighboring one. The outcome therefore depends heavily on the initial search space and the initial configuration. An example of local search is hill climbing, an iterative algorithm that starts with a random solution and then tries to find a better one by incrementally altering a single element of the solution. If the alteration yields a better solution, the next incremental alteration is made to the new solution, and the process repeats until no further improvement is found.
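
    The hill-climbing loop described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name, the step size, the `patience` stopping rule (stop after that many consecutive non-improving tries) and the toy objective are all assumptions made for the example.

```python
import random


def hill_climb(objective, start, step=0.1, patience=100, seed=0):
    """Maximize `objective` by altering one element of the solution at a time."""
    rng = random.Random(seed)
    current = list(start)
    best = objective(current)
    stale = 0  # consecutive tries without improvement
    while stale < patience:
        candidate = list(current)
        i = rng.randrange(len(candidate))          # pick a single element
        candidate[i] += rng.uniform(-step, step)   # small incremental change
        value = objective(candidate)
        if value > best:
            current, best = candidate, value       # keep the better neighbor
            stale = 0
        else:
            stale += 1
    return current, best


def toy_objective(p):
    """Toy objective with a single peak at (3, -1)."""
    return -((p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2)


solution, value = hill_climb(toy_objective, start=[0.0, 0.0])
print(solution, value)
```

    On an objective with several peaks the same loop would stop at whichever peak is nearest the starting point, which is the dependence on the initial configuration noted above.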

  • Bharat Singh added an answer:
    What is the significance and future scope of high-dimensional data?
    High Dimensional Microarray data.
    Bharat Singh

    Eirini Ntoutsi, thank you, ma'am.

    Could you please tell me how high dimensionality affects data mining techniques, and what other methods we can apply to mitigate it and analyze the data efficiently?

  • Vipul Sharma added an answer:
    How do you avoid the curse of dimensionality problems during the feature reduction step in data mining?
    How do you know if the number of features reduced is sufficient? Is there any rule of thumb for it? Any good idea in this direction is highly appreciated.
    Vipul Sharma


    Maybe Chapter 4 of the book below will help you:

    Book: An introduction to pattern recognition: A MATLAB approach

    By: Theodoridis, Sergios

    Good luck!

  • Anteneh Ayanso added an answer:
    How do mathematics and statistics explain the logic behind association rules, which is essential for the decision-making process?

    There are many different forms of mathematics, and different forms support different understandings of association rules. The effectiveness and efficiency of the same association rules differ across these mathematical explanations. So how is it possible that association rules, which have many different mathematical explanations, can support a single decision? And how can I know which of these explanations are correct and which are wrong?

    Anteneh Ayanso

    The key is to use interestingness measures that explain association between items or events beyond random chance, given that these items or events co-occur at an acceptable level of frequency (a.k.a. support). Acceptable interestingness measures include confidence, lift, conviction, etc.
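
    For concreteness, these measures can be computed directly from transaction counts. The sketch below uses the standard definitions for a rule A -> C: support = P(A and C), confidence = P(C | A), lift = confidence / P(C), and conviction = (1 - P(C)) / (1 - confidence). The function name and the toy baskets are made up for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))
    n_c = sum(1 for t in transactions if c <= set(t))
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))
    if n_a == 0:
        raise ValueError("antecedent never occurs; confidence is undefined")
    support = n_ac / n
    confidence = n_ac / n_a
    p_c = n_c / n
    lift = confidence / p_c  # 1.0 means no association beyond chance
    # Conviction is infinite when the rule is never violated.
    conviction = float("inf") if confidence == 1.0 else (1.0 - p_c) / (1.0 - confidence)
    return {"support": support, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Hypothetical market baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
print(rule_measures(transactions, {"bread"}, {"milk"}))
```

    In this toy data the rule bread -> milk has a confidence of 0.75 but a lift below 1, because milk is already in 80% of the baskets. That is the kind of rule that looks strong on confidence alone yet says nothing beyond random chance.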

  • Mohamed Hafez AbdElRahman added an answer:
    What is the role of spatial data in mobile governance?
    I need good research articles on the role or use of spatial data in good/effective mobile governance?
    Mohamed Hafez AbdElRahman
    I have worked on a mobile application project that solves some local problems using spatial data about the locations of accidents, fire stations, and so on.
    You can access my LinkedIn profile and take a look at the project entitled "Problem Locator".
  • Hanumantharao Jalla added an answer:
    Transformation technique to convert original data into perturbed data
    I need to transform original data into perturbed data to protect individual privacy, and then apply data mining techniques to the perturbed data so that the knowledge obtained is equivalent to that from the original data.
    Hanumantharao Jalla
    Does anybody know how to run LDA (Latent Dirichlet Allocation) code for the Reuters-21578 and 20 Newsgroups datasets?

    I downloaded the LDA code from this link.
  • Ana-Maria Ciobotaru added an answer:
    Can anyone help me to trace the role of data mining & machine learning in Intelligent Traveler Management Systems (ITIS)?
    ITIS provide the drivers with real time travel and traffic information, such as transit routes, schedules, navigation directions and information about the delays due to congestion etc. I would like to know what the role of data mining and machine learning is in this regard. Can you recommend any literature?
    Ana-Maria Ciobotaru
    Hi Budy Santoso, look at the link to the book that you requested (Advanced Traveler Information System).
  • Emmanuel Detrinidad added an answer:
    How can small and medium enterprises benefit from Big Data and Data Science?
    Big Data and Data Science have continued to emerge among practitioners and researchers. But the foundation of these concepts involves large volumes and a variety of data created at high velocity. Hence, the focus has generally been on bigger organisations that generate such data. However, small and medium-sized organisations are also active adopters of ICT. Can Big Data and Data Science benefit small and medium enterprises as well, and how?
    Emmanuel Detrinidad
    I'm not sure about the use of "big data" in SMEs. I mean, if they (the SMEs) work independently, they will need data mining tools. I think that will be enough.
    But if you are thinking of coordinating a cluster of servers from a cluster of SMEs, that's another story.
  • Michael Niemann added an answer:
    How can I extract the adjective?
    The large photo album has extra charges on delivery.
    The adjective "large" may indicate the attribute "size" of the photo.
    Red car
    The adjective "red" may indicate the attribute "colour" of the car.

    How can I write a Java program to do this? How do we know that the root attribute for "large" or "big" is size, and how do we find the root attribute for a color?
    Michael Niemann
    This is some research you will have to do for yourself. Plenty of work has already been done with Wordnet, so there should be plenty of articles and books out there that will explain things for you.
  • Dennis Weyland added an answer:
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    I am studying the harmony search for feature selection. I read some papers and have some doubts regarding the harmony search.
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    Dennis Weyland

    If you work with harmony search, you should maybe know about the fact that harmony search is in fact a special case of evolution strategies and that some results reported by the "inventor" of harmony search, Z.W. Geem, seem extremely unlikely:

About Data Mining and Knowledge Discovery

It is the research project which is ongoing.
