Data Mining and Knowledge Discovery

  • Remy Fannader added an answer:
    Is it possible to explain the operational/tactical/strategic classification in terms of real-time KM?

    The basic idea is to use events and the processing of associated data to bridge the gap between operational and decision-making systems.

    Remy Fannader · caminao

    I will assume that internal events are meant to be managed, which means that risks are by definition external. Then I will use the classic distinction:

    Operational: full information on external state of affairs allows for immediate appraisal of prospective states.

    Tactical: a partially defined external state of affairs allows for periodic appraisal of prospective states in sync with production cycles.

    Strategic: an undefined external state of affairs does not allow for periodic appraisal of prospective states in sync with production cycles; its definition may also be affected through feedback.

  • Ibrahim Abubakar added an answer:
    How do you classify documents using WEKA?

    Please, I need help with classifying documents using WEKA. I am classifying the dissertations of my department. The assignment is to do an ontology-based classification of 250 dissertations against Gartner's Hype Cycle. The aim is to determine which field of IT each dissertation falls into.


    Ibrahim Abubakar · Heriot-Watt University

    Thank you. It really helps. I will try some of the suggested classifiers.

  • Marc Scheibel added an answer:
    Can anyone help to determine the time lag between flood gauges from upstream to downstream?
    Can anyone suggest how we should calculate the time lags in historical flow data between upstream and downstream gauges for urban flood prediction (considering that, for different flood events, the time lag between upstream and downstream flow stations can change depending on precipitation characteristics and the hydrologic condition of the catchments)?
    Marc Scheibel · Wupperverband

    There are flood routing methods like Muskingum and Kalinin-Milijukov, in which you can estimate the translation and the retention effects in the river section.
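
As an illustration of the Muskingum method mentioned above, here is a minimal Python sketch (not the answerer's own code). It uses the standard Muskingum recursion O[t] = c0*I[t] + c1*I[t-1] + c2*O[t-1]. The hydrograph values, the storage constant K, the weighting factor X, and the function names are all assumptions made for the example; in practice K and X would have to be calibrated per river reach, and possibly per flood event, since the lag can change between events.

```python
def muskingum_coefficients(K, X, dt):
    """Return (c0, c1, c2) for storage constant K, weighting factor X, time step dt."""
    denom = 2.0 * K * (1.0 - X) + dt
    c0 = (dt - 2.0 * K * X) / denom   # stays non-negative only if dt >= 2*K*X
    c1 = (dt + 2.0 * K * X) / denom
    c2 = (2.0 * K * (1.0 - X) - dt) / denom
    return c0, c1, c2                 # the three weights always sum to 1


def muskingum_route(inflow, K, X, dt):
    """Route an upstream hydrograph downstream; assumes steady flow at the first step."""
    c0, c1, c2 = muskingum_coefficients(K, X, dt)
    outflow = [inflow[0]]
    for t in range(1, len(inflow)):
        outflow.append(c0 * inflow[t] + c1 * inflow[t - 1] + c2 * outflow[-1])
    return outflow


def peak_lag(upstream, downstream, dt):
    """Time between the upstream peak and the downstream peak."""
    return (downstream.index(max(downstream)) - upstream.index(max(upstream))) * dt


# Made-up event hydrograph (m^3/s) at hourly steps; K and X are illustrative only.
inflow = [10, 20, 50, 80, 60, 40, 25, 15, 10, 10]
outflow = muskingum_route(inflow, K=2.0, X=0.2, dt=1.0)
print(peak_lag(inflow, outflow, dt=1.0))
```

In this toy event the routed peak is both lower and later than the upstream peak, which is the retention and translation effect the answer refers to. Repeating the calculation for each historical event would give one lag estimate per event.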

  • Ian Kennedy added an answer:
    Domain specific search?
    I have been guiding research scholars in the area of domain-specific ontology searching techniques. If anyone would like to share their expertise along with published papers, I would really appreciate it.
    Ian Kennedy · Independent Researcher

    We have a paper in press where we used an exemplary glossary as the basis for extracting the ontology of a domain. (We were then able to establish the hierarchy of dependencies.)

  • Mohamed Mohsen Gammoudi added an answer:
    How can one filter uninteresting rules in multilevel association mining processes?

    During any association mining process, it is a big challenge to remove uninteresting rules. We are interested in effective formal and experimental methods for assessing the interestingness of multilevel rules.

    Mohamed Mohsen Gammoudi · Université de la Manouba

    You could read this paper:

    Maybe it helps you.

  • Dr. Indrajit Mandal added an answer:
    Is there any way to do manual pruning on the result (decision trees) of a trained Random Forest model?

    The idea is to do online pruning using a continuous timestamped dataset. I want to train my model using some data and then improve it during the day with other information that I may receive (e.g. active learning, weather conditions, etc.). This could be done by pruning (i.e. removing some tree branches) or by adding more branches to the current leaves. Are there any R packages that support such an implementation? What would be the best way to do so? Thank you in advance.

    Dr. Indrajit Mandal · Rajiv Gandhi Institute of Technology, Bangalore

    Hello friend,

    Yes, it is possible if the dataset contains a small number of attributes; otherwise it may become a highly tedious job.

    You can see my publications for more details about applications of trees:

    Mandal, I., Sairam, N. New machine-learning algorithms for prediction of Parkinson's disease (2014) International Journal of Systems Science, 45 (3), pp. 647-666. DOI: 10.1080/00207721.2012.724114

    Mandal, I., Sairam, N. Accurate telemonitoring of Parkinson's disease diagnosis using robust inference system (2013) International Journal of Medical Informatics, 82 (5), pp. 359-377. DOI: 10.1016/j.ijmedinf.2012.10.006

    Mandal, I., Sairam, N. Accurate prediction of coronary artery disease using reliable diagnosis system (2012) Journal of Medical Systems, 36 (5), pp. 3353-3373. DOI: 10.1007/s10916-012-9828-0

    Mandal, I., Sairam, N. Enhanced classification performance using computational intelligence (2011) Communications in Computer and Information Science, 204 CCIS, pp. 384-391. DOI: 10.1007/978-3-642-24043-0_39

    Mandal, I. Software reliability assessment using artificial neural network (2010) ICWET 2010 - International Conference and Workshop on Emerging Trends in Technology 2010, Conference Proceedings, pp. 698-699. DOI: 10.1145/1741906.1742067

    Mandal, I. A low-power content-addressable memory (CAM) using pipelined search scheme (2010) ICWET 2010 - International Conference and Workshop on Emerging Trends in Technology 2010, Conference Proceedings, pp. 853-858. DOI: 10.1145/1741906.1742103

    Indrajit Mandal, Developing New Machine Learning Ensembles for Quality Spine Diagnosis, Knowledge-Based Systems, Available online 19 October 2014, ISSN 0950-7051,

    Mandal, I. A novel approach for accurate identification of splice junctions based on hybrid algorithms (2014) Journal of Biomolecular Structure and Dynamics, pp. 1-10. DOI: 10.1080/07391102.2014.944218 PMID: 25203504

    Thank you.

    Best regards,

    Dr. Indrajit Mandal

  • Surya Narayan Panda added an answer:
    What kind of "Research Tools" have you used during your PhD journey?
    I have collected over 700 tools that can help researchers do their work efficiently. It is assembled as an interactive Web-based mind map, titled "Research Tools", which is updated periodically. I would like to know your personal experience on any kind of "Research Tools" that you are using for facilitating your research.
    Surya Narayan Panda · Chitkara University

    I use Google Scholar, Google Patents, IEEE, Springer, etc.

  • Peter Kokol added an answer:
    Does anyone know where to find links to public nursing data sets?
    We are performing knowledge extraction from various health care data sets and have found just one nursing data set, but we need more.
    Peter Kokol · University of Maribor

    Thank you so much, the links are very helpful.

  • Brian Prasad added an answer:
    What techniques do you consider the most effective for capturing heuristic rules and best practices during product design & development, and why?
    Engineers often like to use "best practices" (data, information, knowledge, wisdom) during product development. Some of the data/information come from their experiences working on the job. Others (best practices) are derived from analytical, functional, logical or physical phenomena.
    Brian Prasad · California Institute of Technology


    Is your article listed on ResearchGate or on any web site? Can you add a link to the article you have referred to here?

  • Sanjay Garg added an answer:
    Is OOB error rate always random?

    A tree is built using a bootstrapped sample. Why is the OOB error estimate a random value?

    Sanjay Garg · Nirma University

    OOB predictions are based on a small subsample of the trees in the forest and are thus at a disadvantage relative to predictions that can legitimately be based on the entire forest.

    OOB predictions are not expected to exhibit better performance, because their distribution is not the same as the distribution on which the individual trees are grown.

    The advantage of randomness is obviously to get a better ensemble. The following article may help you in this regard.
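
To make the "small subsample of the trees" point concrete, here is a short Python sketch (an illustration, not taken from the answer). It rests on a standard fact about bootstrap sampling: when n observations are drawn with replacement n times, a given observation is left out with probability (1 - 1/n)^n, which approaches 1/e, about 0.368, as n grows. The function names and the sample and tree counts are assumptions chosen for the example.

```python
import math
import random


def oob_probability(n_samples):
    """Probability that one observation is absent from a bootstrap sample of size n."""
    return (1.0 - 1.0 / n_samples) ** n_samples


def simulate_oob_fraction(n_samples, n_trees, seed=0):
    """Fraction of trees for which observation 0 is out-of-bag, by simulation."""
    rng = random.Random(seed)
    oob_trees = 0
    for _ in range(n_trees):
        # One bootstrap sample per tree: n draws with replacement.
        drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
        if 0 not in drawn:
            oob_trees += 1
    return oob_trees / n_trees


print(oob_probability(1000), math.exp(-1))
print(simulate_oob_fraction(n_samples=200, n_trees=500, seed=0))
print(simulate_oob_fraction(n_samples=200, n_trees=500, seed=1))
```

So each observation is scored by only about a third of the trees. Changing the seed changes which trees those are, which is one reason the OOB error estimate varies from run to run.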

  • Johan A.K. Suykens added an answer:
    How can I classify data with missing values using conventional classifiers?

    The tolerance rough set model is a perfect model that deals with missing values in data sets. But how can I use the tolerance rough set model to classify data with a conventional classifier such as KNN?

    Johan A.K. Suykens ·

    Dear all,

    In the context of support vector machines it is possible to also change the problem formulation in the case of missing values, see e.g.

    Pelckmans K., De Brabanter J., Suykens J.A.K., De Moor B., "Handling Missing Values in Support Vector Machine Classifiers", Neural Networks, vol. 18, 2005, pp. 684-692.

    Best regards,

    Johan Suykens


    Prof. Johan Suykens
    Katholieke Universiteit Leuven
    Departement Elektrotechniek - ESAT-STADIUS
    Kasteelpark Arenberg 10
    B-3001 Leuven (Heverlee)

  • Ana-Maria Ciobotaru added an answer:
    Data Mining and Risk Management
    Hi! I am starting my bachelor's thesis, and the subject is a combination of data mining and risk management. Question: How can data mining support risk management in financial institutions so that they remain competitive on the market in times of economic crisis? It is not easy to come across valuable information on the net. I want to take a holistic approach and use a number of local banks to conduct qualitative data collection research.
    Ana-Maria Ciobotaru · University of Bucharest

    Hi, you can look at these papers:

  • Eldar Sultanow added an answer:
    Why is knowledge not highly valued as a decisive element in sustaining a competitive organizational position?
    I draw your attention to the five organizational elements presented by D. McFarland of Stanford University in his Organizational Analysis MOOC: Technology, Participants, Goals, Social Structure, and Environment. Drucker considers knowledge the primary resource, and land, labor, and capital secondary resources. General categories are physical, social, and technical.
    Eldar Sultanow · Universität Potsdam

    Our study has shown that, at the management level of organizations, knowledge (especially epistemologically progressed knowledge that is decision-relevant) is valued very highly. This does not apply to the operational level of organizations.

    Another phenomenon is that the more knowledge is tacit and bound to persons, the more highly it is valued.

  • Tarik A. Rashid added an answer:
    May I know what the difference is between global and local search in terms of machine learning and computer science?
    Global search vs local search.
    Tarik A. Rashid · Salahaddin University - Erbil

    Ideally, a global search technique promises to find the globally best solution, but mostly at the cost of a long search time. In practice, such algorithms run until a stopping criterion is met. Examples include particle swarm optimization, simulated annealing, and genetic algorithms. Local search algorithms, by contrast, do not explore the whole search space; they attempt to move from the current solution to a better neighboring one, so the result depends heavily on the initial solution. An example of local search is hill climbing, an iterative algorithm that starts with a random solution and then tries to find a better one by incrementally altering a single element of the solution. If the alteration yields a better solution, the next incremental alteration is made to the new solution, and the process repeats until no further improvement is found.
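
The hill-climbing procedure described above can be sketched in a few lines of Python. This is a minimal steepest-ascent variant over integer tuples, written for illustration only; the objective `two_peaks`, the neighborhood definition (change one element by plus or minus 1), and all names are assumptions for the example rather than part of the answer.

```python
from typing import Callable, Iterator, Tuple

Point = Tuple[int, ...]


def neighbors(point: Point) -> Iterator[Point]:
    """Yield every point that differs from `point` in exactly one element by +/-1."""
    for i in range(len(point)):
        for delta in (-1, 1):
            yield point[:i] + (point[i] + delta,) + point[i + 1:]


def hill_climb(score: Callable[[Point], float], start: Point,
               max_steps: int = 1000) -> Point:
    """Move to the best neighbor until no neighbor improves the score."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            break  # no improving neighbor: a local optimum has been reached
        current = best
    return current


def two_peaks(p: Point) -> float:
    """Toy objective with a lower peak at x = 2 and a higher peak at x = 10."""
    x = p[0]
    return max(5 - abs(x - 2), 9 - 2 * abs(x - 10))


print(hill_climb(two_peaks, (0,)))  # start near the lower peak
print(hill_climb(two_peaks, (8,)))  # start near the higher peak
```

Starting near the lower peak leaves the search stuck there, while a start near the higher peak reaches the global optimum. That dependence on the starting point is what global methods try to avoid, at the price of longer run times.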

  • Bharat Singh added an answer:
    What is the significance and future scope of high-dimensional data?
    High-dimensional microarray data.
    Bharat Singh · Indian Institute of Information Technology Allahabad

    Eirini Ntoutsi, thank you, ma'am.

    Could you please tell me how high dimensionality affects data mining techniques, and what other methods we can apply to mitigate high dimensionality and analyze the data efficiently?

  • Vipul Sharma added an answer:
    How do you avoid the curse of dimensionality problems during the feature reduction step in data mining?
    How do you know if the number of features reduced is sufficient? Is there any rule of thumb for it? Any good idea in this direction is highly appreciated.
    Vipul Sharma · DAV University


    Maybe Chapter 4 of the book below will help you:

    Book: An introduction to pattern recognition: A MATLAB approach

    By: Theodoridis, Sergios

    Good luck!

  • Anteneh Ayanso added an answer:
    How do mathematics and statistics explain the logic behind association rules, which is essential for the decision-making process?

    There are many different forms of mathematics. Different forms support different understandings of association rules. The effectiveness and efficiency of the same association rules differ across these mathematical explanations. So how is it possible that association rules, which have many different mathematical explanations, can support a single decision? And how can I know which of these explanations are correct and which are wrong?

    Anteneh Ayanso · Brock University

    The key is to use interestingness measures that explain the association between items or events beyond random chance, given that these items or events co-occur at an acceptable level of frequency (aka support). Acceptable interestingness measures include confidence, lift, conviction, etc.
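
For readers who want to see the measures named above side by side, here is a small Python sketch using their usual textbook definitions: confidence(A → B) = supp(A ∪ B) / supp(A), lift = confidence / supp(B), and conviction = (1 - supp(B)) / (1 - confidence). The toy transactions and the function names are made up for the example, and the code assumes the antecedent occurs in at least one transaction.

```python
import math


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_measures(transactions, antecedent, consequent):
    """Support, confidence, lift, and conviction of the rule antecedent -> consequent."""
    supp_a = support(transactions, antecedent)
    supp_c = support(transactions, consequent)
    supp_rule = support(transactions, set(antecedent) | set(consequent))
    confidence = supp_rule / supp_a
    lift = confidence / supp_c
    # Conviction is unbounded when the rule never fails (confidence == 1).
    conviction = math.inf if confidence == 1 else (1 - supp_c) / (1 - confidence)
    return {"support": supp_rule, "confidence": confidence,
            "lift": lift, "conviction": conviction}


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
print(rule_measures(baskets, {"bread"}, {"milk"}))
```

In this toy data the rule bread → milk has fairly high confidence, yet its lift is below 1, because milk is already present in most baskets. This is the sense in which lift and conviction look "beyond random chance" where confidence alone does not.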

  • Mohamed Mahmoud Hafez added an answer:
    What is the role of spatial data in mobile governance?
    I need good research articles on the role or use of spatial data in good/effective mobile governance.
    Mohamed Mahmoud Hafez · Rice University
    I have already worked on a mobile application project that solves some local problems using spatial data about the locations of accidents, fire stations, etc.
    You can access my LinkedIn profile and take a look at the project entitled "Problem Locator".
  • Hanumantharao Jalla added an answer:
    Transformation technique to convert original data into perturbed data
    I need to transform original data into perturbed data to protect individual privacy, and then apply data mining techniques to the perturbed data so that the knowledge obtained is equivalent to that of the original data.
    Hanumantharao Jalla · University of Hyderabad
    Does anybody know how to run LDA (Latent Dirichlet Allocation) code on the Reuters-21578 and 20 Newsgroups datasets?

    I downloaded the LDA code from this link.
  • Ana-Maria Ciobotaru added an answer:
    Can anyone help me to trace the role of data mining & machine learning in Intelligent Traveler Management Systems (ITIS)?
    ITIS provide drivers with real-time travel and traffic information, such as transit routes, schedules, navigation directions, and information about delays due to congestion. I would like to know what the role of data mining and machine learning is in this regard. Can you recommend any literature?
    Ana-Maria Ciobotaru · University of Bucharest
    Hi Budy Santoso, have a look at the link to the book that you requested (Advanced Traveler Information System).
  • JC Ang added an answer:
    How does the Harmony search work for feature selection?
    I am a bit confused about Harmony search for feature selection. Hopefully the experts here can kindly help to answer my queries.
    Where should I fit the data set into the Harmony search? Can I say that in Harmony Memory (in step 2, initialise Harmony Memory) the features are randomly selected from a dataset? How exactly is Harmony Memory the complete dataset?
    My understanding is that each row in Harmony Memory = feature subset, hence each decision variable in Harmony Memory represents a feature. So for the case of multi-dimensional datasets that have a few samples with various features (for example : Gene Expression data), how can Harmony Memory be formed?
    JC Ang · Universiti Teknologi Malaysia
    Thanks, Bauckhage, for the reminder. This is a really interesting article. I somewhat agree with the author that some metaphors for metaheuristic methods are too complex and unnecessary. I may consider changing my research direction then.
  • Emmanuel Detrinidad added an answer:
    How can small and medium enterprises benefit from Big Data and Data Science?
    Big Data and Data Science have continued to emerge among practitioners and researchers. But the foundation of these concepts involves large volumes and a variety of data created at high velocity. Hence, the focus has generally been on bigger organisations that generate such data. However, small and medium-sized organisations are also active adopters of ICT. Can Big Data and Data Science benefit small and medium enterprises as well, and how?
    Emmanuel Detrinidad · Instituto de Estudios Interdisciplinarios, Granada, Nicaragua
    I'm not sure about the use of "big data" in SMEs. I mean, if they (the SMEs) work independently, they will need data mining tools; I think that will be enough.
    But if you are thinking about coordinating a cluster of servers across a cluster of SMEs, that's another story.
  • Jonas Mellin added an answer:
    Do you know of any articles about grouping events into sequences?
    I have a collection of events that occur in a tested system. I keep them in an Oracle database. I am looking for algorithms that help me group those events into sequences and then use them to look for possible anomalies.
    Jonas Mellin · University of Skövde
    @Juggapong Natwichai, can the method you referred to handle uncertainty?
  • Michael Niemann added an answer:
    How can I extract the adjective?
    The large photo album has extra charges on delivery.
    The adjective "large" may indicate a size attribute of the photo album.
    Red car
    The adjective "red" may indicate a colour attribute of the car.

    How can I write a Java program to do the above (how do we know that the root for "large" or "big" is size, and how do we know the root for the colour)?
    Michael Niemann · Monash University (Australia)
    This is some research you will have to do for yourself. Plenty of work has already been done with WordNet, so there should be plenty of articles and books out there that will explain things for you.
  • Félix F. González-Navarro added an answer:
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    I am studying the harmony search for feature selection. I read some papers and have some doubts regarding the harmony search.
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    Félix F. González-Navarro · Autonomous University of Baja California
    I would say cross-validation. There is also one thing you should keep in mind: harmony search is a stochastic algorithm. What does this mean? From one run to another, the solution may be different, so picking the single best solution could give a biased result. How can you solve this? With several runs of your harmony search experiment. You could use bootstrap sampling and apply harmony search to each bootstrap sample; in the end, some variables or features will appear consistently in the final subsets. Those variables or features are the interesting ones. I hope this helps.
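
The bootstrap idea above can be sketched in Python as follows. This is an outline under stated assumptions, not the answerer's method in full: `select_features` is a hypothetical stand-in for one harmony search run (it simply ranks features by absolute correlation with the target), and the toy data, function names, and parameter values are invented for the example. In a real experiment you would replace `select_features` with your own harmony search.

```python
import math
import random
from collections import Counter


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; returns 0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else abs(cov / math.sqrt(vx * vy))


def select_features(rows, targets, k):
    """Hypothetical placeholder for one harmony search run: returns k feature indices."""
    n_features = len(rows[0])
    scores = [abs_correlation([r[j] for r in rows], targets) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def selection_frequency(rows, targets, k, n_boot=50, seed=0):
    """Share of bootstrap runs in which each feature ends up in the selected subset."""
    rng = random.Random(seed)
    n = len(rows)
    counts = Counter()
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        counts.update(select_features([rows[i] for i in idx],
                                      [targets[i] for i in idx], k))
    return {j: counts[j] / n_boot for j in range(len(rows[0]))}


def make_toy_data(n=60, n_features=5, seed=1):
    """Toy data: feature 0 is the target plus small noise, the rest are pure noise."""
    rng = random.Random(seed)
    targets = [rng.gauss(0, 1) for _ in range(n)]
    rows = [[y + 0.1 * rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(n_features - 1)]
            for y in targets]
    return rows, targets


rows, targets = make_toy_data()
print(selection_frequency(rows, targets, k=1))
```

Features whose frequency stays close to 1 across bootstrap samples are the stable ones; features that appear only occasionally are more likely artifacts of a particular run or sample.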
  • Vangipuram Radhakrishna added an answer:
    What is a good way to reduce the dimension of my data?
    I have a data set of 500 records and 90 attributes. What are good ways to reduce the dimension without losing much information?
    I don't want to do something like PCA, because the interpretation of the model built on them would be hard. Do you suggest feature selection?

About Data Mining and Knowledge Discovery

It is the research project which is ongoing.