Data Mining and Knowledge Discovery

  • Jonas Mellin added an answer:
    Do you know of any articles about grouping events into sequences?
    I have a collection of events that occur in a tested system. I keep them in an Oracle database. I am looking for algorithms that help me group those events into sequences and then use them to look for possible anomalies.
    Jonas Mellin · University of Skövde
    @Juggapong Natwichai, can the addressed method handle uncertainty?
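    The question does not say how events are linked, so the sketch below is only one possible starting point, not the method discussed in this thread. It assumes each event (for example a row fetched from the Oracle table) is a tuple of a source identifier, a numeric timestamp, and an event type. Events from one source are ordered by time and split into a new sequence whenever the time gap exceeds a threshold (`gap`, a hypothetical parameter). Each sequence is then scored by the fraction of its transitions (a -> b) that occur only once across all sequences, a simple rarity heuristic for spotting anomalies. All function names are hypothetical.

```python
from collections import Counter, defaultdict

def build_sequences(events, gap):
    """events: iterable of (source, timestamp, event_type) tuples.
    Events of one source are sorted by time; a new sequence starts whenever
    the gap between consecutive events exceeds `gap` (hypothetical threshold)."""
    by_source = defaultdict(list)
    for source, ts, etype in events:
        by_source[source].append((ts, etype))
    sequences = []
    for source in sorted(by_source):
        items = sorted(by_source[source])
        current = [items[0][1]]
        for (t_prev, _), (t_cur, etype) in zip(items, items[1:]):
            if t_cur - t_prev > gap:
                sequences.append(current)
                current = []
            current.append(etype)
        sequences.append(current)
    return sequences

def rare_transition_score(sequences):
    """Fraction of each sequence's transitions (a -> b) that occur only once
    across all sequences; higher scores suggest more unusual sequences."""
    counts = Counter(pair for s in sequences for pair in zip(s, s[1:]))
    scores = []
    for s in sequences:
        pairs = list(zip(s, s[1:]))
        rare = sum(1 for p in pairs if counts[p] == 1)
        scores.append(rare / len(pairs) if pairs else 0.0)
    return scores
```

    With real data, the gap threshold and the rarity rule would need tuning; a sequence model (e.g. Markov chains or sequential pattern mining) could replace the simple transition count.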
  • John K. Marco Pima added an answer:
    Are PhD/Masters Theses and Dissertations or Journals/Conference proceedings the best sources for a Literature Review? Why?
    When reviewing the literature for trends in technological advancement and future directions, we may use either PhD and Masters' theses and dissertations or journals and conference proceedings. But which is more useful, or gives the most rigorous review?
    John K. Marco Pima · Institute of Accountancy Arusha
    Aatif Saif,
    It is true that dissertations offer a direction, and if you follow up on their references, you get the most out of them. I have tried this, and it has worked for me as a supplement to other sources.
  • Michael Niemann added an answer:
    How can I extract the adjective?
    The large photo album has extra charges on delivery.
    The adjective "large" may indicate the size attribute of the photo album.
    Red car
    The adjective "red" may indicate the colour attribute of the car.

    How can I write a Java program to do the above (how do we know that the root attribute of "large" and "big" is size), and how do we know the root attribute for the colour?
    Michael Niemann · Monash University (Australia)
    This is some research you will have to do for yourself. Plenty of work has already been done with WordNet, so there should be plenty of articles and books out there that will explain things for you.
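    As a minimal illustration (in Python rather than Java), the sketch below maps adjectives to the attribute they express through a small hand-built lexicon; the table and its entries are hypothetical. WordNet encodes this kind of link as its adjective-noun "attribute" relation (for example, size is the attribute whose values include large and small), and NLTK's WordNet interface exposes it through `Synset.attributes()`, so a real system could derive the table from WordNet instead of writing it by hand. Finding the noun that the adjective modifies would additionally need a POS tagger or parser, which this sketch omits.

```python
import re

# Hypothetical hand-built lexicon: adjective -> attribute it expresses.
# A real system could derive these pairs from WordNet's attribute relation.
ATTRIBUTE_LEXICON = {
    "large": "size", "big": "size", "small": "size",
    "red": "colour", "blue": "colour", "green": "colour",
}

def extract_attributes(sentence):
    """Return (adjective, attribute) pairs for known adjectives in a sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [(t, ATTRIBUTE_LEXICON[t]) for t in tokens if t in ATTRIBUTE_LEXICON]

print(extract_attributes("The large photo album has extra charges on delivery."))
print(extract_attributes("Red car"))
```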
  • Félix F. González-Navarro added an answer:
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    I am studying the harmony search for feature selection. I read some papers and have some doubts regarding the harmony search.
    Félix F. González-Navarro · Autonomous University of Baja California
    I would say cross-validation. There is also one thing you should keep in mind: harmony search is a stochastic algorithm. What does this mean? From one run to another, the solution may be different, so picking the best solution from a single run could give a biased result. How to solve it? Run your harmony search experiment several times. You could use bootstrap sampling and apply harmony search to each bootstrap sample; at the end, some variables or features will consistently appear in the final subsets. Those variables or features are the interesting ones. Hope this helps.
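    A minimal sketch of the answer's suggestion, assuming binary feature masks: the objective is k-fold cross-validated accuracy, and harmony search is repeated on bootstrap samples to count how often each feature is selected. The nearest-centroid classifier inside the objective and the small penalty on subset size are assumptions (any model could be swapped in); `hms`, `hmcr`, and `par` are the usual harmony memory size, harmony memory considering rate, and pitch adjusting rate, and flipping a bit is one simple binary form of pitch adjustment. All function names are hypothetical. Note that cross-validating on a bootstrap sample lets duplicated rows land in both train and test folds, which inflates accuracy; that is acceptable for ranking features but not for reporting performance.

```python
import numpy as np

def cv_accuracy(X, y, mask, k=5):
    """k-fold CV accuracy of a nearest-centroid classifier on the masked features.
    A fixed seed keeps the folds identical across masks for fair comparison."""
    if not mask.any():
        return 0.0
    idx = np.random.default_rng(0).permutation(len(y))
    folds = np.array_split(idx, k)
    Xs = X[:, mask]
    correct = 0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        classes = np.unique(y[train])
        centroids = np.array([Xs[train][y[train] == c].mean(axis=0) for c in classes])
        dists = ((Xs[test][:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        correct += (classes[dists.argmin(axis=1)] == y[test]).sum()
    return correct / len(y)

def harmony_search(X, y, hms=10, hmcr=0.9, par=0.3, iters=200, penalty=0.01, rng=None):
    """Binary harmony search; returns the best feature mask found."""
    rng = np.random.default_rng() if rng is None else rng
    n = X.shape[1]

    def score(mask):
        # objective: CV accuracy minus a small (assumed) penalty on subset size
        return cv_accuracy(X, y, mask) - penalty * mask.sum() / n

    memory = rng.random((hms, n)) < 0.5
    scores = np.array([score(m) for m in memory])
    for _ in range(iters):
        new = np.empty(n, dtype=bool)
        for j in range(n):
            if rng.random() < hmcr:           # take the bit from harmony memory
                new[j] = memory[rng.integers(hms), j]
                if rng.random() < par:        # pitch adjustment: flip the bit
                    new[j] = not new[j]
            else:                             # random choice
                new[j] = rng.random() < 0.5
        s = score(new)
        worst = scores.argmin()
        if s > scores[worst]:                 # replace the worst harmony
            memory[worst], scores[worst] = new, s
    return memory[scores.argmax()]

def bootstrap_selection_frequency(X, y, n_boot=20, seed=0, **hs_kwargs):
    """Fraction of bootstrap runs in which each feature was selected."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        counts += harmony_search(X[idx], y[idx], rng=rng, **hs_kwargs)
    return counts / n_boot
```

    Features with a high selection frequency across bootstrap runs are the "consistently appearing" ones the answer describes.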
  • Uwizeye Clarisse added an answer:
    How to predict an evolution of a network?
    I'm working on a data mining topic. I have an attributed graph representing an inflation network of developing countries. My vertex attributes include population, exchange rate to the US $, price level of consumption (PC), ratio of GNP to GDP (%), and so on. Can the covariation among vertex descriptors tell me whether the vertex attributes are structurally correlated? Is it possible to predict the future speed of my graph's evolution after a given period? If yes, what do my vertices stand for?
    Uwizeye Clarisse · University Joseph Fourier - Grenoble 1
    OK, if I understood you correctly, I think I will get a kind of sequence of correlation patterns.
    Each annual inflation value is a function of its own history. So if I want to build a model that predicts what comes at t+1, I am unsure what represents time if I proceed that way.
  • Vangipuram Radhakrishna added an answer:
    What is a good way to reduce the dimension of my data?
    I have a data set of 500 records and 90 attributes. What are good ways to reduce the dimension without losing much information?
    I don't want to use something like PCA, because a model built on the resulting components would be hard to interpret. Do you suggest feature selection?
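    Feature selection keeps a subset of the original attributes, so the model stays interpretable. A minimal filter-style sketch, assuming numeric attributes: drop constant columns, then keep a column only if its absolute Pearson correlation with every already-kept column is below a threshold (0.9 here is an arbitrary assumption). The function name is hypothetical, and this filter ignores the target; wrapper or embedded methods that use the target are common alternatives.

```python
import numpy as np

def select_uncorrelated(X, threshold=0.9):
    """Return indices of original columns to keep.
    Constant columns are dropped; remaining columns are scanned left to right
    and kept only if |Pearson r| with every kept column is below threshold."""
    X = np.asarray(X, dtype=float)
    candidates = [j for j in range(X.shape[1]) if X[:, j].std() > 0]
    if len(candidates) <= 1:
        return candidates
    corr = np.abs(np.corrcoef(X[:, candidates], rowvar=False))
    kept = []  # positions within `candidates`
    for a in range(len(candidates)):
        if all(corr[a, b] < threshold for b in kept):
            kept.append(a)
    return [candidates[a] for a in kept]
```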
  • Madhavi Vaidya added an answer:
    Can I use the R language for large-scale data analysis?
    If yes, how is it different from MapReduce? What else can be used?
    Madhavi Vaidya · Vivekanand Education Society College of Arts, Science and Commerce
    Dear Alan, I will surely check the link you gave. Thank you so much.
  • Nahid Khorashadizade added an answer:
    How are the most appropriate sub-features chosen?
    I have recently read the article
    "A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease", and I am interested in its subject.
    I am working on this subject and am having problems reproducing the sub-feature sets obtained with the RS (rough set) method listed in Table 3. I found many reducts. How can I get the attributes listed in Table 3?
    How do you select the most appropriate sub-features (20 reduced sub-feature sets)?
    Can you please help me?

    I would appreciate your immediate attention on this matter.
    Nahid Khorashadizade · University of Sistan and Baluchestan
    Dear Mr Mahdavi,
    thanks so much for your help.
    Good luck.
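    Finding "many reductions" is expected: a decision table can have several reducts, i.e. minimal attribute subsets B whose dependency degree γ_B(D) (the fraction of objects whose B-equivalence class has a single decision value) equals that of the full attribute set. The brute-force sketch below enumerates all reducts of a small table to show this; it is exponential in the number of attributes, and it does not reproduce how the paper picked its 20 subsets for Table 3, which the thread does not explain. Function names are hypothetical.

```python
from itertools import combinations
from collections import defaultdict

def dependency(rows, decisions, attrs):
    """Rough-set dependency degree gamma_B(D): fraction of objects whose
    equivalence class under attributes B contains a single decision value."""
    attrs = list(attrs)
    decision_sets = defaultdict(set)
    sizes = defaultdict(int)
    for row, d in zip(rows, decisions):
        key = tuple(row[a] for a in attrs)
        decision_sets[key].add(d)
        sizes[key] += 1
    positive = sum(n for key, n in sizes.items() if len(decision_sets[key]) == 1)
    return positive / len(rows)

def all_reducts(rows, decisions):
    """Brute force: minimal attribute subsets with the same dependency as all attributes."""
    n = len(rows[0])
    full = dependency(rows, decisions, range(n))
    reducts = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if any(set(r) <= set(subset) for r in reducts):
                continue  # supersets of a reduct are not minimal
            if dependency(rows, decisions, subset) == full:
                reducts.append(subset)
    return reducts
```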
  • Mitali Sonar asked a question:
    What utility measures are available in utility-based privacy-preserving methods?
  • Ajay Parikh added an answer:
    Where can I get e-governance data for research purposes?
    I want to apply data mining techniques to e-governance data. Where can I get sample data and related information?
    Ajay Parikh · Gujarat Vidyapith - Ahmedabad
    What kind of (government) data do you want? Election data, for example, is available on election sites like ....
  • Juggapong Natwichai added an answer:
    Does anybody have experience with data mining over big data?
    I would like to work on optimizing classification, clustering, and logistic regression algorithms for data mining over big data.
    Juggapong Natwichai · Chiang Mai University
    I believe that array processing is a crucial part of data mining algorithms. So, check SciQL at

    It can bridge the data storage gap: with big data, you probably can't keep everything in memory, but storing the data in files can be really painful in production.
  • Shankar K (d) added an answer:
    Can anyone give a list of free computer science journals (names/links) that are indexed by Scopus/IEEE/ACM/Elsevier?
    This may help us publish our work in high-impact-factor journals.
    Shankar K (d) · Alagappa University

    The journals in the annexure have high H-index and impact factor values.
    Some journals are free, and their review process takes less than 3 months. My recommendation is to look for journals in your area of specialization (e.g., if your research area is image processing, the International Journal of Computer Vision). I hope this is useful to you.
  • Shankar K (d) added an answer:
    Is there any free tool to perform preprocessing of a server log file?
    I am working on web log mining processes. I need a tool that performs pre-processing (data cleaning, user identification, session identification) of a server log file.
    Shankar K (d) · Alagappa University

    Read this article; it is very easy to understand and useful for what you need.
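    If no ready-made tool fits, the three steps in the question can be scripted. A minimal sketch, assuming Apache "combined" log format lines: cleaning keeps successful GET requests and drops static resources (the extension list is an assumption), users are identified by the (IP address, user agent) pair, and sessions are split when two requests of the same user are more than 30 minutes apart, a commonly used timeout heuristic. Function names are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Apache "combined" log format (assumed input format)
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
STATIC = (".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".ico")
TIMEOUT = timedelta(minutes=30)

def parse(lines):
    for line in lines:
        m = LINE.match(line)
        if m:
            yield m.groupdict()

def clean(records):
    """Data cleaning: keep successful GET requests for non-static resources."""
    for r in records:
        path = r["path"].split("?")[0].lower()
        if r["method"] == "GET" and r["status"].startswith("2") and not path.endswith(STATIC):
            yield r

def sessionize(records):
    """User identification by (IP, user agent); 30-minute session timeout."""
    by_user = defaultdict(list)
    for r in records:
        t = datetime.strptime(r["time"], "%d/%b/%Y:%H:%M:%S %z")
        by_user[(r["ip"], r["agent"])].append((t, r["path"]))
    sessions = []
    for user, hits in by_user.items():
        hits.sort()
        current = [hits[0]]
        for prev, cur in zip(hits, hits[1:]):
            if cur[0] - prev[0] > TIMEOUT:
                sessions.append((user, [p for _, p in current]))
                current = []
            current.append(cur)
        sessions.append((user, [p for _, p in current]))
    return sessions
```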
  • Mahmoud Omid added an answer:
    Are these feature selection methods the state of the art?
    There is no widely recommended technique for feature selection. Commonly used techniques for image processing purposes are PCA, correlation-based selection, genetic algorithms (GA), and sensitivity analysis, though this is debatable. Are these feature selection methods the state of the art? Are there other, more commonly used feature selection methods that have been left out?
    Mahmoud Omid · University of Tehran
    Dear @Muhammad
    Thank you for answering. The link you provided above does not work. Could you please send the title and authors? Thanks,
  • Ali Sajedi added an answer:
    Can anyone recommend a course specifically designed for going into the big data business?
    Many researchers come to big data from other areas but are not specifically "tagged" for big data jobs or research. Any approach to teaching big data science at a higher education institution would be appropriate.
    Ali Sajedi · University of Alberta
    Another Coursera MOOC that may be suitable for you:
    Big Data in Education
  • Ved Yadav asked a question:
    Can anyone suggest an AutoMap (CMU) "Scripting" and "Ontology Learning" tutorial?
    I am a beginner user of AutoMap/ORA, and I want to know how to use scripts on any text file. I'd be grateful if anyone could provide some tutorial links on how to use AutoMap scripts and Ontology Learning; I could not get enough from its User Guide.
  • Michael Wendl added an answer:
    Which approaches are suitable for preserving genomic privacy?
    Why are anonymization and de-identification models useless for genomic data privacy? Anonymization techniques are used to preserve individuals' privacy, but with genomic data it is difficult to preserve privacy using anonymization techniques such as suppression, generalization, etc.
    Michael Wendl · Washington University in St. Louis
    Agreed. I would point to a paper from some years ago by Russ Altman and colleagues showing that only a few dozen independent SNPs furnish enough information to identify an individual.
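    A back-of-envelope bound (not taken from the cited paper) shows why a few dozen SNPs suffice. Assuming a world population of roughly 7 × 10^9 (a round figure) and independent biallelic SNPs, each with three possible genotypes and hence at most log2 3 bits of information, singling out one person needs at least:

```latex
\log_2\!\left(7 \times 10^{9}\right) \approx 32.7 \ \text{bits},
\qquad
n \;\ge\; \frac{\log_2\!\left(7 \times 10^{9}\right)}{\log_2 3}
\;\approx\; \frac{32.7}{1.585} \;\approx\; 20.6
```

    So at least 21 SNPs are needed even in the ideal case; real SNPs with skewed allele frequencies carry fewer bits each, which pushes the number up into the "few dozen" range.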
  • Sudhakar Singh added an answer:
    Where can I find big relational or transactional data sets?
    I am searching for sources that freely provide big data sets, especially transactional data sets.
    Sudhakar Singh · Banaras Hindu University
    Thanks, Mr. RIOBT. Can you please add the URLs here?
  • Ioannis T. Christou added an answer:
    What is the suggested method to predict the effect of price change on sales when the price-change is happening for the first time?
    Price-elasticity models do not fit here (since they need previous/historic data with price changes to 'learn' the effect).

    We have sales-data for different items (with no price-changes till now). Business owners are thinking of changing the prices for the first time, and would like to estimate the customer reaction. (Customers are not tracked, anonymous and opportunistic).

    Any pointers on the right approach (e.g., an R package)?
    Ioannis T. Christou · Athens Information Technology
    Without any further information, there is no way to accurately forecast how the volume of sales will change when the price changes.
    So, there are several things you can do:
    1. If you have substitute products whose elasticity you know, you can extrapolate the sales volume of the product in question from the computed elasticity of the substitute product.
    2. If you don't have data for substitute products (or you don't think good substitutes exist in your market), you can perform a small market survey, i.e., ask a number of customers whether they would be willing to buy the product in question at a lower or higher price, and compute demand elasticity from this survey.
    3. Or, you may try limited-time incremental discounts, see how the market moves, and compute the elasticity of your product from this information.
    4. In general, if your sales seem to decline over time, it is a good indication that your price is too high for the market segment you have targeted, and you need to lower prices. On the other hand, discounts done carelessly will only "buy sales" and may well cause you to actually lose money, perhaps due to logistics and/or supply chain management issues that your sales campaign brings forth. Several studies have documented this effect.

    None of the above is tied to R, MATLAB, Java, or FORTRAN; these are simply methods you have to resort to in order to compute an approximation of the product's elasticity. Also remember that elasticity is not a static concept but varies with time and many other parameters. What seems like a bargain price to your customers today may seem far too high tomorrow, so this information needs to be constantly updated.
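    The arithmetic behind points 1 to 3 can be sketched as follows (the function names are hypothetical). The first function computes the midpoint (arc) price elasticity from two observations, such as sales before and during a limited-time discount or the two price points of a survey. The second projects quantity at a new price assuming a constant-elasticity demand curve q = A·p^ε; borrowing a substitute product's elasticity for ε, as in point 1, is itself an assumption that the products behave alike.

```python
def arc_elasticity(p1, q1, p2, q2):
    """Midpoint (arc) price elasticity of demand between (p1, q1) and (p2, q2)."""
    pct_q = (q2 - q1) / ((q1 + q2) / 2)
    pct_p = (p2 - p1) / ((p1 + p2) / 2)
    return pct_q / pct_p

def project_quantity(q_now, p_now, p_new, elasticity):
    """Quantity at p_new under constant-elasticity demand q = A * p**elasticity,
    anchored at the current observation (p_now, q_now)."""
    return q_now * (p_new / p_now) ** elasticity

# e.g. a discount from 10 to 8 raised weekly sales from 100 to 120 units
e = arc_elasticity(10, 100, 8, 120)
print(round(e, 3), round(project_quantity(100, 10, 9, e), 1))
```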
  • Lars Taxén added an answer:
    Do you think cognitive states and cognitive inputs can be retrieved from information systems?
    Information and knowledge discovery systems often work with predefined algorithms and constructs on the data sets. Can we extract the historicity and context of the cognition behind the data and its relationships?
    Lars Taxén · Linköping University
    To me, cognitive states are entirely confined to the inner workings of our brain. Thus, my answer is no: this is not possible.
  • Nazeeh A. Ghatasheh added an answer:
    What tests are required when preparing the data and testing the data using a clinical data set?
    Mostly we would do these tests at an initial stage, such as pre-processing or analysis of the data. Could you also suggest the tests required at various stages, with an example, or give web references for understanding the importance of those tests?
    Nazeeh A. Ghatasheh · University of Jordan
    It depends on the type of analysis you intend to perform, but mainly you need to handle missing data (there are many options for dealing with missing data).

    After that, if applicable, you may eliminate sources of noise, for example irrelevant records.

    Data normalization or quantification could be required.

    It might be useful to analyze the data before conducting the tests, for example with correlation analysis, variability analysis, etc.

    Also (depending on the test you intend to conduct and the data) dimensionality reduction may improve the performance.

    This was a quick overview of the pre-processing phase.

    Try searching for data pre-processing approaches or data-mining-related materials.
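    A minimal sketch of the pre-processing steps the answer lists, assuming a purely numeric table with missing values encoded as NaN. Each choice here is one option among many and is an assumption: records with no values at all are treated as irrelevant and dropped, missing values are filled with the column mean, columns are min-max normalized, and a correlation matrix is computed for the exploratory analysis. Clinical data often needs more careful handling (e.g. model-based imputation). Function names are hypothetical.

```python
import numpy as np

def impute_mean(X):
    """Replace NaNs with the mean of their column."""
    X = np.array(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def min_max_normalize(X):
    """Scale each column to [0, 1]; constant columns are left at 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (X - lo) / span

def preprocess(X):
    X = np.array(X, dtype=float)
    X = X[~np.isnan(X).all(axis=1)]     # drop records with no values at all
    X = impute_mean(X)                   # handle missing data
    X = min_max_normalize(X)             # normalization
    corr = np.corrcoef(X, rowvar=False)  # correlation analysis
    return X, corr
```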
  • Mahboobeh Parsapoor added an answer:
    What should be the minimum and maximum dataset size in data mining research? What determines the size of a dataset?
    Please advise for M.Phil./Ph.D.-level work. Also, please suggest some tutorials for understanding the complete use of a dataset in research.
  • Osman Ibrahim added an answer:
    What free document collections are alternatives to TREC?
    I know there are textual document collections for IR, such as the TREC datasets, but these are not free. I need a dataset (a textual document collection) that meets the requirements of information retrieval research and, at the same time, is large enough and includes most of the features of micro and macro variations.
    Osman Ibrahim · Minia University
    Thank you, but these are .csv files. I wonder if you know of a source of text document data (a document corpus).
  • Roger Werner added an answer:
    What is a spatial data cube and how to draw a spatial data cube?
    My mentor gave me a topic related to spatial data cubes, and I know little about them. Can anyone give a more specific explanation? What technologies does it need? Or can you recommend some papers for me?
    I'm sorry I didn't mention the data I use. It comes from land change and is all vector data. Thanks to the answers from Robbi and Ruxandra, I now know that it is related to spatio-temporal data mining.
  • Arash Arami added an answer:
    Any literature on semi-supervised feature selection?
    I have found very limited literature on semi-supervised feature selection. Can anyone share any? Thanks.
    Arash Arami · École Polytechnique Fédérale de Lausanne
    Nonlinear System Identification by Nelles is a nice book; however, it does not cover semi-supervised feature selection. If your problem is dynamic feature selection, you should look at the attention learning literature. I can even suggest one of my papers, where we provided a framework for semi-supervised learning of feature and/or goal selection:
    "Attention to multiple local critics in decision making and control"
  • Walter Kuhn added an answer:
    Which techniques are used for intention modelling?
    I want to know about the techniques used for detecting intention from a textual dataset.
    Walter Kuhn · Hochschule für Wirtschaft Zürich
    I am sure these sources can help you:

    Maybe also have a look at sentiment analysis.
  • Daniel Pop added an answer:
    How can I start implementing the lazy approach to associative classification?
    Daniel Pop · West University of Timisoara
    LAC is detailed in the paper below:

    One can implement the algorithm in any general-purpose programming language (C++, Java, Python, etc.) or use environments specifically created to support machine learning/statistics algorithms, such as R, Octave/Matlab, Mathematica, etc. With the latter approach, it is easier to implement the algorithm, since one benefits from the powerful functions and libraries already available.
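    As a starting point in a general-purpose language, the sketch below shows the core "lazy" idea in simplified form: no global rule set is mined up front; for each test instance, the training data is projected onto that instance's items, and only rules whose antecedents are built from those items are mined and used to vote. The voting scheme (summing rule confidences), the `min_support` and `max_len` parameters, and the majority-class fallback are assumptions of this sketch and may differ from the exact algorithm in the referenced paper. The function name is hypothetical.

```python
from itertools import combinations
from collections import Counter

def lazy_classify(train, test_items, min_support=2, max_len=2):
    """train: list of (set_of_items, label); test_items: set of items.
    Rules are mined on demand from the training data projected onto
    the test instance's items (simplified lazy associative classification)."""
    # 1. project each training transaction onto the test instance's items
    projected = [(items & test_items, label) for items, label in train]
    votes = Counter()
    # 2. candidate antecedents use only items present in the test instance
    for size in range(1, max_len + 1):
        for antecedent in combinations(sorted(test_items), size):
            ant = set(antecedent)
            covered = [label for items, label in projected if ant <= items]
            if len(covered) < min_support:
                continue
            # 3. each frequent rule antecedent -> label votes with its confidence
            for label, n in Counter(covered).items():
                if n >= min_support:
                    votes[label] += n / len(covered)
    if not votes:
        # fallback: majority class when no rule is frequent enough
        return Counter(label for _, label in train).most_common(1)[0][0]
    return votes.most_common(1)[0][0]
```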
  • Mahboobeh Parsapoor added an answer:
    Is anyone familiar with research issues in using neural networks to diagnose the stage of cervical cancer?
    Using a neural network (NN) to diagnose the stage of cervical cancer based on patient data (treatment).

About Data Mining and Knowledge Discovery

It is an ongoing research project.
