Data Mining and Knowledge Discovery


  • Jonas Mellin added an answer:
    Do you know of any articles about grouping events into sequences?
    I have a collection of events that occur in a tested system; I keep them in an Oracle database. I am looking for algorithms that will help me group those events into sequences and then use those sequences to look for possible anomalies.
    Jonas Mellin · University of Skövde
    @Juggapong Natwichai, can the method you mention handle uncertainty?
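    A common starting point for this task is time-gap sessionization: sort the events by timestamp and start a new sequence whenever the gap between consecutive events exceeds a threshold. A minimal Python sketch (the 5-minute gap and the event names are illustrative assumptions, not from the question):

```python
from datetime import datetime, timedelta

def group_into_sequences(events, max_gap=timedelta(minutes=5)):
    """Group time-ordered (timestamp, event_type) pairs into sequences:
    a new sequence starts whenever the gap between consecutive events
    exceeds max_gap."""
    sequences = []
    current = []
    last_ts = None
    for ts, etype in sorted(events):
        if last_ts is not None and ts - last_ts > max_gap:
            sequences.append(current)
            current = []
        current.append(etype)
        last_ts = ts
    if current:
        sequences.append(current)
    return sequences

events = [
    (datetime(2014, 1, 1, 10, 0), "login"),
    (datetime(2014, 1, 1, 10, 1), "query"),
    (datetime(2014, 1, 1, 10, 2), "logout"),
    (datetime(2014, 1, 1, 12, 0), "login"),   # > 5 min gap: new sequence
    (datetime(2014, 1, 1, 12, 1), "error"),
]
print(group_into_sequences(events))
# [['login', 'query', 'logout'], ['login', 'error']]
```

    The resulting sequences can then be compared against frequent patterns (e.g., with a sequential pattern miner) to flag rare ones as candidate anomalies.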
  • John K. Marco Pima added an answer:
    Are PhD/Masters Theses and Dissertations or Journals/Conference proceedings the best sources for a Literature Review? Why?
    When reviewing the literature for trends in technological advancement and future directions, we may use PhD and Master's theses and dissertations, or journal and conference papers. But which will be more useful, or give the most rigorous review?
    John K. Marco Pima · Institute of Accountancy Arusha
    Aatif Saif,
    It is true that dissertations offer a direction, and if you follow up on their references, you get the most out of them. I have tried this, and it has worked for me as a supplement to other sources.
  • Michael Niemann added an answer:
    How can I extract the adjective?
    The large photo album has extra charges on delivery.
    The adjective "large" may indicate an attribute, size, of the photo album.
    Red car
    The adjective "red" may indicate an attribute, colour, of the car.

    How can I write a Java program to implement the above task? (How do we know that the underlying attribute for "large" and "big" is size, and how do we know the attribute for a colour adjective?)
    Michael Niemann · Monash University (Australia)
    This is some research you will have to do for yourself. Plenty of work has already been done with WordNet, so there should be plenty of articles and books out there that will explain things for you.
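    For illustration only: WordNet's "attribute" relation links many descriptive adjectives to the noun attribute they express (large → size), which is exactly the lookup the question asks about. Below is a Python sketch with a hand-built map standing in for the WordNet query; in Java you would typically combine a POS tagger with a WordNet library such as JWI or JWNL. The mapping and function name are hypothetical:

```python
# Hand-built stand-in for a lexical resource such as WordNet's
# "attribute" relation; the mapping below is illustrative only.
ADJECTIVE_ATTRIBUTE = {
    "large": "size",
    "big": "size",
    "small": "size",
    "red": "colour",
    "blue": "colour",
    "heavy": "weight",
}

def attribute_of(adjective):
    """Return the attribute noun an adjective expresses, if known."""
    return ADJECTIVE_ATTRIBUTE.get(adjective.lower())

print(attribute_of("large"))  # size
print(attribute_of("Red"))    # colour
```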
  • How do you determine which objective function to use in the harmony search algorithm for feature selection?
    I am studying the harmony search for feature selection. I read some papers and have some doubts regarding the harmony search.
    How do you determine which objective function to use in the harmony search algorithm for feature selection?
    Félix F. González-Navarro · Autonomous University of Baja California
    I would say cross-validation... Also, there is one thing that you should keep in mind: harmony search is a stochastic algorithm. What does this mean? From one run to another, the solution may be different, so picking the single best solution could give a biased result. How do you solve this? With several runs of your harmony search experiment. You could use bootstrap sampling and apply harmony search to each bootstrap sample; at the end, some variables or features will consistently appear in the final subsets. Those variables or features are the interesting ones. Hope this helps you.
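    The run-several-times-and-keep-consistent-features idea can be sketched as follows. A simple random subset search stands in for harmony search, a toy objective stands in for cross-validated accuracy, and the bootstrap resampling of the data is reduced to re-seeding each replicate; all names and numbers are illustrative assumptions:

```python
import random
from collections import Counter

# Pretend ground-truth informative features (illustrative assumption).
INFORMATIVE = {0, 3, 7}

def toy_objective(subset):
    """Toy score standing in for cross-validated accuracy: reward
    covering the informative features, penalize subset size."""
    return len(set(subset) & INFORMATIVE) - 0.1 * len(subset)

def random_search(n_features, rng, n_iter=200):
    """Stochastic subset search standing in for harmony search:
    different runs can return different subsets."""
    best, best_score = [], float("-inf")
    for _ in range(n_iter):
        subset = [i for i in range(n_features) if rng.random() < 0.3]
        score = toy_objective(subset)
        if score > best_score:
            best, best_score = subset, score
    return best

def stable_features(n_features=10, n_runs=30, threshold=0.8, seed=42):
    """Repeat the stochastic search (one run per bootstrap replicate;
    the resampling itself is elided here) and keep the features that
    appear in at least `threshold` of the selected subsets."""
    counts = Counter()
    for run in range(n_runs):
        for f in random_search(n_features, random.Random(seed + run)):
            counts[f] += 1
    return sorted(f for f, c in counts.items() if c / n_runs >= threshold)

print(stable_features())  # the consistently selected features
```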
  • Uwizeye Clarisse added an answer:
    How to predict an evolution of a network?
    I'm working on a data mining topic. I have an attributed graph of the inflation network of developing countries. My vertex attributes include population, exchange rate to the US dollar, price level of consumption (PC), ratio of GNP to GDP (%), etc. Can the covariation among vertex descriptors tell me whether the vertex attributes are structurally correlated or not? Is there any possibility of predicting the future evolution of my graph after a given period? If yes, what would my vertices stand for?
    Uwizeye Clarisse · University Joseph Fourier - Grenoble 1
    OK, if I understood you well, I think I will get a kind of sequence of correlation patterns.
    And every annual inflation value is a function of its own history, so if I want to build a model that predicts what comes at t+1, I am unsure what would stand for time if I proceed that way.
  • Vangipuram Radhakrishna added an answer:
    Can anyone suggest a good way to reduce the dimensionality of my data?
    I have a data set of 500 records and 90 attributes. What are good ways to reduce the dimension without losing much information?
    I don't want to use something like PCA, because the interpretation of a model built on principal components would be hard. Do you suggest feature selection?
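    One interpretable alternative to PCA is to keep the original attributes and simply drop one of each pair of highly correlated columns. A minimal sketch (the 0.95 threshold, the column names, and the toy data are illustrative assumptions):

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drop_correlated(columns, threshold=0.95):
    """Keep original attributes (interpretable, unlike PCA components),
    dropping each column highly correlated with one already kept."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

data = {
    "income":   [10, 20, 30, 40],
    "income_k": [10.1, 20.2, 29.9, 40.3],  # near-duplicate of income
    "age":      [25, 40, 31, 58],
}
print(drop_correlated(data))  # ['income', 'age']
```

    Other interpretable options include wrapper-style feature selection (forward/backward search scored by cross-validation), which keeps named attributes rather than linear combinations.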
  • Viorel Chendeş added an answer:
    Can anyone help to determine the time lag between flood gauges from upstream to downstream?
    Can anyone suggest how we should calculate time lags of historical flow data between upstream and downstream gauges for urban flood prediction (considering that, across flood events, the time lag between upstream and downstream flow stations can change depending on precipitation characteristics and the hydrologic condition of the catchments)?
    Viorel Chendeş · National Institute of Hydrology and Water Management
    For the time lag, I usually use the relationship from the SCS (Soil Conservation Service) model, with the following variables: the length of the watercourse, the average slope of the basin, and the CN (Curve Number) parameter. It can be computed in GRID format.
    If you want to calculate Tlag for every point along a river, the only problem seems to be with the length of the river.
  • Madhavi Vaidya added an answer:
    Can I make use of R language for large data analysis?
    If yes, how is it different from MapReduce? What else can be used?
    Madhavi Vaidya · Vivekanand Education Society College of Arts, Science and Commerce
    Dear Alan, I will surely check the link you gave. Thank you so much.
  • Nahid Khorashadizade added an answer:
    How are the most appropriate sub-features chosen?
    I have recently read the article "A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease" and I am interested in its subject.
    I am working on this subject and am having problems implementing the sub-feature sets obtained from the RS method, corresponding to Table 3. I found many reducts. How can I get the attributes listed in Table 3?
    How do you select the most appropriate sub-features (the 20 reduced sub-feature sets)?
    Can you please help me?

    I would appreciate your immediate attention on this matter.
    Nahid Khorashadizade · University of Sistan and Baluchestan
    Dear Mr Mahdavi,
    Thanks so much for your help.
    Good luck.
  • Mitali Sonar asked a question:
    What are utility measurements available in utility based privacy preserving methods?
  • Is the Technology Acceptance Model a thing of the Past?
    The Technology Acceptance Model (TAM) is an information systems theory that models how users come to accept and use a technology.
    Ra'ed (Moh'd Taisir) Masa'deh · University of Jordan
    Please see our article as we considered a modification of TAM:
    A Structural Equation Modeling Approach for Determining Antecedents and Outcomes of Students’ Attitude toward Mobile Commerce Adoption

    Ra'ed (Moh’d Taisir) Masa'deh, Rifat O. Shannak, Mahmoud Mohammad Maqableh

    This might help
  • Suganthi Dinesh asked a question:
    Are there any websites for datasets in the stock market field?
    I need a large amount of data from the stock market field.
  • Karimulla Naik added an answer:
    Can anyone suggest reading on research in knowledge learning system?
    I wanted to know what researchers in theoretical computer science, machine learning, math and AI think of a software system which learns by itself.

    Can such a system be called a self-aware system, meaning a system which is aware of itself, whose 'self' is knowledge?

    The application is something like Google's Knowledge Graph. It could be better than that. People could query the system to tell them something about itself, which would be knowledge of our world.

    If anybody has pointers on this topic, please guide me.
    Tomas, Thanks a ton. That article helped me a lot. :)

    Thanks everyone. I got an initial idea of how to proceed further.
  • Ajay Parikh added an answer:
    Where can I get e-governance data for research purpose?
    I want to apply data mining techniques on e-governance data. Where can I get the sample data and related information from?
    Ajay Parikh · Gujarat Vidyapith - Ahmedabad
    What kind of (government) data do you want? Election data, for example, is available on election sites like ....
  • Purusothaman Gnanapandithan asked a question:
    What is the difference between "cited" and "referred to" in research? Is it necessary that our base papers be cited or referred to by other researchers?
    We choose base papers from IEEE/ACM/Elsevier/other publishers, but some papers are not cited or referred to by researchers.
  • Juggapong Natwichai added an answer:
    Does anybody have experience with data mining over big data?
    I would like to work on optimization of classification, clustering, and logistic regression algorithms for data mining over big data.
    Juggapong Natwichai · Chiang Mai University
    I believe that array processing is a crucial part of data mining algorithms. So, check SciQL at

    It can bridge the data storage gap. With big data, you probably can't store everything in memory, but storing the data in files can be really painful in production.
  • Shankar K (d) added an answer:
    Can any one give the list of computer science free journal names/links which are indexed by Scopus/IEEE/ACM/Elsevier?
    This may help us to publish our work in high-impact-factor journals.
    Shankar K (d) · Alagappa University

    Annexure journals have high H-index and impact factor values across multiple journals.
    Some journals are free, and the review process takes less than 3 months. My recommendation is to find journals in your area of specialization (e.g., if your research area is image processing, the International Journal of Computer Vision). I hope this is very useful to you....
  • Shankar K (d) added an answer:
    Is there any free tool to perform preprocessing of a server log file?
    I am working on web log mining processes. I need a tool which performs pre-processing (data cleaning, user identification, session identification) of a server log file.
    Shankar K (d) · Alagappa University

    Read this article... it is very easy to understand and useful for what you need...
  • Dharmesh J Bhalodiya added an answer:
    Where to get Frequent itemset mining datasets?
    Is there any website to download datasets for frequent itemset mining?
    Dharmesh J Bhalodiya · Silver Oak College of Engineering & Technology
    KDD CUP datasets can be found here
  • Mahmoud Omid added an answer:
    Are these feature selection methods the state of the art?
    There is no widely recommended technique for feature selection. Commonly used techniques for image processing purposes include PCA, correlation-based methods, GAs, and sensitivity analysis. This is debatable: are these feature selection methods the state of the art? Are there other commonly used feature selection methods that have been left out?
    Mahmoud Omid · University of Tehran
    Dear @Muhammad,
    Thank you for answering. The link you provided does not work. Could you please send the title and author information? Thanks,
  • Miriam Laker-Oketta added an answer:
    Is there a way to calculate correlation between categorical and continuous variables?
    I am trying to calculate correlation between a variable, X, and a variable, Y, where X is numerical and Y is categorical.
    Miriam Laker-Oketta · University of California, San Francisco
    Have you read about the point-biserial correlation? It can help check for correlation between a binary and a continuous variable.
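    The point-biserial coefficient is Pearson's r with the binary variable coded 0/1, which makes it easy to compute directly. A minimal sketch on made-up data (statistics packages such as SciPy also provide it ready-made):

```python
def point_biserial(binary, continuous):
    """Point-biserial correlation: equivalent to Pearson's r with the
    categorical variable coded 0/1. Computed here via the closed form
    r = (m1 - m0) / s * sqrt(p * q), with s the population std of the
    continuous variable and p, q the group proportions."""
    n = len(binary)
    g1 = [y for b, y in zip(binary, continuous) if b == 1]
    g0 = [y for b, y in zip(binary, continuous) if b == 0]
    m1, m0 = sum(g1) / len(g1), sum(g0) / len(g0)
    mean = sum(continuous) / n
    s = (sum((y - mean) ** 2 for y in continuous) / n) ** 0.5
    p, q = len(g1) / n, len(g0) / n
    return (m1 - m0) / s * (p * q) ** 0.5

x = [0, 0, 0, 1, 1, 1]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(round(point_biserial(x, y), 3))  # 0.878
```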
  • Ali Sajedi added an answer:
    Can anyone recommend a course specifically designed for going into the big data business?
    Many researchers come to big data from other areas but are not specifically "tagged" for big data jobs or research. Any angle of treating big data science in a higher-education institute would be appropriate.
    Ali Sajedi · University of Alberta
    Another Coursera MOOC that may be suitable for you:
    Big Data in Education
  • Ved Yadav asked a question:
    Can anyone suggest an AutoMap (CMU) "Scripting" and "Ontology Learning" tutorial?
    I am a beginner user of AutoMap/ORA. I want to know how to use scripts on any text file. I'd be grateful if anyone could provide tutorial links on how to use AutoMap scripts and Ontology Learning; I am unable to get enough from its User Guide.
  • Michael Wendl added an answer:
    Which approaches are suitable for preserving the genomic privacy?
    Why are anonymization and de-identification models useless for genomic data privacy? Anonymization techniques are used to preserve individuals' privacy, but for genomic data it is difficult to preserve privacy with anonymization techniques like suppression, generalization, etc.
    Michael Wendl · Washington University in St. Louis
    Agreed. I would point to a paper from some years ago by Russ Altman and colleagues that shows only around a few dozen independent SNPs furnish enough information to identify an individual.
  • Sudhakar Singh added an answer:
    Where to find Big Relational or Transactional Data Sets?
    I am searching for sources which freely provide big data sets, especially transactional data sets.
    Sudhakar Singh · Banaras Hindu University
    Thanks Mr. RIOBT; can you please add the URLs here?
  • Ioannis T. Christou added an answer:
    What is the suggested method to predict the effect of price change on sales when the price-change is happening for the first time?
    Price-elasticity models do not fit here (since they need previous/historical data with price changes to 'learn' the effect).

    We have sales-data for different items (with no price-changes till now). Business owners are thinking of changing the prices for the first time, and would like to estimate the customer reaction. (Customers are not tracked, anonymous and opportunistic).

    Any pointers on the right approach (R package) ?
    Ioannis T. Christou · Athens Information Technology
    Without any further information, there is no way to accurately forecast how the volume of sales will change when the price changes.
    So, there are several things you can do:
    1. If you have substitute products for which you know the elasticity, you can extrapolate the sales volume of the product in question based on the computed elasticity of the substitute product.
    2. If you don't have data for substitute products (or you don't think good substitutes exist in your market), then you can perform a small market-research study, i.e., ask a number of customers whether they would be willing to buy the product in question at a lower or higher price, and compute demand elasticity from this market research.
    3. Or, you may try limited-time only incremental discounts, and see how the market moves, and based on this information, compute the elasticity of your product.
    4. In general, if your sales seem to be declining as time goes by, it's a good indication that your price is too high for the market segment you have targeted, and you need to lower prices. On the other hand, discounts, if done carelessly, will only "buy sales" and may very well cause you to actually lose money, perhaps due to logistics and/or supply chain management issues that your sales campaign will bring forth, and so on... There are several studies that have documented this effect.

    None of the above is tied to R, MATLAB, Java, or FORTRAN; these are simply methods you can use to compute an approximation of the product's elasticity. Also, remember that elasticity is not a static concept; it varies with time and many other parameters. What may seem a good bargain price to your customers today may seem far too much tomorrow, so this information needs to be constantly updated.
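    Approach 1 above (borrowing a substitute product's elasticity) can be sketched with a constant-elasticity demand model; every number below is hypothetical:

```python
def forecast_volume(current_volume, current_price, new_price, elasticity):
    """Constant-elasticity demand: Q_new = Q * (P_new / P_old) ** elasticity.
    `elasticity` is typically negative (demand falls as price rises)."""
    return current_volume * (new_price / current_price) ** elasticity

# Hypothetical: 1000 units/month at $10; a substitute product's
# estimated elasticity is -1.5; the owner considers raising to $11.
print(round(forecast_volume(1000, 10.0, 11.0, -1.5)))  # 867
```

    The same one-liner works in R; the modelling judgement (which elasticity to borrow, and for how long it stays valid) is the hard part, as noted above.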
  • Lars Taxén added an answer:
    Do you think cognitive states and cognitive inputs can be retrieved from information systems?
    Information and knowledge discovery systems often work with predefined algorithms and constructs on the data sets. Can we extract the historicity and context of cognition behind the data and the relationships?
    Lars Taxén · Linköping University
    To me, cognitive states are entirely confined to the inner workings of our brains. Thus, my answer is no: this is not possible to do.
  • Nazeeh A. Ghatasheh added an answer:
    What tests are required when preparing the data and testing the data using a clinical data set?
    Mostly we would do these tests at an initial stage, such as pre-processing or analysis of the data. Can you please suggest the tests required at various stages, with an example, or give web references for understanding the importance of those tests?
    Nazeeh A. Ghatasheh · University of Jordan
    It depends on the type of analysis you intend to perform. But mainly you need to handle missing data (there are many options for dealing with missing data).

    Afterwards (if applicable), you may eliminate noise sources, for example irrelevant records.

    Data normalization or quantification could be required.

    It might be useful to analyze the data before conducting the tests; some options are correlation analysis, variability analysis, etc.

    Also (depending on the test you intend to conduct and the data) dimensionality reduction may improve the performance.

    This was a quick overview of the pre-processing phase.

    Try to search for data pre-processing approaches or data mining related materials.
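    The first steps above (handle missing data, then normalize) can be sketched on a toy table; the records and column meanings below are made up for illustration:

```python
def preprocess(records):
    """Minimal pre-processing sketch: impute missing values (None) with
    the column mean, then min-max normalize each column to [0, 1]."""
    n_cols = len(records[0])
    cols = [[r[j] for r in records] for j in range(n_cols)]
    cleaned = []
    for col in cols:
        present = [v for v in col if v is not None]
        mean = sum(present) / len(present)
        col = [mean if v is None else v for v in col]   # imputation
        lo, hi = min(col), max(col)
        col = [(v - lo) / (hi - lo) for v in col]       # normalization
        cleaned.append(col)
    return [list(row) for row in zip(*cleaned)]

# Hypothetical clinical records: [age, systolic BP]; None = missing
data = [[30, 120], [50, None], [40, 160]]
print(preprocess(data))  # [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
```

    Real clinical pipelines would add outlier/noise filtering and a dimensionality reduction step after this, as described above.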
  • Mahboobeh Parsapoor added an answer:
    What should be the minimum and maximum dataset size in data mining research? What determines the size of a dataset?
    Please suggest this for M.Phil./Ph.D.-level work. Also suggest some tutorials for understanding the complete use of datasets in research.

About Data Mining and Knowledge Discovery

This is an ongoing research topic.
