Conference Paper

Intelligent Systems in Modeling Phase of Information Mining Development Process

Abstract

Information Mining Engineering (IME) comprises the processes, methodologies, tasks and techniques used to organize, control and manage the task of finding knowledge patterns in information bases. A relevant task is selecting the data mining algorithms to use, which is left to the expertise of the information mining engineer and is therefore carried out in an unstructured way. In this paper we propose an Information Mining Project Development Process Model (D-MoProPEI) which provides an integrated view of the selection of Information Mining Processes Based on Intelligent Systems (IMPbIS) within the Modeling Phase of the proposed Process Model through a Systematic Deriving Methodology.

... For this reason, it is necessary to develop a solution that serves to discover correct and complete spatial co-location patterns around reference features. In this work, we develop a Knowledge Discovery Process [15,16] to give a solution to this problem using an Event-Centric Model to generate transaction-based data, and induction of decision trees to generate co-location rules. ...
... SMC seems to be a proper solution to the aforementioned problems, but a way of discovering co-location patterns around features relevant to the problem domain is still needed, because the classical association rule discovery algorithms used in transaction-based approaches cannot be used to select a target feature. For this reason, a knowledge discovery process for co-location pattern discovery focused on reference features is proposed. It uses an event-centric model for transaction-based data generation through spatial maximal cliques, together with a Process of Discovery of Behavior Rules using Decision Tree Learning algorithms [15,16]. Figure 1 shows the proposed process using BPMN [17]. ...
Chapter
The co-location discovery process serves to find subsets of spatial features frequently located together. Many algorithms and methods have been designed in recent years; however, finding this kind of pattern around specific spatial features is a task for which the existing solutions provide incorrect results. In this paper we propose a knowledge discovery process to find co-location patterns focused on reference features, using decision tree learning algorithms on transactional data generated using maximal cliques. A validation test of this process is provided.
... Also, it is possible to generate a heuristic approach for data mining algorithm selection using problem domain metadata. Lastly, the use of this process within an iterative methodology such as CRISP-DM [30] or MoProPEI [31] must be assessed. ...
Chapter
Detection of spatial outliers is a spatial data mining task aimed at discovering data observations that differ from other data observations within their spatial neighborhood. Some considerations that depend on the problem domain and data characteristics have to be taken into account when selecting the data mining algorithms to be used in each data mining project. The large number of possible algorithm combinations makes it necessary to design a knowledge discovery process for the detection of local spatial outliers so that this activity can be performed in a standardized way. This work provides a proposal for this knowledge discovery process, based on the Knowledge Discovery in Databases (KDD) process, and a proof of concept of this design using real-world data.
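As a rough illustration of the task described in this abstract (and not the knowledge discovery process proposed in the cited work), a local spatial outlier can be flagged by comparing each observation with the mean of its k nearest spatial neighbors and standardizing the resulting differences. The function name `local_spatial_outliers`, the parameters `k` and `threshold`, and the toy grid are all assumptions made for this sketch.

```python
import math
from statistics import mean, pstdev


def local_spatial_outliers(points, values, k=4, threshold=2.0):
    """Return indices whose value differs strongly from their k nearest neighbors.

    points: list of (x, y) coordinates; values: one attribute value per point.
    Hypothetical helper, for illustration only.
    """
    diffs = []
    for i, (xi, yi) in enumerate(points):
        # Sort the other observations by Euclidean distance to point i.
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.hypot(points[j][0] - xi, points[j][1] - yi),
        )[:k]
        # Difference between the observation and its neighborhood mean.
        diffs.append(values[i] - mean(values[j] for j in neighbors))

    mu, sigma = mean(diffs), pstdev(diffs)
    if sigma == 0:
        return []  # every observation matches its neighborhood
    # Flag observations whose standardized difference exceeds the threshold.
    return [i for i, d in enumerate(diffs) if abs((d - mu) / sigma) > threshold]


# Toy data (assumed): a 5x5 grid with value 10 everywhere except one cell.
grid = [(x, y) for x in range(5) for y in range(5)]
vals = [10.0] * 25
vals[12] = 50.0  # the center cell (2, 2)
outliers = local_spatial_outliers(grid, vals)
```

The neighborhood size and the threshold are exactly the kind of domain-dependent choices the abstract refers to; different values can change which observations are reported.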
... For this reason, a knowledge discovery process, defined as a group of logically related tasks that, from a set of information with a degree of value for the organization, obtains knowledge pieces that generalize the previous information [1,17,18], is designed to generate decision rules on clustering results regardless of the approach selected to generate spatial clusters. ...
Conference Paper
Spatial clustering is an important field of spatial data mining and knowledge discovery that serves to partition a spatial data set into disjoint subsets of spatial elements that are similar to each other. Existing algorithms can be used to perform three types of cluster analysis: clustering of spatial points, regionalization and point pattern analysis. However, none of these existing methods provides a description of the discovered spatial clusters, which is useful for decision making in many different fields. This work proposes a knowledge discovery process for the description of spatially referenced clusters that uses decision tree learning algorithms. Two proofs of concept of the proposed process, using different spatial clustering algorithms on real data, are also provided.
Conference Paper
Information Mining projects provide synthesis and analysis tools that allow the available data of an organization to be transformed into useful knowledge for the decision-making process. For this reason, the requirements of this type of project differ from those of traditional software development projects. Consequently, the processes associated with requirements engineering for traditional projects cannot be reused in Information Mining projects. Likewise, the methodologies available for Information Mining projects leave aside activities associated with managing the requirements of stakeholders and customers. In this context, a model that addresses the need to manage Information Mining project requirements is proposed.
Article
Existing process models for information mining projects (sometimes referred to as data mining) either lack the management perspective needed to carry out this type of project successfully or have a structure that does not fully fit the needs of such projects. This work proposes a process model for information mining projects that guides project development, considering both management and technical aspects, in order to generate pieces of knowledge that support decision making. In addition, the technical activities are restructured to favor the smooth progress of the project.
Conference Paper
There are information mining methodologies that emphasize the importance of planning requirements elicitation throughout the entire project in an orderly, documented, consistent and traceable manner. However, given the characteristics of this type of project, the approach proposed by classical requirements engineering is not applicable to the process of identifying the information mining problem, nor does it allow the information mining process that solves it to be inferred from the business domain model. This paper proposes an extension of semantic nets and frames to represent knowledge of the business domain, the business problem and the information mining problem, and introduces a methodology to derive the information mining process from the proposed knowledge representations.
Article
In this paper we try to induce rules that describe patterns in human faces. We apply two different data-mining algorithms, C4.5 and C5.0, to a database of face parameters in the MPEG4 FDP (Face Definition Parameters) form. We also modify the database in two different ways before applying the algorithms: variable discretization of some fields, and selection of the main clusters with Self-Organizing Maps.
Conference Paper
Business Intelligence offers an interdisciplinary approach (which includes Information Systems) that, taking all available information resources and using analytical and synthesis tools able to transform information into knowledge, focuses on generating knowledge that contributes to management decision-making and to the generation of strategic plans in organizations. Information Mining is the sub-discipline of information systems that supports business intelligence with tools to transform information into knowledge. It has been defined as the search for interesting patterns and important regularities in large bodies of information. We address the need to identify information mining processes to obtain knowledge from available information. Once information mining processes are defined, we may decide which data mining algorithms will support them. In this context, this paper proposes a characterization of the information mining processes related to the following business intelligence problems: discovery of rules of behavior, discovery of groups, discovery of significant attributes, discovery of rules of group membership, and weighting of rules of behavior or rules of group membership.
Chapter
Information Mining projects have different characteristics from traditional software development projects. The classic software development phases do not match the natural phases of this type of project. As a result, Software Engineering tools such as requirement elicitation techniques, software development process models, estimation methods and activity maps do not apply to this type of project. A new body of knowledge needs to be developed for Information Mining Engineering, with a special focus on its use in industry. In this paper we propose a process model, a requirement elicitation process, an estimation method and a set of processes for information mining based on the application of different data mining techniques.
Article
We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence, when combined with previously made assumptions, implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen (a prior network) and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k=1 parent. For the general case (k>1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
Chapter
Business intelligence (BI) is a data-driven DSS that combines data gathering, data storage, and knowledge management with analysis to provide input to the decision process. The term originated in 1989; prior to that many of its characteristics were part of executive information systems. Business intelligence emphasizes analysis of large volumes of data about the firm and its operations. It includes competitive intelligence (monitoring competitors) as a subset. In computer-based environments, business intelligence uses a large database, typically stored in a data warehouse or data mart, as its source of information and as the basis for sophisticated analysis. Analyses range from simple reporting to slice-and-dice, drill down, answering ad hoc queries, real-time analysis, and forecasting. A large number of vendors provide analysis tools. Perhaps the most useful of these is the dashboard. Recent developments in BI include business performance measurement (BPM), business activity monitoring (BAM), and the expansion of BI from being a staff tool to being used by people throughout the organization (BI for the masses). In the long term, BI techniques and findings will be embedded into business processes.
Conference Paper
We describe algorithms for learning Bayesian networks from a combination of user knowledge and statistical data. The algorithms have two components: a scoring metric and a search procedure. The scoring metric takes a network structure, statistical data, and a user's prior knowledge, and returns a score proportional to the posterior probability of the network structure given the data. The search procedure generates networks for evaluation by the scoring metric. Our contributions are threefold. First, we identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simplify the encoding of a user's prior knowledge. In particular, a user can express her knowledge, for the most part, as a single prior Bayesian network for the domain. Second, we describe local search and annealing algorithms to be used in conjunction with scoring metrics. In the special case where each node has at most one parent, we show that heuristic search can be replaced with a polynomial algorithm to identify the networks with the highest score. Third, we describe a methodology for evaluating Bayesian-network learning algorithms. We apply this approach to a comparison of metrics and search procedures.
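The split between a scoring metric and a search procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the cited paper: the score is a BIC-style penalized log-likelihood swapped in for the Bayesian metric (it ignores prior knowledge entirely), the search is a greedy edge-addition loop rather than the local search or annealing the abstract mentions, all variables are assumed binary, and the names `family_score`, `network_score`, `creates_cycle` and `greedy_search` are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def family_score(data, child, parents):
    """BIC-style score of one node given its parents (binary variables assumed)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marginal[pa]) for (pa, _), c in joint.items())
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * math.log(n) * n_params


def network_score(data, parents):
    """Scoring metric: the sum of family scores (the score decomposes by node)."""
    return sum(family_score(data, v, sorted(ps)) for v, ps in parents.items())


def creates_cycle(parents, src, dst):
    """True if adding the edge src -> dst would close a directed cycle."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:  # dst is already an ancestor of src
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def greedy_search(data, n_vars):
    """Search procedure: keep adding the single edge that most improves the score."""
    parents = {v: set() for v in range(n_vars)}
    best = network_score(data, parents)
    while True:
        candidate = None
        for src, dst in permutations(range(n_vars), 2):
            if src in parents[dst] or creates_cycle(parents, src, dst):
                continue
            parents[dst].add(src)
            score = network_score(data, parents)
            parents[dst].remove(src)
            if score > best + 1e-9:  # require a strict improvement
                best, candidate = score, (src, dst)
        if candidate is None:
            return parents, best
        parents[candidate[1]].add(candidate[0])


# Toy data (assumed): variable 1 copies variable 0, variable 2 is independent.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 10
structure, score = greedy_search(rows, 3)
```

On this toy data the search links variables 0 and 1 and leaves variable 2 unconnected. Both directions of that edge receive the same score, so the direction that ends up in `structure` depends only on the order in which candidate edges are tried.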
Conference Paper
Both the number and complexity of Data Mining projects have increased in recent years. Unfortunately, there is currently no formal process model for this kind of project, or the existing approaches are not correct or complete enough. In some sense, the present situation is comparable to the one in software that led to the 'software crisis' in the late 1960s. Software Engineering matured on the basis of process models and methodologies. Data Mining's evolution is running parallel to that of Software Engineering. The research work described in this paper proposes a Process Model for Data Mining Projects based on the study of current Software Engineering Process Models (IEEE Std 1074 and ISO 12207) and the most widely used Data Mining methodology, CRISP-DM (considered a de facto standard), as basic references.
Conference Paper
The preliminary results presented in this paper correspond to a research project that uses intelligent systems tools to search for the relationship between students' preferred learning styles and the pedagogical protocols used by human tutors (professors in the first courses of the Computer Engineering undergraduate program).
Conference Paper
The purpose of the present article is to investigate whether there exists a set of temporally stable patterns in time series of meteorological variables, studying series of air temperature, wind speed and direction, and atmospheric pressure in a period with meteorological conditions involving nocturnal inversion of air temperature in Allen, Rio Negro, Argentina. Our conjecture is that there exist independent stable temporal activities, the mixture of which gives rise to the weather variables, and that these stable activities can be extracted by Self-Organizing Maps plus Top-Down Induction of Decision Trees analysis of the data arising from the weather patterns, viewing them as temporal signals.
Book
This is a book, not a book review.
Article
An overview of cluster analysis techniques from a data mining point of view is given. This is done by strictly separating the questions of similarity and distance measures and related optimization criteria for clusterings from the methods used to create and modify the clusterings themselves. In addition to this general setting and overview, the second focus is a discussion of the essential ingredients of the demographic cluster algorithm of IBM's Intelligent Miner, which is based on Condorcet's criterion.
Conference Paper
This paper addresses the problem of detecting unusual changes of consumption in mobile phone users, the corresponding construction of data structures that represent users' recent and historic behavior based on the information included in a call, and the complexity of constructing a function with so many variables whose parameterization is not always known.
Conference Paper
Obtaining a Bayesian network from data is a learning process divided into two steps: structural learning and parametric learning. In this paper, we define an automatic learning method that optimizes Bayesian networks applied to classification, using a hybrid learning method that combines the advantages of decision tree induction techniques with those of Bayesian networks.
Article
From the Publisher: Master the new computational tools to get the most out of your information system. This practical guide, the first to clearly outline the situation for the benefit of engineers and scientists, provides a straightforward introduction to basic machine learning and data mining methods, covering the analysis of numerical, text, and sound data.
Article
This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism. This new system has been applied successfully to several tasks taken from the machine learning literature.
Conference Paper
We present research work in progress that focuses on data mining tools used to help teachers apply a three-step knowledge discovery process to diagnose students' misunderstandings (and their causes) related to their programming errors.
Article
The presented theory views inductive learning as a heuristic search through a space of symbolic descriptions, generated by an application of various inference rules to the initial observational statements. The inference rules include generalization rules, which perform generalizing transformations on descriptions, and conventional truth-preserving deductive rules. The application of the inference rules to descriptions is constrained by problem background knowledge, and guided by criteria evaluating the “quality” of generated inductive assertions. Based on this theory, a general methodology for learning structural descriptions from examples, called Star, is described and illustrated by a problem from the area of conceptual data analysis.
Article
Business Process Management Systems (BPMSs) are software platforms that support the definition, execution, and tracking of business processes. BPMSs are able to log information about the business processes they support. Proper analysis of BPMS execution logs can yield important knowledge and help organizations improve the quality of their business processes and of their services to business partners. This paper presents a set of integrated tools that supports business and IT users in managing process execution quality by providing several features, such as analysis, prediction, monitoring, control, and optimization. We refer to this set of tools as the Business Process Intelligence (BPI) tool suite. Experimental results presented in this paper are very encouraging. We plan to investigate further enhancements to the BPI tool suite, including automated exception prevention and refinement of the process data preparation stage, as well as the integration of other data mining techniques.
Article
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field. Copyright © 1996, American Association for Artificial Intelligence. All rights reserved.
Article
The purpose of this article is to present an experimental application for the detection of possible breast lesions by means of neural networks in medical digital imaging. This application broadens the scope of research into the creation of different types of topologies with the aim of improving existing networks and creating new architectures which allow for improved detection.
Martins, S., Pesado, P., García-Martínez, R. (2014). Process Mining Proposal for Information Mining Engineering: MoProPEI (in spanish). Latin-American Journal of Software Engineering, 2(5): 313-332. http://dx.doi.org/10.18294/relais.2014.313-332. ISSN: 2314-2642
García-Martínez, R., Britos, P., Rodríguez, D.: Information Mining Processes Based on Intelligent Systems. In: Ali, M., Bosse, T., Hindriks, K.V., Hoogendoorn, M., Jonker, C.M., Treur, Jan (eds.) IEA/AIE 2013. LNCS, vol. 7906, pp. 402-410. Springer, Heidelberg (2013)
Michalski, R., Bratko, I., Kubat, M.: Machine Learning and Data Mining, Methods and Applications. Wiley, New York (1998)
Chapman, P., Clinton, J., Keber, R., Khabaza, T., Reinartz, T., Shearer, C., Wirth, R.: CRISP-DM 1.0 Step by step BI guide. Edited by SPSS (2000)
Marbán, Ó., Mariscal, G., Menasalvas, E., Segovia, J.: An engineering approach to data mining projects. In: Yin, H., Tino, P., Corchado, E., Byrne, W., Yao, X. (eds.) IDEAL 2007. LNCS, vol. 4881, pp. 578-588. Springer, Heidelberg (2007)
Britos, P., Cataldi, Z., Sierra, E., García-Martínez, R.: Pedagogical protocols selection automatic assistance. In: Nguyen, N.T., Borzemski, L., Grzech, A., Ali, M. (eds.) IEA/AIE 2008. LNCS (LNAI), vol. 5027, pp. 331-336. Springer, Heidelberg (2008)
Cogliati, M., Britos, P., García-Martínez, R.: Patterns in temporal series of meteorological variables using SOM & TDIDT. In: Bramer, M. (ed.) AITP. IFIP, vol. 217, pp. 305-314. Springer, Boston (2006)