A THREE-TIERED WEB BASED EXPLORATION AND
REPORTING TOOL FOR DATA MINING
Ahmet Selman BOZKIR
Hacettepe University Computer Engineering Department, Ankara, Turkey
selman@cs.hacettepe.edu.tr
Ebru Akcapinar SEZER
Hacettepe University Computer Engineering Department, Ankara, Turkey
ebru@hacettepe.edu.tr
ABSTRACT
In recent years, many companies have begun to use data mining and decision support systems (DSS) in their
decision-making activities. Although their use is increasing continuously, DSSs are generally built as “desktop
applications” and designed for data mining experts. The purpose of the present study is the design and implementation
of a web-based data mining exploration and reporting tool named ASMiner. ASMiner supports exploration of and
reporting on three data mining techniques (decision trees, clustering and association rules mining) through a scalable,
fully web-based, thin-client data mining tool for both decision makers and knowledge workers.
KEYWORDS
DSS, Decision-making, Web-based data mining
1. INTRODUCTION
Data mining is a useful tool for decision makers in predicting and planning for the future. In the near future,
data mining methods may become crucially important among the existing approaches to the forecasting
problems encountered in engineering, medicine and the applied sciences.
Web-based technologies have revolutionized the design, development and implementation stages of
decision support systems (Ba & Kalakota, 1995; Bhargava & Power, 2001). Moreover, the Web
environment is expanding as a very important DSS development and delivery platform (Shim et al., 2002).
The key advantages of web-based tools over traditional batch-based or client-server oriented tools
include ease of use, universal access across information technology platforms, and single
minute response and feedback based upon dynamic and real-time data (Heinrichs & Him, 2003).
The main purposes of the present study are to develop a completely web-based data mining exploration and
reporting tool that saves time during the exploration and reporting phases of data mining applications, and to
enable even typical users to be effective decision makers. For this purpose, a tool named ASMiner was
developed. ASMiner employs Microsoft SQL Server Analysis Services behind the scenes as the data mining
engine, and it currently supports three data mining techniques: decision trees, clustering and association
rules.
2. WEB BASED DATA MINING TOOL DEVELOPED
The market offers numerous data mining tools and applications, but they require a professional data mining
background and practice. Owing to these requirements, data mining software packages are used only by
data mining experts. Moreover, most commercial data mining solutions are not web-based. Furthermore,
report production in many data mining packages still requires exhaustive and time-consuming processes.
To cope with these difficulties, ASMiner targets knowledge workers and helps them become data miners.
To achieve this, it presents easy-to-understand, user-friendly and clear user interfaces for exploring mining
models created in Microsoft Analysis Services.
Several platforms with data mining capabilities, such as Oracle, MS SQL Server and WEKA, are available on
the market. For the development of ASMiner, Microsoft Analysis Services was preferred owing to its low cost
and widespread use. Microsoft Analysis Services has been the business intelligence component of Microsoft
SQL Server since 2000. For decision trees, Microsoft developed its own algorithm, named “Microsoft Decision
Trees”. Like CART and CHAID, this algorithm can handle both categorical and continuous variables. In
addition, it supports entropy and Bayesian score as splitting strategies and, unlike other well-known
algorithms, it has no pruning phase. In Analysis Services, as soon as a decision tree model is created, a
corresponding dependency network is also formed. For clustering models, Microsoft Analysis Services offers
two clustering algorithms, K-Means and EM (Expectation-Maximization), each in scalable and non-scalable
versions. The well-known Apriori algorithm is employed for association rules mining.
ASMiner uses the client connectivity interfaces of SQL Server for both OLTP and data mining access.
ADOMD.NET and AMO are used as the entry points to Analysis Services. ADOMD.NET is mainly
focused on retrieving the metadata of mining models, whereas AMO provides management options for server
objects in Analysis Services. Thus, model training/processing operations and model settings can only be
handled via AMO. Domain experts can load, create and manage data mining models on Analysis Services by
using the reduced version of Visual Studio that ships with Microsoft SQL Server. As soon as a domain
expert creates a data mining model in Analysis Services, the model is saved with its metadata, and this
metadata can be retrieved through ADOMD.NET. Using AMO and ADOMD.NET together, ASMiner accesses the
metadata of data mining models and composes the viewers that users request.
Figure 1. Modules and sub-components chart of ASMiner
ASMiner consists of five main modules: the authentication mechanism, the decision tree subsystem, the
clustering subsystem, the association rules subsystem and the management tools (Fig. 1). The authentication
subsystem authorizes every request and validates whether the user has the right to access the requested page
and operation. The decision tree, clustering and association rules mining subsystems each have their own
types of mining model viewers. The charting and visualization components in these viewers are either
third-party open source components or were developed in this study. ASMiner also has a management tool
developed for various purposes. These characteristics of ASMiner are explained in the subsequent paragraphs.
The decision tree module of ASMiner contains three types of tree viewer: a general tree viewer, a discrete
tree viewer and a radial tree viewer. The general tree viewer can draw both regression trees and discrete
decision trees. To increase speed and interactivity, trees are drawn with client-side JavaScript. The Walker
tree drawing algorithm is employed to produce clear and aesthetic trees. Users can navigate a tree by
expanding or collapsing nodes with the buttons on each node. In addition, Visifire (Visifire, 2008) charting
components are employed to display node histograms. Another important feature of the general tree viewer
(Fig. 2) is its drill-through support; drill-through data can be saved in CSV or Excel format.
Figure 2. General tree viewer of ASMiner
The discrete tree viewer has properties specific to discrete decision trees. Additionally, a radial tree
viewer was implemented experimentally to let users view the tree structure from a different perspective.
Dependency network graphs are produced for correlation exploration, and ASMiner has two types of
dependency network viewers. With these graphs, users can navigate the overall graph and explore its
content. A sample dependency network graph displayed with ZGRViewer is presented in Fig. 3. The other
dependency network viewer is based on Flash technology and is intended to serve users when the Java
Runtime is not available. In addition to features such as zooming, rotating and unique coloring of nodes,
this viewer can highlight and show the nearest neighbors of selected nodes.
Figure 3. The ZGRViewer powered dependency graph
To complete the goal of decision tree based decision making and to offer decision tree based prediction,
ASMiner has a web-based online prediction tool. In fact, decision tree based prediction is no more than
hopping from one decision node to the next in the appropriate direction. At the last step of this recursive
process, the value of the target variable (attribute) becomes clear or, in the worst case, a distribution table
is returned. In the case of regression trees, the value of the target variable is calculated by the formula of
the decision node. Analysis Services offers two types of prediction queries, batch and singleton. However,
only singleton querying is supported by ASMiner.
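The node-hopping view of prediction described above can be illustrated with a short sketch. It is not the Analysis Services implementation: the Node structure, the attribute names and the case counts are hypothetical, and the regression formula is assumed to be linear in the inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of a hypothetical decision tree (not the Analysis Services format)."""
    # Discrete trees: case counts per target value observed at this node.
    distribution: Optional[Dict[str, int]] = None
    # Regression trees: assumed linear formula intercept + sum(coefficient * input).
    intercept: float = 0.0
    coefficients: Dict[str, float] = field(default_factory=dict)
    # Outgoing branches as (test, child) pairs; a leaf has none.
    children: List[Tuple[Callable[[dict], bool], "Node"]] = field(default_factory=list)


def predict(node: Node, case: dict):
    """Hop from node to node until no branch matches, then answer from that node."""
    for test, child in node.children:
        if test(case):
            return predict(child, case)
    if node.distribution is not None:
        # Discrete target: most frequent value plus the full distribution table.
        best = max(node.distribution, key=node.distribution.get)
        return best, node.distribution
    # Continuous target: evaluate the node's regression formula.
    value = node.intercept + sum(c * case[name] for name, c in node.coefficients.items())
    return value, None


# Illustrative trees with made-up attributes and counts.
discrete_tree = Node(
    distribution={"yes": 35, "no": 35},
    children=[
        (lambda c: c["age"] < 40, Node(distribution={"yes": 30, "no": 10})),
        (lambda c: c["age"] >= 40, Node(distribution={"yes": 5, "no": 25})),
    ],
)
regression_tree = Node(children=[
    (lambda c: c["age"] < 40, Node(intercept=10.0, coefficients={"age": 0.5})),
    (lambda c: c["age"] >= 40, Node(intercept=40.0)),
])

print(predict(discrete_tree, {"age": 35}))
print(predict(regression_tree, {"age": 20}))
```

In this sketch the distribution table is always returned next to the predicted value, so a caller can decide whether the prediction is clear-cut or only the most frequent outcome.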
Because online prediction is an iterative process, it can be repeated many times to reach the best decision.
Fig. 4 shows the stages of the ASMiner web-based prediction tool. In the first and second stages, the decision
maker selects the predictable variable(s) and attaches each of them to the required member of the predict
function family. The plain Predict() function returns the value of the target variable, whereas the
PredictSupport() function returns the support value of the predicted target variable. In the third stage, the
decision maker enters the case to be predicted, that is, the values of the input variables. In the last stage,
the results are returned on the fly and evaluated by the decision maker. If needed, the predictable or input
variables may be changed to different combinations, and the overall scenario repeats until the decision maker
is satisfied.
Figure 4. The stages in web-based prediction of ASMiner
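The four stages in Fig. 4 map naturally onto a singleton prediction query in DMX, the query language of Analysis Services. The sketch below only assembles the query text; it does not show how ASMiner builds or executes its queries, and the model name, column names and helper functions are hypothetical. Predict and PredictSupport are standard DMX functions, and NATURAL PREDICTION JOIN is the standard DMX form for matching input columns to model columns by name.

```python
def dmx_literal(value):
    """Render a Python value as a DMX literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_singleton_query(model, target, functions, case):
    """Assemble a DMX singleton prediction query.

    model     -- mining model name (stage 1 selects a predictable column of it)
    target    -- the predictable column
    functions -- predict functions attached to the target (stage 2)
    case      -- input column -> value for the single case (stage 3)
    """
    select_list = ", ".join(f"{fn}([{target}])" for fn in functions)
    inputs = ", ".join(f"{dmx_literal(v)} AS [{name}]" for name, v in case.items())
    return (
        f"SELECT {select_list} "
        f"FROM [{model}] "
        f"NATURAL PREDICTION JOIN (SELECT {inputs}) AS t"
    )


# Hypothetical model and columns, for illustration only.
query = build_singleton_query(
    model="Customer Tree",
    target="Bike Buyer",
    functions=["Predict", "PredictSupport"],
    case={"Age": 35, "Gender": "M"},
)
print(query)
```

Executing the resulting text against the server (stage 4) would go through a client library such as ADOMD.NET. The sketch does not escape bracket characters inside identifiers, which a production version would need to handle.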
The ASMiner clustering subsystem focuses on describing and presenting discovered clusters from different
points of view. The majority of the viewers implemented in the clustering subsystem aim to inform users
about the characteristics of the clusters and the statistical differences that discriminate them. Furthermore,
a distribution based cluster dominancy exploration method and viewer (Fig. 5) were developed experimentally
to give insight into which clusters are highly dominant or recessive at the intersection of the values of two
discrete variables in a two-dimensional space.
ASMiner implements six types of clustering viewer: value distribution, cluster distribution, general cluster
profiles, specific cluster characteristics, cluster comparison and, lastly, cluster neighborhood + distribution.
Moreover, ASMiner presents two new viewers for value-variable distribution and cluster distribution. With
these viewers, decision makers can gain statistical insight into the clusters. In the cluster properties
viewer, the properties of a selected cluster are listed in order of decreasing support value.
Figure 5. Cluster dominancy distribution viewer
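One plausible reading of the cluster dominancy idea is sketched below: for every pair of values of two discrete variables, count the cases per cluster and report the cluster with the largest share. This is an assumption made for illustration, not the method behind the viewer in Fig. 5. In particular, it assumes each case carries a single hard cluster label, and the field names and data are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_dominance(cases, var_x, var_y, cluster_key="cluster"):
    """Map each (x value, y value) cell to (dominant cluster, share of cases in the cell)."""
    cells = defaultdict(Counter)
    for case in cases:
        cells[(case[var_x], case[var_y])][case[cluster_key]] += 1
    result = {}
    for cell, counts in cells.items():
        cluster, n = counts.most_common(1)[0]
        result[cell] = (cluster, n / sum(counts.values()))
    return result


# Made-up cases with a hard cluster assignment per case.
cases = [
    {"gender": "M", "region": "North", "cluster": "C1"},
    {"gender": "M", "region": "North", "cluster": "C1"},
    {"gender": "M", "region": "North", "cluster": "C2"},
    {"gender": "F", "region": "South", "cluster": "C2"},
]
dominance = cluster_dominance(cases, "gender", "region")
for cell, (cluster, share) in dominance.items():
    print(cell, cluster, round(share, 2))
```

A share close to 1 marks a cluster that dominates its cell, while clusters absent from a cell's counts are recessive there.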
Association rules mining is one of the most important data mining tasks. For this reason, an association
rules mining module was implemented in ASMiner, and three viewers were designed for it: an itemsets
viewer, a rules viewer and a rules dependency network viewer. The rules viewer is comprehensive. Unlike
other implementations of the Apriori algorithm, Analysis Services focuses on the Importance score (also
known as lift) for measuring the usefulness of a rule (Maclennan et al., 2008). The rule importance score
ranges from -1 to 1. Due to its potential
advantages, ASMiner focuses on ways of filtering and saving the important rules that decision makers
need to report. Therefore, the rules viewer is equipped with minimum importance, minimum confidence and
textual search controls.
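The three filter controls of the rules viewer can be sketched as a single predicate over a rule list. The Rule structure, the sample rules and their scores are hypothetical, and sorting the result by importance is an added convenience rather than a documented behavior of ASMiner.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rule:
    """A hypothetical association rule with its two scores."""
    antecedent: Tuple[str, ...]
    consequent: str
    confidence: float
    importance: float

    def text(self) -> str:
        return ", ".join(self.antecedent) + " -> " + self.consequent


def filter_rules(rules: List[Rule], min_importance=0.0, min_confidence=0.0, search=""):
    """Keep rules that pass both thresholds and contain the search text (case-insensitive)."""
    needle = search.lower()
    kept = [
        r for r in rules
        if r.importance >= min_importance
        and r.confidence >= min_confidence
        and needle in r.text().lower()
    ]
    # Most important rules first (an assumption made for readability of the report).
    return sorted(kept, key=lambda r: r.importance, reverse=True)


# Made-up rules and scores.
rules = [
    Rule(("Bread",), "Butter", confidence=0.80, importance=0.45),
    Rule(("Milk", "Bread"), "Eggs", confidence=0.60, importance=0.70),
    Rule(("Beer",), "Chips", confidence=0.30, importance=0.10),
]
selected = filter_rules(rules, min_importance=0.4, min_confidence=0.5, search="bread")
for rule in selected:
    print(rule.text(), rule.confidence, rule.importance)
```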
A flexible web-based management system was developed for the administrators of ASMiner. With this tool,
users, roles, active mining models and the relationships among them can be managed. With the help of the
AMO programming interfaces of Analysis Services, model information can be retrieved and updates are
reflected directly. Additionally, the system administrator can specifically allow or forbid individual users
to explore selected models. Finally, even a user who has the right to access a model can be forbidden from
training it or making predictions with it.
Up to now, data mining oriented decision-making processes have taken too much time because of the barriers
between decision makers and data mining experts. In addition, each change during the generation of reports
requires an alteration in the data mining models, which results in new loops between the decision makers
and the system administrators. This limitation can be reduced by letting decision makers perform these
operations themselves. Current data mining software packages assume that the target users are data mining
experts, and this assumption is the fundamental barrier to the common use of data mining. ASMiner, by
contrast, can be used easily by researchers from different disciplines, officers of banks and insurance
companies, market managers and other decision makers. For example, these users can carry out decision
tree based risk assessment for incoming requests without needing a data mining expert.
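The per-model permissions handled by the management tool (exploring, training and predicting can be granted or withheld separately) can be sketched as a small policy object. The class, the operation names and the user and model names are hypothetical. Roles are left out for brevity, and the source does not describe how ASMiner stores these settings.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Operation names are assumptions chosen to mirror the text.
OPERATIONS = {"explore", "train", "predict"}


@dataclass
class AccessPolicy:
    """Grants keyed by (user, model); anything not granted is denied."""
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def allow(self, user: str, model: str, *operations: str) -> None:
        unknown = set(operations) - OPERATIONS
        if unknown:
            raise ValueError(f"unknown operations: {sorted(unknown)}")
        self.grants.setdefault((user, model), set()).update(operations)

    def ban(self, user: str, model: str, *operations: str) -> None:
        # Removing a grant that was never given is a harmless no-op.
        self.grants.get((user, model), set()).difference_update(operations)

    def is_allowed(self, user: str, model: str, operation: str) -> bool:
        return operation in self.grants.get((user, model), set())


# Made-up user and model names.
policy = AccessPolicy()
policy.allow("analyst", "Customer Tree", "explore", "predict", "train")
policy.ban("analyst", "Customer Tree", "train")

print(policy.is_allowed("analyst", "Customer Tree", "explore"))
print(policy.is_allowed("analyst", "Customer Tree", "train"))
```

Denying by default means a newly registered model is invisible to every user until an administrator grants access, which matches the allow-or-ban workflow described for the management tool.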
3. CONCLUSION
In this study, a web-based DSS named ASMiner was developed. It was designed and implemented to take full
advantage of the latest Internet and DSS technologies. In the design stage, some viewers were inspired by
the original Analysis Services viewers. For this reason, ASMiner can be regarded as the web-based version
of Analysis Services. In addition, although Analysis Services offers some features for connecting to it over
HTTP, ASMiner provides a pure three-tiered web-based data mining platform. Moreover, AJAX based
techniques and controls were used to enhance performance and user interaction. Owing to these
characteristics, ASMiner has some advantages over the other reporting and exploration tools used in practice.
As future work, extending the management capabilities for data mining models and enhancing the system
for administrative use are planned. Another important feature planned for the future is support for batch
queries against live data sources. Furthermore, implementing a web-based naïve Bayesian model viewer and
a sequence clustering model viewer are important milestones in the development roadmap of ASMiner. With
these additions, ASMiner would become a fully automated and more comprehensive web-based DSS for both
decision makers and data mining experts.
REFERENCES
Ba, S. and Kalakota, A. B., 1995. Executable Documents DSS. Proc. 3rd International Conference on DSS. Hong Kong.
Bhargava, H.K. and Power, D.J., 2001. Decision Support Systems and Web Technologies. AMCIS 2001 Proceedings.
Heinrichs, J.H. and Him, J., 2003. Integrating Web Based Data Mining Tools with Business Models for Knowledge
Management. Decision Support Systems, Vol. 35, No. 1, pp 103–112.
Maclennan, J. et al, 2008. Data Mining with SQL Server 2008. Wiley, Indianapolis, USA.
Shim, J.P. et al, 2002. Past, Present, and Future of Decision Support Technology. Decision Support Systems, Vol. 4, No.
2, pp 111–126.
Visifire, 2009. Available: http://www.visifire.com