WEB MINING IN E-COMMERCE

Istrate Mihai
University of Pitești, Faculty of Mathematics and Informatics, No. 1, Stadionului, Țicleni, Gorj, 215600, Romania
mihaifrance@yahoo.com +40 745 775 935
In recent years, the web has become an important part of people’s lives, and it is a very good place to run a successful business. Selling products or services online plays an important role in the success of businesses that have a physical presence, such as retail businesses. It is therefore important to have a successful website that serves as a sales and marketing tool. One of the technologies used effectively for that purpose is data mining. Data mining is the process of extracting interesting patterns from large databases. Web mining is the use of data mining techniques to extract interesting information from web data. This paper presents the three components of web mining (web usage mining, web structure mining, and web content mining) and the main data preprocessing tasks for web usage mining.
Keywords: E-Commerce, Data mining, Web mining
JEL M
1. E-Commerce and Retail Websites
In e-commerce, instead of running your business from a limited physical location that serves a limited set of customers who usually live near your store, you run it on the web. An e-commerce website lets you sell, advertise, and present different kinds of products and services on the web. E-commerce websites have the advantage of reaching a large number of customers regardless of distance and time limitations. Furthermore, compared with traditional businesses, e-commerce lets both website owners and customers complete transactions and orders faster and at lower cost.
Because of these advantages of e-commerce over traditional businesses, many industries in different fields, such as retailing, banking, medical services, transportation, communication, and education, are establishing their businesses on the web. However, creating a successful online business can be very difficult and costly if one does not take into account e-commerce website design principles, web engineering techniques, and what e-commerce is supposed to do for the business. Understanding the requirements of both the website owner and the customer is an important aspect of building a successful e-commerce website. A lot of information needs to be defined before building the website: the business goals and how the website will target them; whether the website is supposed to attract new customers or to increase sales to current customers; whether the proposed website will increase the overall profit of the business; and which tools and techniques are most suitable for meeting those requirements.
Retail websites aim to inspire customers, to reflect a good image of the business, and to strengthen that image online. An important factor in having a successful retail website is knowing your competitors. On the one hand, you can identify their strengths and benefit from them by building on those strengths and adopting powerful strategies. On the other hand, identifying your competitors’ weaknesses and avoiding them is good practice for a successful retail website.
2. Web Mining
The use of data mining to maintain websites and improve their functionality is an important field of study. Patterns extracted by applying data mining techniques to web data can be used to maintain websites: they can improve usability by simplifying user navigation and information access, and they can improve the content and structure of the website so that it meets the requirements of both the website owner and the user, which will consequently increase the overall profit of the business.
Web mining is the use of data mining techniques to extract useful patterns from the web. These patterns are used to improve the structure of websites, the availability of information on websites and the way that information is presented to the user, and the retrieval and automatic search of information resources available on the web. Web mining can be divided into three major categories: web usage mining, web content mining, and web structure mining.
2.1 Web Usage Mining
Web usage mining, or web log mining, is the process of applying data mining techniques to web log data in order to extract useful information from user access patterns. Web usage mining tries to make sense of the data generated by web users’ sessions or behaviors. Web usage data includes data from web server access logs, proxy server logs, browser logs, user profiles, registration data, cookies, and user queries. Web usage mining tries to predict user behavior while the user interacts with the web, and it learns user navigation patterns. The learned knowledge can then be used for different applications such as website personalization, business intelligence, usage characterization, and adaptive websites. There are two approaches to the web usage mining process:
- Mapping the log data into relational tables before the chosen data mining technique is applied.
- Using the log data directly, with the help of special preprocessing techniques.
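As a minimal sketch of the first approach, already-parsed log entries can be loaded into a relational table and queried with SQL. The example uses Python's built-in `sqlite3` module with an in-memory database; the table name `access_log`, its columns and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical log entries that have already been parsed into fields:
# (host, timestamp, url, status, bytes)
ENTRIES = [
    ("10.0.0.1", "2009-03-01 10:00:00", "/index.html", 200, 5120),
    ("10.0.0.1", "2009-03-01 10:01:10", "/products.html", 200, 8230),
    ("10.0.0.2", "2009-03-01 10:02:05", "/index.html", 200, 5120),
    ("10.0.0.2", "2009-03-01 10:03:40", "/missing.html", 404, 0),
]


def load_log(entries):
    """Map log entries into a relational table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE access_log ("
        "host TEXT, ts TEXT, url TEXT, status INTEGER, bytes INTEGER)"
    )
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?, ?, ?)", entries)
    return conn


def page_hits(conn):
    """Count successful (HTTP 200) requests per URL, most requested first."""
    query = (
        "SELECT url, COUNT(*) AS hits FROM access_log "
        "WHERE status = 200 GROUP BY url ORDER BY hits DESC, url"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = load_log(ENTRIES)
    print(page_hits(conn))  # [('/index.html', 2), ('/products.html', 1)]
```

Once the log is in a table, ordinary SQL filtering and grouping can prepare the input for whichever mining technique is chosen.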
The web usage mining process consists of three phases: data preprocessing, pattern discovery, and pattern analysis. Pattern discovery is the set of methods, algorithms, and techniques used to extract patterns from the web log file. Several techniques are used for pattern discovery, such as statistical analysis, clustering, classification, and sequential pattern mining. Once patterns are discovered, they need to be analyzed in order to identify the interesting and important ones and to remove redundant ones. Pattern analysis takes several forms, such as knowledge query mechanisms, visualization techniques, and loading usage data into a data cube in order to perform Online Analytical Processing (OLAP) operations.
A web server log file records users’ transactions on the web. Usually, the web log file contains the user’s IP address, the requested page, the time of the request, the size of the requested page, its referrer, and other useful information. Web log files can have different formats, but one common log file format is widely used. It has the following form:
remotehost rfc931 authuser [date] "request" status bytes
where remotehost is the remote hostname (or IP address if the DNS hostname is not available), rfc931 is the remote logname of the user, authuser is the username under which the user has authenticated, [date] is the date and time of the request, "request" is the request line exactly as it came from the client, status is the HTTP status code returned to the client, and bytes is the content length of the document transferred. The World Wide Web Consortium (W3C) presented an extended format for web server log files that can record a wider range of data, enabling more advanced analysis of the web log file. The web log file is the main source of data for analysis in web usage mining, but a lot of preprocessing effort is needed to prepare it for mining.
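The common log file format can be split into its fields with one regular expression. The following is a minimal Python sketch that uses only the standard library. The function name `parse_clf_line`, the returned dictionary layout and the sample line are assumptions for illustration. The sketch handles neither the W3C extended format nor log variants that append referrer and user-agent fields.

```python
import re
from datetime import datetime

# One named group per field of: remotehost rfc931 authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<remotehost>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_clf_line(line):
    """Parse one Common Log Format line into a dict; return None if it does not match."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    # Dates are assumed to look like 10/Oct/2000:13:55:36 -0700.
    entry["date"] = datetime.strptime(entry["date"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the bytes field means no content was returned.
    entry["bytes"] = 0 if entry["bytes"] == "-" else int(entry["bytes"])
    # The request line is usually "METHOD URL PROTOCOL".
    parts = entry["request"].split()
    entry["method"] = parts[0] if len(parts) >= 2 else None
    entry["url"] = parts[1] if len(parts) >= 2 else None
    return entry


if __name__ == "__main__":
    line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    entry = parse_clf_line(line)
    print(entry["remotehost"], entry["url"], entry["status"], entry["bytes"])
    # 127.0.0.1 /apache_pb.gif 200 2326
```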
2.2 Web Content Mining
Web content mining is the mining of the data that web pages contain. The contents of most web pages are text, graphics, tables, data blocks, and data records. Much research has addressed different web content mining issues, with the aims of improving the contents of web pages and the way they are presented to the website user, improving the quality of search results, and extracting interesting web page contents.
Web content mining remains a large field. It includes:
- structured data extraction;
- sentiment classification, analysis and summarization of consumer reviews;
- information integration and schema matching;
- knowledge synthesis;
- template detection and page segmentation.
A large amount of information on the web is contained in regularly structured data objects, which are data records retrieved from databases. Such web data records are important because they often present the essential information of their host pages, such as lists of products and services.
Two of the most widely used methods for extracting structured data are wrapper induction, in which a machine learning method learns extraction rules or patterns from a set of manually labeled pages, and automatic extraction, in which extraction patterns are generated from a set of positive pages or from a single page containing multiple data records.
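Wrapper induction can be illustrated with a deliberately simplified, hypothetical sketch. From a few manually labeled pages it learns a single left delimiter (the longest text that immediately precedes every labeled value) and a single right delimiter (the longest text that immediately follows every value). It then uses these delimiters to extract values from new pages generated by the same template. Practical wrapper induction systems learn richer rules and check them for validity. The function names and sample pages here are illustrative only.

```python
import os


def common_suffix(strings):
    """Longest string that every element of `strings` ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def learn_delimiters(labeled_pages):
    """Learn one (left, right) delimiter pair from manually labeled pages.

    labeled_pages: list of (page_text, [labeled values in page order]).
    """
    before, after = [], []
    for page, values in labeled_pages:
        pos = 0
        for value in values:
            start = page.index(value, pos)
            end = start + len(value)
            before.append(page[:start])
            after.append(page[end:])
            pos = end
    left = common_suffix(before)
    right = os.path.commonprefix(after)
    if not left or not right:
        raise ValueError("labeled examples share no common delimiters")
    return left, right


def extract(page, left, right):
    """Apply the learned delimiters to a page and return every value found."""
    values, pos = [], 0
    while True:
        start = page.find(left, pos)
        if start == -1:
            break
        start += len(left)
        end = page.find(right, start)
        if end == -1:
            break
        values.append(page[start:end])
        pos = end + len(right)
    return values


if __name__ == "__main__":
    # Hypothetical labeled training pages (product name is the target field).
    training = [
        ("<ul><li><b>Laptop</b> $999</li><li><b>Mouse</b> $25</li></ul>", ["Laptop", "Mouse"]),
        ("<ul><li><b>Keyboard</b> $45</li></ul>", ["Keyboard"]),
    ]
    left, right = learn_delimiters(training)
    new_page = "<ul><li><b>Monitor</b> $180</li><li><b>Webcam</b> $60</li></ul>"
    print(extract(new_page, left, right))  # ['Monitor', 'Webcam']
```

Choosing the longest shared delimiters can overfit. If every labeled price happened to start with the same digit, that digit would become part of the right delimiter, and pages with other prices would be missed.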
2.3 Web Structure Mining
Links pointing to a document indicate the popularity of the document, whereas links coming out of a document indicate the richness or variety of topics covered in the document. Web structure mining describes the organization of the content of the web, where structure is defined by "hyperlinks between pages and HTML formatting commands within a page".
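The intuition about incoming and outgoing links can be measured by counting in-degree and out-degree over a site's link graph. This is a minimal sketch that assumes hyperlinks have already been extracted as (source, target) pairs. The page names are hypothetical.

```python
from collections import Counter


def link_degrees(links):
    """Count distinct incoming and outgoing links per page.

    links: iterable of (source_page, target_page) hyperlinks.
    """
    unique_links = set(links)  # repeated links between the same two pages count once
    in_degree = Counter(target for _, target in unique_links)
    out_degree = Counter(source for source, _ in unique_links)
    return in_degree, out_degree


if __name__ == "__main__":
    # Hypothetical link graph of a small shop website.
    links = [
        ("home", "laptops"), ("home", "phones"), ("home", "about"),
        ("laptops", "phones"), ("phones", "laptops"), ("about", "home"),
        ("blog", "laptops"),
    ]
    in_deg, out_deg = link_degrees(links)
    print(in_deg.most_common(1))  # [('laptops', 3)]: the most linked-to page
    print(out_deg["home"])        # 3: the page with the most outgoing links
```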
Understanding the relationship between the contents and the structure of a website is useful for keeping an overview of the website. One approach compares the contents of web pages with the information implicitly defined by the structure of the website. In this way, it is possible to indicate whether a page fits the content of its link structure and to identify topics that span several connected web pages, which supports web designers in comparing their intentions with the actual structure and content of their pages. Other studies treat a web page as a collection of blocks or segments. By partitioning web pages into blocks and extracting page-to-block and block-to-page relationships from link structure and page layout analysis, a semantic graph can be constructed over the WWW in which each node represents exactly one semantic topic; such a graph can better describe the semantic structure of the web. Structure within a web page can be used to help machines understand pages.
3. Web Usage Mining Techniques
In this section, we discuss the data mining techniques most commonly used in web usage mining: statistical analysis, clustering, classification, association rule mining, and sequential pattern mining.
Statistical analysis is the process of applying statistical techniques to the web log file to describe sessions and user navigation, for example the viewing time and the length of a navigational path. Statistical prediction can also be used to predict which page or document is likely to be accessed next. It makes use of the N-gram model, which assumes that, when a user is browsing a given page, the last N pages browsed affect the probability of the next page to be visited.
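This is a minimal sketch of N-gram next-page prediction, assuming sessions are already available as ordered lists of pages. For every run of N consecutive pages it counts which page was requested next, and it predicts the most frequent follower. The function names and sample sessions are hypothetical. A practical predictor would also need a fallback, such as shorter histories, for contexts never seen in training.

```python
from collections import Counter, defaultdict


def train_ngram(sessions, n=2):
    """Map each run of n consecutive pages to a Counter of the pages that came next."""
    model = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - n):
            history = tuple(pages[i:i + n])
            model[history][pages[i + n]] += 1
    return model


def predict_next(model, recent_pages, n=2):
    """Return the most frequent next page after the last n pages, or None if unseen."""
    followers = model.get(tuple(recent_pages[-n:]))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical sessions (ordered page lists) from a cleaned log.
    sessions = [
        ["/", "/laptops", "/cart"],
        ["/", "/laptops", "/cart"],
        ["/", "/laptops", "/reviews"],
        ["/blog", "/laptops", "/reviews"],
    ]
    model = train_ngram(sessions, n=2)
    print(predict_next(model, ["/search", "/", "/laptops"]))  # /cart
```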
Clustering is the process of partitioning a given population of events or items into sets of similar elements. In web usage mining, two main kinds of clusters are of interest: usage clusters and page clusters. One approach is to cluster web pages into high-quality clusters and use those clusters to produce index pages, that is, web pages with direct links to pages that may interest a particular group of website visitors.
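The source does not name a clustering algorithm, so the sketch below uses one simple, assumed choice. Two pages are linked when they are viewed together in at least a minimum number of sessions, and each connected group of linked pages forms a cluster. Each cluster can then be turned into an index page with direct links to its members. The function names, the threshold and the sample sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_pages(sessions, min_co_visits=2):
    """Group pages that are viewed together in at least `min_co_visits` sessions.

    Pages are linked when they co-occur often enough, and each cluster is a
    connected group of linked pages (single linkage, found with union-find).
    Pages with no sufficiently frequent partner are left out.
    """
    co_visits = Counter()
    for pages in sessions:
        co_visits.update(combinations(sorted(set(pages)), 2))

    parent = {}

    def find(page):
        parent.setdefault(page, page)
        while parent[page] != page:
            parent[page] = parent[parent[page]]  # path halving keeps trees shallow
            page = parent[page]
        return page

    for (a, b), count in co_visits.items():
        if count >= min_co_visits:
            parent[find(a)] = find(b)

    clusters = {}
    for page in list(parent):
        clusters.setdefault(find(page), set()).add(page)
    return sorted(sorted(cluster) for cluster in clusters.values())


def index_page(cluster):
    """Render a minimal HTML index page with direct links to every page in a cluster."""
    items = "".join(f'<li><a href="{page}">{page}</a></li>' for page in cluster)
    return f"<ul>{items}</ul>"


if __name__ == "__main__":
    # Hypothetical user sessions.
    sessions = [
        ["/laptops", "/laptop-bags", "/mice"],
        ["/laptops", "/mice"],
        ["/phones", "/phone-cases"],
        ["/phones", "/phone-cases", "/laptops"],
        ["/blog"],
    ]
    for cluster in cluster_pages(sessions):
        print(index_page(cluster))
```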
Classification is the assignment of existing events or transactions to predefined sets, or classes, based on their characteristics. In web usage mining, classification is used to assign users to predefined groups according to their navigation patterns, in order to develop profiles of users belonging to a particular class or category.
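One simple way to assign users to predefined classes is a nearest-centroid classifier. In this hypothetical sketch, each session is described by the share of its page views that fall into each site section. A centroid is averaged per class from labeled sessions, and a new session is assigned to the closest centroid. The section prefixes, class labels and training data are assumptions, not part of the source.

```python
import math

# Hypothetical site sections used as features.
SECTIONS = ("/products", "/cart", "/blog")


def features(session):
    """Share of a (non-empty) session's page views that fall in each site section."""
    return [sum(url.startswith(s) for url in session) / len(session) for s in SECTIONS]


def train_centroids(labeled_sessions):
    """Average the feature vectors of the training sessions in each predefined class."""
    sums, counts = {}, {}
    for session, label in labeled_sessions:
        vector = features(session)
        total = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(session, centroids):
    """Assign a session to the class whose centroid is nearest (Euclidean distance)."""
    vector = features(session)
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


if __name__ == "__main__":
    # Hypothetical labeled sessions for three predefined classes.
    training = [
        (["/products/a", "/products/b", "/cart", "/cart/checkout"], "buyer"),
        (["/products/a", "/cart", "/cart/checkout"], "buyer"),
        (["/products/a", "/products/b", "/products/c"], "browser"),
        (["/blog/post-1", "/blog/post-2", "/products/a"], "reader"),
    ]
    centroids = train_centroids(training)
    print(classify(["/products/x", "/cart", "/cart/pay"], centroids))  # buyer
```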
Association rule mining is the discovery of attribute values that occur frequently together in a given set of data. Association rule mining techniques are used in web usage mining to find pages that are often viewed together, or to show which pages tend to be visited within the same user session. One re-ranking method uses the website taxonomy to mine generalized association rules and abstract access patterns at different levels, in order to improve the performance of site search. Another approach uses association rule mining to predict web log accesses. Association rule mining facilitates the identification of related pages or navigation patterns, which can be used for web personalization.
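This is a minimal sketch of association rule mining restricted to pairs of pages, assuming each session is a list of page URLs. Support is the fraction of sessions that contain both pages. The confidence of A -> B is the fraction of sessions containing A that also contain B. Full association rule algorithms also handle sets of more than two pages. The thresholds and sample sessions here are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Mine rules A -> B between pages viewed in the same session.

    support(A, B)      = share of sessions containing both A and B
    confidence(A -> B) = sessions containing A and B / sessions containing A
    Returns sorted (A, B, support, confidence) tuples.
    """
    n = len(sessions)
    page_counts, pair_counts = Counter(), Counter()
    for session in sessions:
        pages = sorted(set(session))
        page_counts.update(pages)
        pair_counts.update(combinations(pages, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / page_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    # Hypothetical sessions.
    sessions = [
        ["/laptops", "/laptop-bags", "/mice"],
        ["/laptops", "/laptop-bags"],
        ["/laptops", "/mice"],
        ["/phones"],
    ]
    for rule in pair_rules(sessions, min_confidence=0.8):
        print(rule)
    # ('/laptop-bags', '/laptops', 0.5, 1.0)
    # ('/mice', '/laptops', 0.5, 1.0)
```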
In sequential pattern mining, sequences of actions or events are identified with respect to time or to other sequences. In web usage mining, sequential pattern mining can be used to predict users’ future visit behavior. Some web usage mining and analysis tools, such as SpeedTracer and Webminer, use sequential pattern mining to extract interesting patterns.
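As a simplified illustration of sequential patterns, the sketch below counts, for every ordered pair of pages, the share of sessions in which the first page is visited at some point before the second. Sequential pattern miners generalize this to longer sequences. The function name, threshold and sessions are hypothetical, and the code does not reproduce SpeedTracer or Webminer.

```python
from collections import Counter


def frequent_ordered_pairs(sessions, min_support=0.5):
    """Return ((A, B), support) for page pairs where A is visited before B.

    A and B need not be adjacent. Support is the fraction of sessions that
    contain the ordered pair at least once.
    """
    counts = Counter()
    for pages in sessions:
        pairs = set()  # count each ordered pair at most once per session
        for i, first in enumerate(pages):
            for second in pages[i + 1:]:
                if first != second:
                    pairs.add((first, second))
        counts.update(pairs)
    n = len(sessions)
    return sorted(
        (pair, count / n) for pair, count in counts.items() if count / n >= min_support
    )


if __name__ == "__main__":
    # Hypothetical ordered sessions.
    sessions = [
        ["/", "/laptops", "/cart", "/checkout"],
        ["/", "/phones", "/laptops", "/checkout"],
        ["/laptops", "/", "/cart"],
    ]
    for pair, support in frequent_ordered_pairs(sessions, min_support=0.6):
        print(pair, round(support, 2))
```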
4. Data Preprocessing for Web Usage Mining
Before data mining techniques are applied to web log data, several preprocessing steps must be performed to make the data ready for mining. The web log file contains data about the requested URL, the time and date of the request, the method used, and so on. The main data preprocessing tasks are data cleaning and filtering, path completion, user identification, session identification, and session formatting.
Data cleaning is the first preprocessing task. It involves the removal of irrelevant items that are not important for any type of web log analysis. Irrelevant items can be eliminated by checking the suffix of the URL to filter out requests for graphics, sound, and video files, so that the analysis concentrates on data representing actual page hits. For example, all log entries with filename suffixes such as gif, jpeg, and jpg can be removed. Another cleaning step removes log entries generated by web agents such as web spiders, indexers, or link checkers. Failed server requests are also filtered out, or their error codes transformed. Merging logs from multiple servers and parsing the log into data fields are also considered data cleaning steps.
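The cleaning rules above can be sketched as a filter over parsed log entries. This is a minimal sketch that assumes each entry is a dictionary with `url`, `status` and `agent` keys (the user agent would come from an agent log or an extended log format). The lists of media suffixes and robot markers are assumptions that would need tuning for a real site.

```python
from urllib.parse import urlsplit

# Assumed filter settings: extend them for the site being analyzed.
MEDIA_SUFFIXES = (".gif", ".jpeg", ".jpg", ".png", ".mp3", ".wav", ".mp4", ".avi")
AGENT_MARKERS = ("bot", "spider", "crawler", "checker")


def is_relevant(entry):
    """Return True for page requests worth keeping in the cleaned log.

    `entry` is assumed to be a dict with 'url', 'status' and 'agent' keys.
    """
    path = urlsplit(entry["url"]).path.lower()
    if path.endswith(MEDIA_SUFFIXES):
        return False  # graphics, sound and video hits, not page hits
    if entry["status"] >= 400:
        return False  # failed requests (client or server errors)
    agent = entry.get("agent", "").lower()
    # Requests from spiders, indexers and link checkers are dropped.
    return not any(marker in agent for marker in AGENT_MARKERS)


def clean(entries):
    """Keep only the relevant entries, preserving their order."""
    return [entry for entry in entries if is_relevant(entry)]
```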
The path completion task fills in page references that are missing because of local browser caching, for example when the user presses the browser’s back button to return to a previously visited page.
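This is a minimal sketch of one path completion heuristic, assuming the site's link structure is known. When a logged page is not linked from the previous page, the code looks back for the most recent earlier page that does link to it and assumes the user returned there with the Back button. The function names and sample data are hypothetical.

```python
def complete_path(session, links):
    """Re-insert page views that were served from the browser cache via the Back button.

    session: pages in the order they appear in the server log.
    links:   dict mapping each page to the set of pages it links to.
    """
    if not session:
        return []
    path = [session[0]]
    for page in session[1:]:
        if page not in links.get(path[-1], ()):
            # Find the most recent earlier page on the path that links to `page`.
            for j in range(len(path) - 2, -1, -1):
                if page in links.get(path[j], ()):
                    # Assume the user went Back from path[-1] to path[j]; those
                    # revisits never reached the server, so add them here.
                    path.extend(reversed(path[j:-1]))
                    break
        path.append(page)
    return path


if __name__ == "__main__":
    # Hypothetical site structure and logged session.
    links = {"/": {"/laptops", "/phones"}, "/laptops": {"/laptops/x1"}}
    print(complete_path(["/", "/laptops", "/laptops/x1", "/phones"], links))
    # ['/', '/laptops', '/laptops/x1', '/laptops', '/', '/phones']
```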
User identification is a complex step because of local caches, corporate firewalls, and proxy servers. A basic assumption is that each different IP address in the log file represents a different user; in addition, if the agent log shows a change in browser software or operating system, a reasonable assumption is that each different agent represents a different user. If a page is requested that is not directly reachable by a hyperlink from any of the pages visited by the user, a heuristic assumes that there is another user with the same IP address. Another assumption is that consecutive accesses from the same host during a certain time interval come from the same user. In some cases it is difficult to identify users, for example when two users use the same machine and the same browser, with the same IP address, and look at the same set of pages.
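The following hypothetical sketch combines two of the heuristics above. Requests with different combinations of IP address and user agent are attributed to different users, and a page that is not reachable by a hyperlink from any page an inferred user has already visited is attributed to a new user. The record layout (`ip`, `agent`, `url`) and the sample data are assumptions.

```python
def identify_users(entries, links):
    """Return an inferred user number for each log entry.

    entries: dicts with 'ip', 'agent' and 'url' keys, in time order (assumed layout).
    links:   dict mapping each page to the set of pages it links to.
    """
    users = []  # one ((ip, agent), pages_visited) pair per inferred user
    labels = []
    for entry in entries:
        key = (entry["ip"], entry["agent"])
        url = entry["url"]
        user_id = None
        # Prefer the most recent user with the same IP and agent who could have
        # reached this page: they already saw it or saw a page that links to it.
        for i in range(len(users) - 1, -1, -1):
            user_key, pages = users[i]
            if user_key == key and (
                url in pages or any(url in links.get(p, ()) for p in pages)
            ):
                user_id = i
                break
        if user_id is None:
            # New IP/agent pair, or a page unreachable from this user's pages.
            users.append((key, set()))
            user_id = len(users) - 1
        users[user_id][1].add(url)
        labels.append(user_id)
    return labels
```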
Session identification. A user session is defined as "the set of pages visited by the same user within the duration of one particular visit to a website". Session identification divides the page accesses of each user into individual sessions. One approach to identifying user sessions uses a timeout threshold: if the time between page requests exceeds a certain limit, the user is assumed to be starting a new session. Another approach assumes that consecutive accesses within the same time period belong to the same session.
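The timeout approach can be sketched in a few lines. The 30-minute default is an assumed value, since the source does not fix a limit, and the input is assumed to be one user's (timestamp, URL) requests.

```python
from datetime import datetime, timedelta


def split_sessions(requests, timeout=timedelta(minutes=30)):
    """Split one user's (timestamp, url) requests into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` (30 minutes is an assumed default).
    """
    sessions = []
    last_time = None
    for timestamp, url in sorted(requests):
        if last_time is None or timestamp - last_time > timeout:
            sessions.append([])
        sessions[-1].append(url)
        last_time = timestamp
    return sessions


if __name__ == "__main__":
    start = datetime(2009, 3, 1, 10, 0)  # hypothetical timestamps
    requests = [
        (start, "/"),
        (start + timedelta(minutes=5), "/laptops"),
        (start + timedelta(minutes=50), "/"),
        (start + timedelta(minutes=55), "/cart"),
    ]
    print(split_sessions(requests))  # [['/', '/laptops'], ['/', '/cart']]
```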
Session formatting. A final preprocessing step can be to format the sessions or transactions for the data mining technique or algorithm to be applied. Webminer, for example, formats the cleaned web server log data in order to apply either association rule mining or sequential pattern mining.
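The source does not describe Webminer's actual formats, so the sketch below only illustrates the general idea under stated assumptions. Association rule mining usually needs each session as an unordered set of pages, while sequential pattern mining needs the order preserved. Collapsing immediate repeats such as reloads is an assumed design choice.

```python
def to_transactions(sessions):
    """Unordered page sets, as typically needed for association rule mining."""
    return [frozenset(pages) for pages in sessions]


def to_sequences(sessions):
    """Ordered page lists for sequential pattern mining.

    Immediate repeats (for example page reloads) are collapsed; this is an
    assumed design choice, not a fixed rule.
    """
    sequences = []
    for pages in sessions:
        sequence = []
        for page in pages:
            if not sequence or sequence[-1] != page:
                sequence.append(page)
        sequences.append(sequence)
    return sequences
```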
5. Discussion
From the previous sections, it is clear that adapting websites with the help of patterns extracted by different data mining techniques is very effective, but doing so in the maintenance phase can be costly and time-consuming and has several drawbacks. Commercial companies, that is, companies that sell different kinds of products on the web, have to wait for some period of time, for example one year, before they have a representative log file that reflects customers’ transactions on their website and gives a clear picture of customer behavior; only then can they maintain their websites effectively. This amount of time is very long, especially for companies for which time plays an important role in their success strategy and which have many competitors who can attract their customers if they lack solid marketing strategies for keeping those customers as loyal as possible.
On the other hand, most businesses gather information about internet customers through online questionnaires. However, many customers choose not to complete these questionnaires because of the time required and the lack of a clear motivation to do so. Several companies use cookies to follow customers through the WWW, but cookies are sometimes detected and disabled by web browsers and do not provide much insight into customer preferences. Moreover, because customers feel that their profiles are not secure, a number of them choose to give incorrect information about themselves.
Furthermore, web mining uses different strategies to identify sessions, such as defining a time threshold that a session should not exceed, or assuming that consecutive accesses within the same time period belong to the same session. In some cases, it is also difficult to identify users, for example when two users use the same machine and the same browser, with the same IP address, and look at the same set of pages. We can conclude that these session and user identification strategies cannot guarantee that the identified users and sessions represent the actual users and sessions.
The problem of building an ill-structured website for a company or business can be solved by applying data mining techniques such as clustering, classification, and association rule mining to the contents of the company’s information system. The information gained from the extracted patterns, which needs to be considered in the website building process, is then used during the design phase, which yields a better-designed retail website. The main advantage of this method is that it reduces the maintenance time and budgetary costs of websites that are built taking into account the interesting patterns extracted from the company’s transactions database. This approach also lets the sales manager focus on the core business and gives him a better view of his products and customers, which is very helpful in designing retail websites.
In conclusion, patterns extracted by applying web mining techniques to web data can be used to maintain websites: they can improve usability by simplifying user navigation and information access, and they can improve the content and structure of the website so that it meets the requirements of both the website owner and the user, which will consequently increase the overall profit of the business.
... WM techniques offer undoubtedly technical advantages, cheap shortcuts and effective heuristics where costly marketing research devices would have been required or impossible to implement. Besides, traditional research methods are becoming increasingly more difficult to administer [3]. Surveys are perceived as being too long to complete, incurring high amounts of errors and biases that arise before, during and after the sampling and fieldwork processes. ...
... WM techniques offer a unique alternative. They avoid the hassle of primary research processes [3] and-handle large, complete, integer, reliable, cheap data in a time-effective manner, by generating refined knowledge from gross data. They may even leverage information that might not have been obtained with traditional primary research, the-holy grail of hidden information. ...
... WM suffers also from some limitations related to 'garbage in garbage out' issues; normative problems related to privacy; resource-consuming jobs and website reconfigurations which may not necessarily yield expected returns [3,20,[28][29][30]. ...
... In 2001, Shaw et al. [3] already observed that firms move online and marketing depends increasingly on the web for customers' data. Since the web is a large and ever-growing database, it is a cheap and fertile area for marketing research [4] via traffic data, clickstream data, e-commerce activities and search queries analyses [5], [6], [7]. In our information society, this is of utmost importance since knowledge is now businesses' new Holy Grail. ...
... Currently, the main issue is that customers happen to be fed up with marketing research, especially surveys. Both low response rates (hence difficulty to administer) and the multiplicity of methodological drawbacks (biases, error rates, respondents' inaccuracy or cheating, etc.) constitute the two faces of the same coin: marketing research limits [5], [10]. In addition to the garbage in garbage out issue, marketing research is also relatively expensive. ...
... WM refers to the whole of DM and related techniques that are used to automatically discover and extract information from web documents and services [12], [13], [14]. It refines large, complete, integer, reliable and cheap data into a time-effective manner [5]. It is thus a technology that is suitable for analytical tasks in large data sets, especially in CRM [15]. ...
Conference Paper
Full-text available
Web Mining (WM) remains a relatively unknown technology . However , if used appropriately , it can be of great use to the identification of existing customers’ behaviours online. The recent technical advances in the field of WM enhance tremendously the anal ytical side of Customer Relationship Management (CRM), still usually related to a simple transactional function. This study , follows an exploratory approach to assess whether WM fulfills, alone, all the three objectives of the second theme of Xu and Walton’s adapted aCRM framework for customer knowledge acquisition, namely the identification of existing web customers’ behaviour. It also investigates to what extent WM should be used in conjunction with traditional marketing research to optimize CRM, and hence marketing, in a web context. In-depth semi -structured interviews revealed that WM is very well suited to understand existing web customers’ transactional web behaviour(s) (navigation patterns, amount of purchases by week, by month, by region, cross -selling and up-selling opportunities, etc.) Nevertheless, WM does not well in understanding less obvious, underlying dimensions of customer behaviour , i.e., how existing customers develop satisfaction, loyalty , defection and attachment on the web. WM still needs to be complemented with traditional marketing research in order to reach those more difficult but essential aCRM objectives. Keyword: Web Mining, Knowledge Management, Customer Relationships Management (CRM), still usually related to a simple transactional function. This study , follows an exploratory approach to assess whether WM fulfills, alone, all the three objectives of the second theme of Xu and Walton’s adapted aCRM framework for customer knowledge acquisition, namely the identification of existing web customers’ behaviour. It also investigates to what extent WM should be used in conjunction with traditional marketing research to optimize CRM, and hence marketing, in a web context. 
In-depth semi -structured interviews revealed that WM is very well suited to understand existing web customers’ transactional web behaviour(s) (navigation patterns, amount of purchases by week, by month, by region, cross -selling and up-selling opportunities, etc.) Nevertheless, WM does not well in understanding less obvious, underlying dimensions of customer behaviour , i.e., how existing customers develop satisfaction, loyalty , defection and attachment on the web. WM still needs to be complemented with traditional marketing research in order to reach those more difficult but essential aCRM objectives.
... Its inductive approach and observational characteristics also overcome the hurdles usually associated with traditional survey-based marketing research e.g. low response rates, biases and errors (Mihai, 2009). ...
... Such knowledge is thus not accessible for prospects since they are not customers of the website (yet). It is therefore usually inferred from other sources such as browsing behaviour or clickstream data (Mihai, 2009). Currently, social media information and data, have created plenty of additional opportunities to buy and aggregate highly detailed knowledge about almost every aspect of a human's life (e.g. ...
Article
Full-text available
Research on how Web-Mining (WM) optimizes marketing, is sparse. Especially absent, is research on WM usefulness for Customer Relationship Management (CRM). The purpose of this research, is to propose a Web Mining-enabled knowledge acquisition framework for analytical CRM. An exploratory study consisting of eleven in-depth interviews with marketing scholars and practitioners revealed that, WM methods and techniques - currently available to practitioners - are well-suited for identifying the profile of web prospects according to their browsing behaviour and to classify them into homogeneous groups. Besides, the nascent technologies regarding opinion mining, sentiment analysis or natural language parsing, and which underlie WM, seem sufficient to acquire knowledge pertaining to attitudinal and other more psychometrically-based characteristics about web prospects. Such tools enable to better understand the so-often termed ‘elusive’ prospects, by crafting fine-grained online marketing strategies to acquire those wouldbe customers. The authors discuss the managerial implications that derive from these findings.
... The advantage of E-Commerce over traditional business is faster speed and lower expenses in completing customer transactions and orders. Because of these advantages the following industries in different elds such as retailing, banking, transportation, medical services, communication and education are establishing their business in the web [12]. Ÿ 100% business uptime Ÿ 24/7 services Ÿ Fast buying and selling procedure Ÿ Cost efciency Ÿ Global access Ÿ Quick response time. ...
... Up to now, PM has been used in diverse fields like retail, education or healthcare, thus showing wide application areas. For example, while [17] and [18] use PM for the maintenance of web pages and their optimization and improvement, [19] tries to achieve personalized learning based on students' performance data. Furthermore, [20] show how PM can be used to identify deviations in healthcare processes from existing policies and best-practices, whereas [21] show a possibility to use PM to discover the customer fulfillment process in a telecommunication company. ...
Conference Paper
Full-text available
The variety of data types generated in manufacturing environments leads to a situation where data-driven approaches for analytical maintenance support no longer have to be limited to the equipment level, but rather can be extended to further perspectives. To this end, this paper examines how process mining (PM) as an approach to extract knowledge about process-related relationships can be applied to support maintenance-related objectives. Our research is carried out by using exemplary data from a manufacturing company, where we successively take different data attributes from various source systems into account and apply selected PM techniques to demonstrate their applicability. As a result, we showcase how different insights can be provided, such as the analysis of a machine's internal behavior, examination of error dependencies across multiple production steps, determination of a machine's relevance within the equipment network or the discovery of bottlenecks regarding frequencies, cycle times and costs.
... The advantage of E-Commerce over traditional business is faster speed and lower expenses in completing customer transactions and orders. Because of these advantages the following industries in different elds such as retailing, banking, transportation, medical services, communication and education are establishing their business in the web [12]. Ÿ 100% business uptime Ÿ 24/7 services Ÿ Fast buying and selling procedure Ÿ Cost efciency Ÿ Global access Ÿ Quick response time. ...
Article
Full-text available
Electronic Commerce (E-Commerce) is an icon of communication, data management and security management that allows business application with different organization. It automatically exchanges information to the sales of goods and services. Like an operating system, it acts as an interface between seller and the buyer. In an internet world E-commerce industry plays a vital role in the WWW. E-commerce website becomes an unbeatable one and it shows the rapid development in it. Enterprises mainly focus on rebuild the relation with their old customer and simultaneously focussing on new customer. The effectual technology used in E-Commerce website is data mining and web mining. Data mining is the process of extracting immersing patterns from large database. Whereas, web mining is the usage of data mining techniques to extract interesting information from the web data. This paper mainly focuses on sentiment analysis about the most famous E-Commerce site in India Is Flipkart in the year 2015.Semantic based approach is used to nd the users opinionated phrases. In this paper, we try to elucidate the basics of e-commerce, the challenges of e-commerce in India and nally we made opinion analysis about the famous e-commerce company Flipkart.
... The usage of process mining for maintenance of web pages and improvement of their functionality is one of the significant research areas as presented in [29] and [30]. In paper [31] LTL (Linear Temporal Logic) Checker is shown, the language and tool that enable verification of properties on the basis of logs. ...
Conference Paper
Full-text available
This paper gives an overview of relevant research in the area of process mining. Process mining techniques are able to extract knowledge from event logs. The major objective of process mining is to discover, monitor and improve real processes. Process mining aims to exploit event data in a meaningful way to identify and anticipate problems, and recommend countermeasures. Additionally, process mining places the existing massive volumes of data in the context of processes. Since extracting data is an integral part of any process mining procedure, data preparation or data pre- processing requires certain efforts. Examples have been given to indicate how the chosen process mining technique deals with incompleteness in the event log data. Experiments have been made on the real data collected from information system for accommodation services.
... cheap holidays, high quality, etc.). This would be a function of the web mining project [4]. A destination can be proposed based on the customer's preferences, and hence the transportation can be derived from the customer's home address and the proposed destination. ...
Article
The MOVE-project aims at a general methodology and a reusable software framework to support coordination and cooperation in virtual enterprises and to optimise individual business processes and required resources. This article examines the requirements for computer-supported planning in virtual enterprises, with a focus on optimising the partly contradictory objectives of satisfying customer and supplier interests. We present a first solution and its experimental evaluation. The concepts are demonstrated in a tourism scenario.
Article
Web Mining (WM) remains a relatively unknown technology. However, if used appropriately, it can be of great value in identifying existing customers' behaviours online. Recent technical advances in the field of WM greatly enhance analytical Customer Relationship Management (aCRM), which is still usually associated with a simple transactional function. This study follows an exploratory approach to assess whether WM alone fulfills all three objectives of the second theme of Xu and Walton's¹ adapted aCRM framework for customer knowledge acquisition, namely the identification of existing web customers' behaviour. It also investigates to what extent WM should be used in conjunction with traditional marketing research to optimize CRM, and hence marketing, in a web context. In-depth, semi-structured interviews reveal that WM is very well suited to understanding existing web customers' transactional web behaviour(s) (i.e. navigation patterns; amount of purchases by week, month, and region; and cross-selling and up-selling opportunities). Nevertheless, WM performs poorly in understanding less obvious, underlying dimensions of customer behaviour, including how existing customers develop satisfaction, loyalty, defection and attachment on the web. WM still needs to be complemented with traditional marketing research in order to reach these more difficult but essential aCRM objectives.
Conference Paper
Research on how Web Mining (WM) optimizes marketing is sparse. Especially absent is research on the usefulness of WM for Customer Relationship Management (CRM). The purpose of this research is to propose a Web Mining-enabled knowledge acquisition framework for analytical CRM. Although many advances have been made in Natural Language Processing, Opinion Mining and other sophisticated web mining techniques, the results of this study show that Web Mining is well suited to identifying the navigational characteristics of online prospects but remains limited in acquiring knowledge about the motivations or perceptions underlying navigational patterns. Traditional marketing research remains a critical complement to Web Mining. The authors also discuss the managerial implications of these findings.
Book
Contents: Introduction (Introduction; Related Concepts; Data Mining Techniques); Core Topics (Classification; Clustering; Association Rules); Advanced Topics (Web Mining; Spatial Mining; Temporal Mining); Appendix; Index.
Salient features: Covers advanced topics such as Web Mining and Spatial/Temporal Mining. Includes succinct coverage of Data Warehousing, OLAP, Multidimensional Data, and Preprocessing. Provides concise coverage of distributed, parallel, and incremental algorithms. Provides case studies. Offers clearly written algorithms to help readers better understand the techniques; algorithms are presented in pseudocode. Includes a reference on how to use prototypes and DM products.
Conference Paper
The World Wide Web is an important medium for communication, data transactions and retrieval. Data mining is the process of extracting interesting patterns from a set of data sources. Web mining is the application of data mining techniques to extract useful patterns from web data. Web mining can be divided into three categories: web usage mining, web content mining, and web structure mining. Web usage mining, or web log mining, is the extraction of interesting patterns from web server log entries. Those patterns are used to study user behavior and interests, facilitate support and services introduced to the website user, improve the structure of the website, and facilitate personalization and adaptive websites. This paper aims to explore various research issues in web usage mining and its application in the field of adaptive and personalized websites.
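Web usage mining starts by preprocessing raw server log entries before any patterns are mined. The sketch below shows two common preprocessing steps: data cleaning, which drops failed requests and embedded images or stylesheets, and session identification. It rests on several assumptions. The logs use the Apache-style Common Log Format. A user is identified by host (IP address), which is a simplification. A 30-minute inactivity timeout separates sessions, which is a widely used heuristic rather than a value from this paper. The function names and sample entries are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident authuser [time] "method path protocol" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')
# Assumed list of non-page resources to discard during cleaning
SKIP_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")

def parse(lines):
    """Data cleaning: keep successful GET requests for pages, drop embedded resources."""
    for line in lines:
        m = CLF.match(line)
        if not m:
            continue  # skip malformed entries
        host, ts, method, path, status = m.groups()
        if method != "GET" or not status.startswith("2"):
            continue
        if path.lower().endswith(SKIP_SUFFIXES):
            continue
        yield host, datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z"), path

def sessionize(records, timeout=timedelta(minutes=30)):
    """Session identification: split each host's requests when the gap exceeds timeout."""
    sessions, last_seen = [], {}
    # Sorting by (host, time) keeps each host's requests contiguous and in order
    for host, when, path in sorted(records, key=lambda r: (r[0], r[1])):
        prev = last_seen.get(host)
        if prev is None or when - prev > timeout:
            sessions.append((host, []))  # start a new session
        sessions[-1][1].append(path)
        last_seen[host] = when
    return sessions

log = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:55:37 +0000] "GET /logo.png HTTP/1.1" 200 512',
    '10.0.0.1 - - [10/Oct/2023:14:01:02 +0000] "GET /products.html HTTP/1.1" 200 4100',
    '10.0.0.2 - - [10/Oct/2023:14:02:10 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:15:10:00 +0000] "GET /cart.html HTTP/1.1" 200 900',
    '10.0.0.2 - - [10/Oct/2023:14:03:00 +0000] "GET /missing.html HTTP/1.1" 404 0',
]
sessions = sessionize(parse(log))
```

Here the image request and the 404 are discarded. The request to `/cart.html` from host `10.0.0.1` arrives more than 30 minutes after that host's previous page view, so it opens a second session. The resulting page sequences are the input for mining navigation patterns.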
Article
Data mining is considered one of the most powerful technologies, contributing greatly to helping companies in any industry focus on the most important information in their data warehouses. Data mining explores and analyzes detailed company transactions. It involves digging through a huge amount of data to discover previously unknown, interesting patterns and relationships contained within the company's data warehouses, allowing decision makers to make knowledge-based decisions and predict future trends and behaviors. Industries such as banking, insurance, medicine, and retailing commonly use data mining to reduce costs, enhance functionality, and increase sales. Web mining is the process of using data mining techniques to mine for interesting patterns in the web. Those patterns are used to study user behavior and interests, facilitate support and services introduced to the website navigator, improve the structure of the website, and facilitate personalization and adaptive websites. In this dissertation, we developed a new approach that measures the effectiveness of data mining in helping retail website designers improve the structure of their websites during the design phase. This is achieved by giving them valuable information about the retailer's information system, its elements, and the relationships between different attributes of the information system. When this information is considered in the design phase of retail websites, it has a positive effect on the website's design structure. Furthermore, this approach reduces the maintenance effort needed in the future. We also studied the behavior of items with respect to time. This approach is beneficial in Market Basket Analysis for both physical and online shops, where it can be used to study customers' buying habits and product buying behavior over different time periods.
We showed how association rule mining can be employed as a data mining task to support marketers in improving the process of decision making in a retail business. This is done by exploring current and previous product buying behavior and by predicting and controlling future trends and behaviors. Based on our idea that interesting frequent itemsets are mainly covered by many recent transactions, a new method to mine for interesting frequent itemsets is also introduced. Finally, to solve the problem of the lack of temporal datasets for running or testing different association rule mining algorithms, we introduced the TARtool. The TARtool is a temporal dataset generator and an association rule miner.
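The abstract does not spell out the recency-based method, so the sketch below does not implement it. Instead it illustrates the two standard measures behind association rule mining in market basket analysis. Support is the share of transactions that contain an itemset. Confidence of a rule X → Y is support(X ∪ Y) divided by support(X). For brevity the sketch enumerates item pairs by brute force instead of running a full Apriori-style search. The baskets, thresholds and function names are hypothetical.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force rules X -> Y over item pairs, filtered by support and confidence."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:
            continue  # the pair is not frequent, so neither rule direction qualifies
        for x, y in ((a, b), (b, a)):
            conf = s / support({x}, transactions)
            if conf >= min_confidence:
                rules.append((x, y, s, conf))
    return rules

baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "milk"},
]
rules = pair_rules(baskets)
```

In this toy data every basket containing butter also contains bread, so butter → bread has confidence 1.0. The reverse rule, bread → butter, has only 0.75 because one bread basket has no butter. A time-aware variant would compute the same measures separately per time period and compare them.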
Conference Paper
First, we classify the selected customers into clusters using the RFM model to identify high-profit, gold customers. Subsequently, we carry out data mining using an association rules algorithm. We measure the similarity, difference and modified difference of the mined association rules based on three types of rules, i.e. emerging pattern rules, unexpected change rules, and added/perished rules. In the meantime, we use a rule-matching threshold to derive all types of rules and explore the rules with significant change based on the measured degree of change. In this paper, we employ data mining tools and effectively discover customers' current spending patterns and trends of behavioral change. These allow management to detect potential changes in customer preferences in a large database and to provide the products and services customers desire as early as possible, in order to expand the clientele base and prevent customer attrition.
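The RFM model describes each customer by Recency (time since the last purchase), Frequency (number of purchases) and Monetary value (total amount spent). The sketch below computes these three values from a transaction list. As a hypothetical stand-in for the clustering step in the abstract, it then flags "gold" customers with fixed thresholds. The thresholds, data and function names are assumptions for illustration and are not the authors' settings.

```python
from datetime import date

def rfm(transactions, as_of):
    """Per customer: (days since last purchase, number of purchases, total amount spent)."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, money = table.get(customer, (None, 0, 0.0))
        last = day if last is None or day > last else last  # keep the latest purchase date
        table[customer] = (last, freq + 1, money + amount)
    return {c: ((as_of - last).days, freq, money) for c, (last, freq, money) in table.items()}

def gold_customers(scores, max_recency, min_frequency, min_monetary):
    """Hypothetical threshold rule standing in for RFM-based clustering."""
    return sorted(c for c, (r, f, m) in scores.items()
                  if r <= max_recency and f >= min_frequency and m >= min_monetary)

purchases = [
    ("alice", date(2024, 3, 1), 120.0),
    ("alice", date(2024, 3, 20), 80.0),
    ("bob", date(2023, 11, 5), 40.0),
    ("carol", date(2024, 3, 28), 300.0),
    ("carol", date(2024, 2, 10), 150.0),
    ("carol", date(2024, 1, 15), 90.0),
]
scores = rfm(purchases, as_of=date(2024, 4, 1))
gold = gold_customers(scores, max_recency=30, min_frequency=2, min_monetary=150.0)
```

In practice the three values are usually scaled or binned before a clustering algorithm such as k-means is applied. Fixed thresholds are used here only to keep the example short.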
Article
When automatically extracting information from the World Wide Web, most established methods focus on spotting single HTML documents. However, the problem of spotting complete web sites has not yet been handled adequately, despite its importance for various applications. Therefore, this paper discusses the classification of complete web sites. First, we point out the main differences from page classification by discussing a very intuitive approach and its weaknesses. This approach treats a web site as one large HTML document and applies the well-known methods for page classification. Next, we show how accuracy can be improved by employing a preprocessing step that assigns each occurring web page to its most likely topic. The determined topics then represent the information the web site contains and can be used to classify it more accurately. We accomplish this by following two directions. First, we apply well-established classification algorithms to a feature space of occurring topics. The second direction treats a site as a tree of occurring topics and uses a Markov tree model for further classification. To improve the efficiency of this approach, we additionally introduce a powerful pruning method that reduces the number of considered web pages. Our experiments show the superiority of the Markov tree approach in terms of classification accuracy. In particular, we demonstrate that our pruning method not only reduces the processing time but also improves the classification accuracy. General Terms: Algorithms, Performance.
Christopher J. Hazard, Data Mining and Web Logs.
S. Ananyan and M. Kiselev, Automated Analysis of Unstructured Texts.
Ruey-Shun Chen, Ruey-Chyi Wu, and J. Y. Chen, Data Mining Application in Customer Relationship Management of Credit Card Business.
Stefan Conrad and Martin Mauve, Data Mining for Retail Website Design and Enhanced Marketing.
Michael Goebel and Le Gruenwald, A Survey of Data Mining and Knowledge