Web Analytics Overview


Abstract

Web analytics is the technology and method for the collection, measurement, analysis, and reporting of website and web application usage data (Burby & Brown, 2007). Web analytics has been growing ever since the development of the World Wide Web. It has grown from a simple function of HTTP (Hypertext Transfer Protocol) traffic logging to a more comprehensive suite of usage data tracking, analysis, and reporting. The web analytics industry and market are also booming with a plethora of tools, platforms, jobs, and businesses. The market was projected to reach $1 billion in 2014 with an annual growth rate of more than 15% (Lovett, 2009). This chapter provides an overview of on-site web analytics, with a focus on categorizing and explaining data, sources, collection methods, metrics, and analysis methods.
Guangzhi Zheng
Southern Polytechnic State University, USA
Svetlana Peltsverger
Southern Polytechnic State University, USA
Web analytics is the technology and method for the collection, measurement, analysis, and
reporting of website and web application usage data (Burby & Brown, 2007). Web analytics has been
growing ever since the development of the World Wide Web. It has grown from a simple function of
HTTP (Hypertext Transfer Protocol) traffic logging to a more comprehensive suite of usage data tracking,
analysis, and reporting. The web analytics industry and market are also booming with a plethora of tools,
platforms, jobs, and businesses. The market was projected to reach $1 billion in 2014 with an annual
growth rate of more than 15% (Lovett, 2009).
Web analytics technologies are usually categorized into on-site and off-site web analytics. On-site
web analytics refers to data collection on the current site (Kaushik, 2009). It is used to effectively
measure many aspects of direct user-website interactions, including number of visits, time on site, click
path, etc. Off-site analytics is usually offered by third party companies such as Twitalyzer
or Sweetspot. It includes data from other sources such as surveys, market reports, competitor
comparisons, public information, etc. This chapter
provides an overview of on-site web analytics, with a focus on categorizing and explaining data, sources,
collection methods, metrics and analysis methods.
Log files have been used to keep track of web requests since the World Wide Web emerged and the
first widely used browser, Mosaic, was released in 1993. One of the pioneers of web log analysis was
WebTrends, a Portland, Oregon based company, which conducted website analytics using data collected
from web server logs. In the same year, WebTrends created the first commercial website analytics
software. In 1995, Dr. Stephen Turner created Analog, the first free log file analysis software. In 1996,
WebSideStory offered a hit counter as a service for websites that would display a banner. Web server logs
have limits in the types of data they can collect. For example, they cannot provide information about
visitors' screen sizes, user interactions with page elements, or mouse events such as clicking and hovering.
The newer technique of page tagging overcomes these limitations and has become more popular in recent years.
The fundamental basis of web analytics is collection and analysis of website usage data. Today,
web analytics is used in many industries for different purposes, including traffic monitoring, e-commerce
optimization, marketing/advertising, web development, information architecture, website performance
improvement, web-based campaigns/programs, etc. Some of the major web analytics usages are:
1. Improving website/application design and user experience. This includes optimizing website
information architecture, navigation, content presentation/layout, and user interaction. It also
helps to identify user interest/attention areas and improve web application features. A particular
example is a heat map that highlights areas of a webpage with a higher than average click rate and
helps determine if the intended link or content is in the right place.
2. Optimizing e-Commerce and improving e-CRM on customer orientation, acquisition, and
retention. More and more companies analyze website usage data in order to understand
customers' needs, increase traffic, and ultimately increase their revenue. Different sites can have
different goals, such as selling more products or attracting more users to generate more income
through advertisements. Websites want to keep visitors longer (reducing the bounce rate), to
encourage users to return, and to make every visit end with the completion of a targeted action.
[Manuscript published in Encyclopedia of Information Science and Technology, Third Edition, IGI Global.]
3. Tracking and measuring success of actions and programs such as commercial campaigns. To
bring value, web analytics must differentiate between a wide variety of traffic sources, marketing
channels, and visitor types. A common question is: where did visitors learn about the site?
For example, parameters used in tracking direct traffic from email, social media, or mobile
devices allow correlation of traffic sources with marketing campaign cost, which helps to
evaluate return on investments.
4. Identifying problems and improving performance of web applications. A study performed by
TagMan shows a significant correlation between page-load time and the likelihood of a user to
convert (TagMan, 2012). Web analytics helps to address this issue. Page loading metrics such as
average page load time by browser and geographic location are used to measure performance.
Both real-time and historical performance analysis allow proactive detection, investigation, and
diagnosis of performance issues. Improvements may range from simple image optimization to
modification of the expiration date in the HTTP headers to force browsers to use cached website
content. A heat map might help to reveal website errors, such as users clicking on buttons or
images that have no links. The same techniques can be used by developers of web based applications
and games to add/modify software features.
Data and Sources
The fundamental goal of web analytics is to collect and analyze web traffic and usage patterns. A
common way to study this data is to use the dimensional model (Hu & Cercone, 2004). Under this model,
there are two major types of data: facts or measurement data and dimensional data that describe facts
from different aspects and levels. Facts data are mainly about usage count and time. The most basic
measure is a page view, which is a single request for a web page. Count of user actions such as mouse
clicks can also be used as a measure. Various metrics are calculated based on basic measures and
dimensions. Dimensional data are much more complex. Major types of dimensions include time, content,
location, user client information (such as operating system, browser type, screen size, etc.), and user or
session information.
Both measurement data and dimensional data come from a number of sources, which can be
categorized into the following 4 types:
1. Direct HTTP request data
2. Application level data sent with HTTP requests
3. Network level and server generated data associated with HTTP requests.
4. External data
Direct HTTP request data directly come from HTTP request messages. An HTTP request is a
message sent by a web client (browser) to a web server to request a resource (a web page or a web page
element like an image). Traditionally, web traffic measurement is directly based on web resource visits
(commonly called page view). Then each request is further described by a number of dimensions, such as
page, visitor, technology, etc. The format of the HTTP 1.1 request is specified in IETF RFC 2616
(Fielding, Gettys, & Mogul, 1999). A typical HTTP request message is shown in Figure 1.
Figure 1: HTTP request header sample displayed with Chrome v22
An HTTP request consists of a request command (the first line) and HTTP headers. The request
command includes the required URI (uniform resource identifier) information. A URI generally includes a
host's domain or IP and a directory path. If the host information is not included as a part of the URI, then
the “host” header has to be provided. The URI is the key information that leads to the count of a
page/resource views. HTTP headers are pairs of field names and values. HTTP 1.1 specification defines a
set of headers that can be included. These headers describe request and client characteristics. Most of the
header data are dimensional type of data used in web analytics. Some commonly used header fields for
tracking are:
User-Agent field holds client information such as browser type and operating system type. This
information can be used to profile client technologies.
Referer (not “referrer”) field keeps the previously visited URL that leads to the current URL. This
header can be used for clickstream analysis, where user visiting paths can be constructed by
chaining a series of requests. It also can be used for metrics like entry rate, exit rate, etc.
Accept-Language field contains the list of natural languages that are preferred in the response.
The list is typically derived from the browser or operating system language settings. This can be
used to track a user's language, e.g. en, en-US, es (Spanish), zh-cn (Chinese, China).
Cookie field holds application level information stored at the client side. It can carry various
kinds of data beyond HTTP's own role, such as records of keyboard and mouse actions.
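To make the extraction concrete, the sketch below parses a raw HTTP request message and pulls out the header fields discussed above as analytics dimensions. It is a minimal illustration in Python, not a production collector, and the sample request is invented.

```python
# Minimal sketch: parse a raw HTTP request message and extract the
# header fields commonly used as web analytics dimensions.
# Illustrative only; real collectors handle many more edge cases.

def parse_request(raw: str) -> dict:
    lines = raw.strip().splitlines()
    method, uri, version = lines[0].split()          # the request command
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            break                                    # end of header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "page": uri,                                 # counted as a page view
        "user_agent": headers.get("user-agent"),     # client technology profile
        "referer": headers.get("referer"),           # preceding page (clickstream)
        "language": headers.get("accept-language"),  # preferred natural languages
        "cookie": headers.get("cookie"),             # application level data
    }

# Invented sample request for illustration.
sample = (
    "GET /products/index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Referer: http://www.example.com/\r\n"
    "Accept-Language: en-US,en;q=0.8\r\n"
)
print(parse_request(sample)["page"])   # /products/index.html
```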
Application level data is generated and processed by application level programs (such as
JavaScript, PHP, and ASP.Net). Some common examples are:
Session data identify a client interaction with a website consisting of one or more related requests
for a definable unit of content in a defined time period (Burby & Brown, 2007). HTTP itself is
stateless and cannot provide session information. Thus, this data is managed at the application
level. Session data are usually sent as URL parameters or session cookies. They are important for
calculating metrics like the number of visits, time on site, number of page views per visit, etc.
Referral data is different from the referer header in HTTP requests. HTTP referer is at the page
request level and is usually a URL. Application level referral represents different sources leading
to the current web resource and is usually a coded value. It can be used to analyze traffic levels
from expected and unexpected sources, or to gauge channel effectiveness in advertising.
User action data mainly include keyboard actions (e.g. user input of search terms) and mouse
actions (e.g. cursor coordinates and movements). It also includes application specific action such
as voting, playing of video/audio, bookmarking, etc.
Client/browser side data include computer status information like display resolution and color
depth, or any other information a user chooses to make available.
Application level data is usually embedded in HTTP requests. There are three common places to
hold this information. First, they can be appended to a request URL as URL parameters. Server side
programs can parse these parameters. For example, Google uses specifically constructed URLs in their
search results to redirect users to the target while capturing extra information (Figure 2). Second,
application data can be sent as the HTTP cookie header. Cookies are small text files that usually store
user profile and activity data. The type of data that can be stored is directly determined by the client
software and settings (Tappenden & Miller, 2009). Third, application data can also be included in the
HTTP request body when an HTTP “POST” method is used (common for form submission).
Figure 2: Google uses a transmission URL when redirecting a link to an external target
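The first mechanism, URL parameters, can be illustrated with Python's standard urllib.parse module. The URL and the parameter names (source, campaign) below are hypothetical examples, not a specific vendor's tagging scheme:

```python
# Sketch: extract application level data carried as URL parameters.
# The URL and parameter names (source, campaign) are hypothetical;
# real tagging schemes define their own conventions.
from urllib.parse import urlparse, parse_qs

url = "http://www.example.com/landing?source=newsletter&campaign=spring_sale"
params = parse_qs(urlparse(url).query)

referral_channel = params.get("source", [None])[0]   # e.g. email, social
campaign_id = params.get("campaign", [None])[0]      # ties traffic to a campaign
print(referral_channel, campaign_id)                 # newsletter spring_sale
```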
Network level data is not part of an HTTP request, but it is required for successful request
transmissions. The most prominent example is the IP address of a requester. The requester's IP address
and port number are required in order to return a response. This information is sent at the TCP/IP level
and is logged by a web server. Server generated data is usually used for internal reference and is recorded
in server log files. The log file commonly records file size, processing time, server IP, request events
other than HTTP request, etc. (see the next section for more details).
External data can be combined with on-site data to help interpret web usage. For example, IP
addresses are usually associated with geographic regions and internet service providers. Third party
databases or services, such as MaxMind's GeoIP and GeoLite, IPInfoDB, and GeoBytes, provide such
mappings. Another example is user information that was collected and stored
during a separate process (e.g. registration). If user identity information is required in a visit, then this
profile data can be associated with usage data. Revenue and profit can be classified as external data if
they can be associated with particular webpages. Search terms and advertisement keywords are
also external data and are usually provided by third party services.
Table 1: Web Analytics Major Data and Source Summary

Data                                       Type                    Source
Page view                                  Measurement             Direct HTTP request
Referrer (preceding webpage)               Dimension               Direct HTTP request (header)
Client profile/User-Agent (browser, OS)    Dimension               Direct HTTP request (header)
Visit or session                           Measurement, dimension  Application level
Referral (channel identification)          Dimension               Application level
User action (keyboard and mouse)           Measurement, dimension  Application level
Client profile (screen size, color depth)  Dimension               Application level
IP address                                 Dimension               Network level
Geo location                               Dimension               External
User profile                               Dimension               External
Revenue or profit                          Measurement             External
Collection and tracking methods
There are two major methods to collect usage data: web server logging and page tagging.
Web server logging is a traditional method of usage data collection. A log file is generated by a
web server to record server activities and HTTP headers in a textual format. There are various formats of
log files. The data most commonly logged in the NCSA Common Log Format are client IP, date/time,
HTTP request command, response status, and response size. Figure 3 shows an example of the Common
Log Format implemented in Apache Web Server 2.2. Additional data, such as HTTP headers, process id,
scripts, request rewrite, etc., can be logged in proprietary formats or the W3C Extended Log File Format.
Log analysis software can be used to extract and analyze log files. Popular tools are Analog, Deep Log
Analyzer, Webalizer, and AWStats.
Figure 3: Common Log Format Example in Apache Web Server 2.2

Log format configuration in Apache Web Server:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog logs/access_log common

An example log entry produced by the configuration above: - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326

The second and more recent method uses client side programs such as embedded scripts, browser
add-ons, and plug-ins. For example, in a typical JavaScript tracking method, a piece of JavaScript code
included in a page tracks user activity and stores information in a cookie. The information is sent to a
processing server (not necessarily the same server that hosts the website) using web beacons or web
services. This method is commonly used by third party service providers such as Google Analytics and
Open Web Analytics. For many organizations, it has become a major type of web usage data collection.
Web server logging is less invasive and does not require page modifications (Ganapathi & Zhang,
2011). Compared to the web server logging method, page tagging has a number of advantages (Clifton,
2012). First, client scripts may have access to additional information about the client such as computer
screen size and color depth. Second, JavaScript can track client side user actions or events such as
keyboard pressing and mouse clicking. This is particularly useful in today’s context of rich internet
applications (RIA). RIAs support many client side user interactions that do not communicate with the
server; therefore server side logging cannot track these actions. Last but not least, data management and
reporting become simpler as many of these services are provided through a Software-as-a-Service (SaaS)
model without local maintenance. This is a preferred method for small and medium websites.
A third method of data collection, application level logging, has been on the rise lately. Application level
logging is tightly coupled with an application and is a functional feature of the application itself. It is
an expansion of traditional web analytics, which focuses on generic HTTP requests and user actions.
An application can be a shopping site, a web portal, a blog service, a learning management system, a
forum, or a social networking service. Each of these applications has its own unique usage data that is
collected beyond generic web requests or user actions. The usage data is processed by the application
itself or by a functional module tightly coupled with the application, but not by independent logging or
analytics services. For example, SharePoint 2010 provides framework specific analytics data, like usage
of templates and web parts (Zampatt, 2011).
Common analyses and reports
Meaningful and measurable metrics must be defined in order to analyze web traffic and relate it
to business goals. The most common traditional metrics used in web analytics are (Kaushik, 2009):
Visit count: page view, visit, unique visitor.
Visit duration: time on page, time on site.
Bounce rate and exit rate.
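These traditional metrics are straightforward to compute once visits have been identified. A minimal sketch with invented visit records (each a tuple of visitor id, pages viewed, and seconds on site), treating a single-page visit as a bounce:

```python
# Sketch: compute traditional metrics from simplified visit records.
# Each record is (visitor_id, pages_viewed, seconds_on_site); data invented.
visits = [
    ("v1", 5, 300),
    ("v2", 1, 20),    # single-page visit => a bounce
    ("v1", 2, 90),
    ("v3", 1, 10),    # another bounce
]

total_visits = len(visits)
page_views = sum(p for _, p, _ in visits)
unique_visitors = len({vid for vid, _, _ in visits})
avg_time_on_site = sum(t for _, _, t in visits) / total_visits
bounce_rate = sum(1 for _, p, _ in visits if p == 1) / total_visits

print(page_views, unique_visitors, bounce_rate)   # 9 3 0.5
```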
The most basic analysis is the dimensional analysis involving measures and dimensions. The
basic metrics mentioned above and other derived metrics are aggregated by dimensions at different levels.
For example, we can use dimensional analysis to answer the question: “what are the total visits by month
(or day of the week) and by website sections (or page)?" Dimensional analysis is the fundamental piece of
other analyses and reports. Most common types of analyses include:
Trend analysis looks at data along the time dimension and shows the chronological changes of
selected metrics. For example, data can show how the percentage of mobile client access has changed for
the past two years.
Distribution analysis is about metric value breakdown. Values are usually calculated as
percentages of the total by one or more dimensions. It is often used to analyze visitor and client profiles.
For example, the percentages of browser types for the past month give information about client diversity.
Other commonly used dimensions in this type of analysis are traffic source (e.g. referral source analysis
reveals the campaign effectiveness), location, technical data that includes information about browser, OS,
device, screen resolution and color depth, client technology support, etc.
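A distribution analysis is essentially a percentage breakdown of a count by one dimension. A minimal sketch with invented per-visit browser data:

```python
# Sketch: browser distribution as a percentage breakdown (invented data).
from collections import Counter

browser_per_visit = ["Chrome", "Firefox", "Chrome", "IE", "Chrome", "Firefox"]
counts = Counter(browser_per_visit)
total = sum(counts.values())

# Percentage of visits per browser, rounded to one decimal place.
distribution = {b: round(100 * n / total, 1) for b, n in counts.items()}
print(distribution)   # {'Chrome': 50.0, 'Firefox': 33.3, 'IE': 16.7}
```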
User activity or behavior analysis analyzes how users interact with websites. Typical examples
are engagement analysis, clickstream analysis, and in-page analysis.
Engagement analysis is one of the most frequently used analyses in the industry. It measures the
following factors:
How many pages were visited per session?
What is the duration of a visit?
How often do new visitors become returning visitors?
How often do visitors return to the site (loyalty)?
The goal of visitor engagement analysis is to find out why the many actions performed on a
website did not end in conversion. There have been several attempts to create engagement calculators
that distinguish between user visits of different quality. For example, one user comes from a Google
search, visits two pages in five minutes, and downloads the needed document; another comes from the
main site, visits twenty pages in 40 minutes, and downloads five documents (Peterson & Carrabis, 2008).
Clickstream analysis, also known as click paths, analyzes the navigation path a visitor browsed
through a website. A clickstream is a list of all the pages viewed by a visitor presented in the viewing
order, also defined as the "succession of mouse clicks" that each visitor makes (Opentracker, 2011).
Clickstream analysis helps to improve the navigation and information architecture of websites.
Visitor interest/attention analysis (in-page analysis) analyzes users' attention on a web page. It
uses client script to track user mouse movements and clicks, and shows results in a heat map. It can also
show how far down visitors scroll the page. Analysis of link popularity and areas of attention helps to
develop content placement strategies. For example, it helps determine what navigational items should be
placed on the top of the page or find the best places for advertisements.
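Underneath a click heat map is simple aggregation of click coordinates into grid cells. The sketch below assumes a hypothetical 100-pixel cell size and invented coordinates:

```python
# Sketch: aggregate click coordinates into grid cells for a heat map.
# The 100-pixel cell size and the click data are invented for illustration.
from collections import Counter

clicks = [(120, 40), (130, 55), (480, 300), (125, 48)]   # (x, y) in pixels
CELL = 100

heat = Counter((x // CELL, y // CELL) for x, y in clicks)
print(heat)   # Counter({(1, 0): 3, (4, 3): 1})
```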
Conversion analysis is one of the key analyses in e-commerce and other sectors. Conversion rate
is calculated by dividing the number of completed targeted actions (e.g. purchases) by the number of
unique users who visited the site. All web analytics providers strive to improve conversion tracking. For
example, Google Analytics provides Multi-Channel Funnels conversion reports that show what
campaigns, sources, or channels have contributed to a visitor's multi-visit conversion.
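The conversion rate calculation itself is a simple ratio; the counts below are invented for illustration:

```python
# Sketch: conversion rate = completed targeted actions / unique visitors.
# The counts are invented for illustration.
completed_purchases = 38
unique_visitors = 1900

conversion_rate = completed_purchases / unique_visitors
print(f"{conversion_rate:.1%}")   # 2.0%
```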
Performance analysis helps reveal website performance issues (such as loading time) or linking
errors. For example, after a website redesign, indirect traffic volume needs to be watched. If there is less
indirect traffic, then some links from other sites and/or bookmarks were potentially broken by the redesign.
Privacy and data accuracy are two major issues and concerns of web analytics. In most cases,
these two issues are related. Many privacy settings affect data tracking and collection accuracy. Concerns
about personal privacy have been rising since web analytics became commonly adopted. The use of
cookies is a major issue in accuracy and privacy concerns. Cookies may contain private information that
users do not want to share. For example, in a web beacon tracking method, cookies are used to track
customer behavior across different websites. A web beacon is a piece of third party tracking code
embedded in a webpage. The same provider collects data, reads cookies, and tracks user behavior across
several domains and websites. As soon as the first web beacon is displayed on a system, a unique number
is generated and saved in a cookie file on the user's system. When the user visits another website with
web beacon from the same provider, the provider reads the cookie and aggregates user's data and can
customize what advertisement to be displayed for this user.
Web analytics largely depends on the use of cookies for data collection and transmission to the
server. If cookies are blocked at the client side, then part of the information is missing and will affect the
accuracy of web traffic and usage. There are several ways that users can manipulate client application
settings to protect their privacy. All major current browsers provide an easy way to delete cookies and
prevent third party cookies, first party cookies, or scripting altogether. Users can choose to use the private
browsing mode in all three major browsers (Incognito in Google Chrome, InPrivate in Internet Explorer,
and Private Browsing in Firefox). To standardize privacy and tracking controls, the W3C recommended
the use of the DNT (Do Not Track) HTTP header. A
DNT header is a user preference set in a browser. Both the web server and the client JavaScript can read
the setting and should not track the user when the DNT option is explicitly set to true. However, this does
not force websites to comply, and a service provider may decide not to honor users' choices. For example,
when Microsoft decided to set the DNT setting to true by default in IE 10, Yahoo announced that it would
ignore IE 10's DNT settings (Schwartz, 2012).
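Honoring DNT on the server side amounts to inspecting one header before activating any tracking. The helper below is a hypothetical sketch, not a specific framework's API:

```python
# Sketch: honor the Do Not Track preference before collecting usage data.
# should_track() is a hypothetical helper, not a specific framework's API.
def should_track(request_headers: dict) -> bool:
    # DNT: 1 means the user asked not to be tracked.
    return request_headers.get("DNT") != "1"

print(should_track({"DNT": "1"}))          # False: user opted out of tracking
print(should_track({"User-Agent": "X"}))   # True: no DNT preference expressed
```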
Another issue is identification of sessions and users. A visit/session can consist of multiple user
actions and requests. However, HTTP is a stateless protocol, which makes each request and
response independent and not related to prior or later requests. This poses difficulty when we want to
correctly identify behavioral patterns. Sessionization is an attempt to group requests from each user over a
period of one visit. The configuration and definition of sessions will affect the accuracy of metrics like
number of visits. The only way to receive accurate visit statistics is to generate a new session when a user
logs in and to terminate the session after the user logs out or stays idle for a period of time.
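Sessionization by idle timeout can be sketched as follows: consecutive requests from the same visitor belong to one visit until the gap between them exceeds a threshold (the 30-minute value below is a common industry convention, assumed here):

```python
# Sketch: group one visitor's request timestamps into sessions by idle timeout.
# The 30-minute threshold is a common convention, assumed here.
IDLE_TIMEOUT = 30 * 60   # seconds

def sessionize(timestamps):
    """timestamps: sorted request times (in seconds) for one visitor."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= IDLE_TIMEOUT:
            sessions[-1].append(t)       # within the idle window: same visit
        else:
            sessions.append([t])         # idle gap exceeded: new visit
    return sessions

reqs = [0, 120, 400, 5000, 5100]         # the 4600-second gap splits two visits
print(len(sessionize(reqs)))             # 2
```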
A common approach to identify visits is to use IP addresses. But this is not always possible. If
visitors come from the same organization and their network uses Port Address Translation, some visitors
will be identified by the same public IP address. Conversely, if a user changes the IP address during
the session, a visit can be incorrectly counted as multiple visits. Cookies are also used to identify visits,
but as mentioned before, cookies can be deleted and blocked for privacy protection. That impacts the data
accuracy as well.
Web browser and proxy caching influence the accuracy of log file analysis. Caching is important
for user experience and effective use of resources. However, it changes host and visit tracking data. If a
proxy is used, then content might be cached and reused for subsequent user visits.
Other issues include incorrect configuration and setup of tracking codes, especially in page
tagging methods. Common factors include missing tags and improper placement of
tags. JavaScript, AJAX in particular, is used to create more dynamic, more powerful, and easier to use
websites. However, a browser delays rendering any content that follows a script tag until that script has
been downloaded, parsed and executed. This delay skews the user engagement statistics.
Web 2.0 has brought many changes to the web analytics industry. AJAX changed how users
interact with websites, and future analytics will be focused more on event data than on HTTP requests
alone. This has made page tagging the dominant collection method going forward. Mobile
web has also become a major trend in the last two years (Meeker, 2012). However, there are several
challenges for measuring mobile web access (Rapoza, 2010). For example, JavaScript is poorly handled
by many mobile browsers and collected statistics are not very reliable. Therefore, there is a need for more
robust methods of mobile web data collection and analysis.
Higher application level analytics will not only collect generic HTTP request data or user action
data, but also domain and application specific data. Web analytics traditionally was used for e-commerce
sites, but recently expanded into other areas such as social media and education. The collection and
analysis of such application level data is usually labeled using application names, like learning analytics,
video analytics, search analytics, social media analytics, etc. For example, Google provides search and
advertising analytics; YouTube provides video analytics; LinkedIn and Facebook provide social analytics;
Blackboard provides learning analytics. Most of these application specific analytics combine on-site web
usage data and external data. This trend will continue with the introduction of more application specific
analytics.
Diversity of client systems and expansion of data sources led some providers to replace the term
web analytics with digital analytics. It's no longer just about measuring website usage but instead
understanding the entire digital footprint of users (Stanhope, 2012). The web usage has become part of a
larger digital usage (e.g. mobile devices, smart TV, etc.). Recognizing this change, the Web Analytics
Association renamed itself the Digital Analytics Association in March 2012 to account for the analyst's
changing role of combining data from multiple sources and channels.
Web analytics is a field of web traffic data collection and analysis. It has gained wide adoption
and become one of the important tools supporting web application management and business analysis. With
the recent Web 2.0 and cloud service advancements, it has quickly evolved from simple system level data
logging to more comprehensive information collection and analysis. With the continuing expansion of
data sources, Web/digital analytics will play an even more important role in the future.
References

Burby, J., & Brown, A. (2007, August 16). Web Analytics Definitions - Version 4.0. Web Analytics Association.
Clifton, B. (2012). Advanced Web Metrics with Google Analytics (3rd ed.). Indianapolis, IN: John Wiley & Sons.
Fielding, R., Gettys, J., & Mogul, J. (1999). Hypertext Transfer Protocol -- HTTP/1.1. IETF RFC 2616.
Ganapathi, A., & Zhang, S. (2011). Web Analytics and the Art of Data Summarization. In Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques (pp. 6:1-6:9). New York, NY, USA: ACM.
Hu, X., & Cercone, N. (2004). A Data Warehouse/Online Analytic Processing Framework for Web Usage Mining and Business Intelligence Reporting. International Journal of Intelligent Systems, 19(7).
Kaushik, A. (2009). Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity (1st ed.). Indianapolis, IN: John Wiley & Sons.
Lovett, J. (2009). US Web Analytics Forecast, 2008 To 2014. Cambridge, MA: Forrester Research.
Meeker, M. (2012). Internet Trends. KPCB.
Opentracker. (2011). Glossary. Retrieved December 15, 2012.
Peterson, E., & Carrabis, J. (2008). Measuring the Immeasurable: Visitor Engagement. Web Analytics Demystified.
Rapoza, J. (2010, December 2). Web Analytics: A New View. InformationWeek.
Schwartz, M. J. (2012, October 30). Yahoo To Ignore IE10 DNT Settings. InformationWeek.
Stanhope, J. (2012, January 1). The new face of Web analytics. KMWorld Magazine, 21(1).
TagMan. (2012, March 14). Just One Second Delay In Page-Load Can Cause 7% Loss In Customer Conversions. TagMan.
Tappenden, A. F., & Miller, J. (2009). Cookies: A Deployment Study and the Testing Implications. ACM Transactions on the Web, 3(3), 9:1-9:49.
Zampatt, G. (2011, September). SharePoint Best Practices: Creating and Configuring Service Applications With (and Without) PowerShell, Part 2. The SolidQ Journal, 13.
Cookie: a small text file stored on the client side to record additional information that may be
shared across multiple requests and responses.
Digital analytics: an expansion of web analytics to include data from other sources.
Dimension: an attribute or perspective used to describe measures.
HTTP: the application-level data transfer protocol for web applications.
HTTP request: a message sent from a client to a web server to request resources.
Metric: a key indicator of an objective to measure and track.
Web analytics: the technology and method for the collection, measurement, analysis, and
reporting of website and application usage data.
Web log: a text file generated by a web server to record server activity and communication data.
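As a minimal illustration of the web log entry defined above, the sketch below parses one line of a server access log in the Common Log Format, the classic server-side data source for web analytics. The sample log line, IP address, and field names are hypothetical and for illustration only.

```python
import re

# Regular expression for one line in the Common Log Format (CLF):
# host ident authuser [timestamp] "method path protocol" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_clf_line(line):
    """Return a dict of fields from one CLF log line, or None if malformed."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" size means no body was returned; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields

# Hypothetical log entry for illustration.
entry = parse_clf_line(
    '192.0.2.1 - - [10/Oct/2014:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326'
)
```

Each parsed line yields the kind of raw record (requesting host, requested resource, response status, bytes transferred) from which visit and traffic metrics are aggregated.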
Burby, J., & Brown, A. (2007, August 16). Web Analytics Definitions - Version 4.0. Retrieved from
Kaushik, A. (2009). Web Analytics 2.0: The Art of Online Accountability and Science of Customer Centricity (1st ed.). Indianapolis, IN: John Wiley & Sons.
Lovett, J. (2009). US Web Analytics Forecast, 2008 to 2014. Cambridge, MA: Forrester Research.