

Assessing Stationarity in Web Analytics: A study of

Bounce Rates

Marios Poulos*

Faculty of Information Science and Informatics, Department of Archives and Library

Science, Ionian University, Ioannou Theotoki 72, 49100, Corfu, Greece.

Nikolaos Korfiatis

Norwich Business School, University of East Anglia. Elizabeth Fry Building, NR47J,

Norwich, United Kingdom

Sozon Papavlassopoulos

Faculty of Information Science and Informatics, Department of Archives and Library

Science, Ionian University, Ioannou Theotoki 72, 49100, Corfu, Greece.

This paper should be cited as:

Poulos, M., Korfiatis, N., & Papavlassopoulos, S. (2020). Assessing Stationarity in Web

Analytics: A study of Bounce Rates. Expert Systems (Forthcoming)

* Corresponding Author. E-mail: mpoulos@ionio.gr; Ioannou Theotoki 72, 49100, Corfu, Greece


Abstract

Evidence-based methods for evaluating marketing interventions such as A/B testing have

become standard practice. However, the pitfalls associated with the misuse of this decision-

making instrument are not well understood by managers and analytics professionals. In this

study, we assess the impact of stationarity on the validity of samples from conditioned time

series, which are abundant in web metrics. Such a prominent metric is the bounce rate, which

is prevalent in assessing engagement with web content as well as the performance of marketing

touchpoints. In this study, we show how to control for stationarity using an algorithmic

transformation to calculate the optimum sampling period. This distance is based on a novel

stationary ergodic process that considers that a stationary series presents reversible symmetric

features and is calculated using a dynamic time warping (DTW) algorithm in a self-correlation

procedure. This study contributes to the expert and intelligent systems literature by

demonstrating a robust method for subsampling time series data, which are critical in decision

making.

Keywords: Time Series, Numerical Analysis, Stationary Process, Bounce Rate, DTW


1. Introduction

The proliferation of analytical methods and tools has led to a critical paradigm shift in

managerial decision making, highlighting the importance of evidence-based evaluations of the

impact of interventions across a spectrum of business practices (e.g., marketing). As in other

areas, such as medicine, evidence-based methods in management practice (Marr, 2010; Pfeffer

& Sutton, 2006) seek to evaluate not only whether the effect of an intervention is observable

but also the reliability and validity of the results presented by the evaluation criterion used. The

expert and intelligent systems literature has emphasized the need for correct data input across a variety of methods and application domains. A very prominent case (in terms of measurable

economic significance) is the evaluation of the impact of interventions in presentation/interface

elements in various marketing functions, such as e-commerce and display advertising. The

former has given rise to so-called “customer-driven” development (Edvardsson et al., 2012),

in which real customers (or users) evaluate features of a particular medium under realistic

marketing mix conditions.

Such a problem considers evaluating the performance of media use and consumption

in terms of easy-to-understand metrics to guide budget allocation (Danaher & Rust, 1996).

Considering a typical application scenario of an online retailer or an advertising agency,

performance metrics capture consumer engagement with the medium and its effectiveness in

attracting consumers’ attention. A very prominent metric, which is the focus of this study, is

the bounce rate, which is defined as the ratio of single-page user sessions to the total sessions

within a given time duration (Sculley et al., 2009). A high bounce rate can lead to a poor

retailer/advertiser return on investment (ROI) and suggests that users may have a poor

experience once they land on a particular page through a referral link (e.g., by clicking on an

ad or by finding the page through a web search). The former is commonly referred to as a

marketing touchpoint.


A typical way to address such a deficiency is to intervene in the interface elements to

find the combination that leads to the best performance metric (e.g., the lowest bounce or click-

through rate) and measure the effect of this intervention. This approach is known as A/B testing

and considers splitting the visitor traffic into two streams, which are assigned to the baseline

condition (B) or its alteration (A). When considering multiple alterations, wherein the comparison of the difference considers more than two groups, M/N or multivariate testing is

performed (Kohavi et al., 2009). The evaluation of the effect of these interventions is performed

through a typical test of the mean differences using either two-sample parametric tests or

ANOVA when considering various alternative interventions. This capability is integrated into

web analytics tools, which are prevalently used to guide decision making by analytics

professionals. With the ever-increasing dimensionality of the test features and attributes of

testing, expert input has become limited and biased (Sauter, 2014).

A typical example of such a bias is the decision concerning the duration of A/B tests

and whether enough statistical power has been accrued to declare a winner or best-performing

configuration. Deciding the length of a test is critical since, in the case of the worst performance

in the post-hoc period of such a test, opportunity costs arise from lost conversions. Seasonal

and cyclical variations of demand have been demonstrated to affect several aspects of economic

activity, with online shopping being no exception.

In this study, we aim to address a typical question that is abundant in this type of test,

which is “how long should we sample a session in order to extract results that capture an adequate level of periodicity for an effect to be observed?” An accurate and unbiased

answer would allow analytics professionals (e.g., those active in search engine optimization)

to safely evaluate the economic significance of their intervention, avoiding Type I and Type II

errors that typically accompany such undertakings and may result from inadequate sampling.


Such a challenge, while approachable by a set of standard statistical practices, has the

characteristic of considering the evaluation of a metric that is of a longitudinal rather than a

cross-sectional nature. The former assumes that the time series of the evaluation criterion used

in a typical A/B testing scenario corresponds to the aggregated metrics of an entire source and

is free of any precondition, and the condition of the time series is inherent in its data structure

(e.g., the way the metric is calculated). In this case, the time series is also referred to as a

conditioned time series (Hamilton, 1994). In some cases, a source contains a number

of conditioned time series, such as metrics, including visits, page views, bounce rates,

pages/visits, new visits, average time on site, etc. (Vaughan & Yang, 2013).

Considering that such time-series data are relatively large or high-frequency,

approaches related to sampling and periodicity pose a challenge to standard analytics tools

(Varian, 2014). From a statistical viewpoint, the problem that we are looking to address here

is more specifically discussed in the work of Downing, Fedorov, Lawkins, Morris, and

Ostrouchov (2000). Because of its size, there is the assumption that the dataset cannot be analysed

at once and should be analysed in segments. The strategy adopted in our study considers the

segmentation of a large data series into a series of segments of arbitrary length and then an

examination of one part of the division at a time to allow unequal segments to reach an optimal

segment length. In this way, the variation of the stationarity per period is investigated to

ascertain whether there is a stable periodical pattern of this variation, which in turn, can be a

guiding heuristic of sample size. Building on previous work (Poulos, 2016), our methodology

provides a simple but robust approach to dealing with the segmentation and periodicity

estimation of time series data representing conditioned metrics. Considering that such metrics

are abundant in web analytics and marketing practices, our work also has practical implications.

Our study responds to several points of interest already outlined in the literature, such

as that of Mortenson, Doherty, and Robinson (2015), regarding the integration of operational


and computational intelligence methods with the emerging field of data analytics and in

particular high-frequency data from digital trails of customer activity. From the perspective

that sampling periods can alter the significance of marketing interventions, such as those

measured in A/B or M/N testing scenarios, our paper also contributes to the practice of web

analytics by incorporating research with real-world data captured through analytics tools that

are considered standard in the industry (Google Analytics). To this end, this paper is structured

as follows. Section 2 discusses related work and the background of the bounce rate definition

and the use of A/B and M/N testing methodology in evaluating the significance of marketing

interventions. We provide an analytical formulation and explanation of the algorithmic process

in Section 3, where the problem of identifying the optimal sampling period for a conditioned

time series is discussed. A benchmark evaluation using data from an online retailer is discussed

in Section 4, along with implications for practice in Section (5). The paper concludes with

Section 6, discussing limitations and future research directions.

2. Related work

2.1 Bounce rates

Bounce rates represent a significant benchmark for the assessment of the engagement value of

interactions—so-called touchpoints—in various areas of content authoring and advertising

(Murthy & Mantrala, 2005). In their simplest form, bounce rates can be defined as the ratio of

extremely short-lived sessions (generally defined as single-page sessions) established either by

direct entry (when the user types the URL into the browser) or by referral entry (by clicking on

a hyperlink) and its correspondent landing. Several established industry tools, such as Google

Analytics (Clifton, 2012; Plaza, 2011), define bounce rates as sessions in which either

immediate back-button clicks have been initiated once the user loads the page or as abandoned

clickstreams in which no further action has been taken after the user initiates a session.


Considering the universe of n sessions initiated on a display space (e.g., website,

banner, etc.) with each session corresponding to an event time clickstream of k length:

$S_t = \{t_{i=1}, \ldots, t_{i=k}\}$ (1)

the bounce rate (BR) is defined as the ratio of sessions in which the depth of the clickstream

is singular to the overall number of sessions, such as:

$BR = \dfrac{|\{S_t : k = 1\}|}{n}$ (2)
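To make the definition concrete, a minimal sketch of this computation in Python; the session depths are hypothetical illustrative values, not data from our study:

```python
# Sketch: bounce rate as the ratio of single-page sessions to all sessions.
# The session depths below are hypothetical, for illustration only.
def bounce_rate(session_depths):
    """session_depths[i] = number of pages viewed in session i (the depth k)."""
    n = len(session_depths)
    if n == 0:
        raise ValueError("no sessions recorded")
    bounces = sum(1 for k in session_depths if k == 1)  # single-page sessions
    return bounces / n

depths = [1, 3, 1, 2, 1, 1, 5, 1]   # eight sessions, five of depth one
print(bounce_rate(depths))          # 0.625
```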

Due to its simplicity, the bounce rate has been a standard benchmark for the evaluation of the performance of entry points (or referrals) in web analytics. In the case of display or

sponsored search advertising, bounce rates can be used to measure the performance of an ad

and provide input for decision making in advertising budget allocation (Jeziorski & Moorthy,

2017). For example, if a landing page (the part of the website to which the click-through action

leads) has a bounce rate of 80%, this suggests that only 20% of the users that clicked on the ad

or sponsored search result were engaged with the action encapsulated in the landing page.

Considering that click-through rates are linearly dependent on the cost per click (which, in the

case of sponsored search results, varies and is the result of an auction), then an 80%

abandonment of the landing page corresponds to a significant loss of the investment provided

in the advertising budget.
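As a numerical illustration of the budget loss implied above, a short sketch; the click volume and cost per click are assumed figures, not taken from our data:

```python
# Illustrative, assumed figures: 10,000 paid clicks at an assumed CPC of 0.50.
clicks = 10_000
cpc = 0.50                      # assumed cost per click
bounce_share = 0.80             # 80% of landings bounce, as in the example above
spend = clicks * cpc            # total advertising spend
wasted = spend * bounce_share   # spend attributable to bounced sessions
print(spend, wasted)            # 5000.0 4000.0
```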

Nevertheless, while optimizing bounce rates is an obvious approach, several

practitioners consider high percentages to be the results of induced demands that can be driven

by other factors and not necessarily by user attention (e.g., accidental landings, technical errors,

user interruptions, etc.). Industry reports suggest that an average bounce rate of 40% is nominal

for particular sectors (e.g., retailing), and as such, more resources should be directed toward

the optimization of user trajectories regarding k ≥ 2 actions in the clickstream (eCommerce

Europe, 2016). Furthermore, due to its inherent behavioural nature, the bounce rate depends on

the targeting that the ad initiates. Entries initiated through sponsored search advertising (e.g.,


Google Adwords) tend to have lower bounce rates than do entries initiated through display

advertisements (e.g., banners) due to the inherent information targeting that the advertising

mechanism uses (Yang & Ghose, 2010).

In the academic literature, researchers have associated increased bounce rates with the

engaging nature of the informational content contained in the website or the visual attributes

of the content (Lindgaard, Fernandes, Dudek, & Brown, 2006), including audio features (e.g.,

in the case of disruption). However, our understanding of bounce rate characteristics and

whether they can be predicted is somewhat limited (Wells, Valacich, & Hess, 2011), and

content optimization techniques, such as A/B testing, have become prevalent as standard tools

in the industry.

2.2 A/B testing and sample size

In its simplest form, an A/B test is a randomized controlled experiment technique that involves

the experimental evaluation of an overall evaluation criterion (OEC) (e.g., the performance of

an alteration of a web page) against a baseline. From an analytical point of view, it considers a

hypothesis test of two samples, with the null hypothesis corresponding to the baseline variant,

resembling a between-subjects design from an experimental point of view. It has been adopted

by content designers and marketing analysts for the evaluation of different stages of the

purchase funnel in e-commerce scenarios (Hoban & Bucklin, 2015). Typically, content

designers select a feature that has a level of uncertainty regarding its effect on a performance

metric (e.g., bounce rates, click-through rates, etc.). Then, a new page is created (Version B),

and a visitor is randomly assigned to either page A (or the baseline), which is the unaltered

version of the website, or page B, which represents the altered version of the page. The subject

assignment procedure is performed through a randomized mechanism (a so-called splitter),

which is typically executed on a server using a cookie assignment to the visitor. This procedure


is performed to ensure that for the duration of the experiment, repeat visits are assigned to the

same version of the page.

Since the evaluation of the altered version against the baseline is performed with a

parametric test, assumptions of normality are followed for all parameters of the problem,

including confidence intervals and statistical powers. For several categories of web analytics

metrics, for which the underlying distribution is not normal (e.g., binomial or Poisson),

appropriate non-parametric tests are used. For example, if we consider the evaluation of the

effect of an intervention on click-through rates, which has been shown to follow a binomial

distribution (REF), Fisher’s exact test is used, while non-parametric tests, such as the Mann-Whitney U-test, are dominant when no assumptions about the underlying distribution are made.

The standard guiding principle behind the reliability of the test is the statistical significance of

the difference between the sample means and the appropriate statistical power that the

difference in the selected metric is going to exhibit. Several researchers in the literature have

studied the issue from a statistics point of view, and the probability perspective (Brodersen,

Gallusser, Koehler, Remy, & Scott, 2015; Varian, 2016) and alternative corrections and criteria

have been proposed and adopted from the experimental literature. For example, Gibbs

sampling may be appropriate for the selection of sample intervals for A/B testing if no direct

data are available about the probability distribution of the chosen OEC.
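As an illustration of the non-parametric comparison mentioned above, a minimal sketch of the Mann-Whitney U statistic computed directly from its definition; the two bounce rate samples are simulated assumptions, not data from our study:

```python
# Sketch: Mann-Whitney U for two variants, computed from pairwise comparisons.
# Samples are simulated; the significance thresholds are left to the analyst.
import math
import random

def mann_whitney_u(a, b):
    """U statistic for sample a versus sample b; ties count as 1/2."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

def z_score(u, n1, n2):
    """Normal approximation of U under the null hypothesis (no tie correction)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (u - mean) / sd

random.seed(7)
a = [random.gauss(0.45, 0.05) for _ in range(30)]  # baseline bounce rates (simulated)
b = [random.gauss(0.40, 0.05) for _ in range(30)]  # altered page (simulated)
u = mann_whitney_u(a, b)
print(round(z_score(u, len(a), len(b)), 2))
```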

Regardless of the evaluation approach, questions regarding the optimal sampling size

and length are still debatable and subject to the sensitivity of the selected test, and the

assumptions regarding the underlying distribution. Our aim in this study is not to delve into the

mechanism used to compare the differences between the two samples but to direct our attention

toward the issue of sub-sample selection to evaluate the OEC in the context of A/B testing.

This issue is directly related to the question of the experimental duration and its time series

specific nature. Building on prior work concerning time series stationarity detection (Poulos,


2016), our approach considers the extraction of the stationarity degree to guarantee equal

likelihoods of activity captured by the OEC across the testing sample.

2.3 Our contribution

The problem that we tackle is that the underlying assumption of the random assignment

achieved with a split generator in an A/B testing scenario may not be enough to safeguard the

validity of the test result, and as such, a more robust approach based on the time series

characteristics of the targeted metric is needed.

This problem is of high economic significance for users of an advertising network and,

in particular, retailers, since it is costly at two levels. First, the direct advertising cost involves

the cost-per-click (CPC) associated with a bounced visit, and second and most importantly, lost

opportunity results from missed activity of a potential client. Arguably, the problem of

assessing the usability performance of a web space (e.g., an e-commerce site) considers not

only the bounce rate but also the overall trackable activity until the point of checkout (and

hence other elements of the purchase funnel, which can lead to an abandonment of the

clickstream). However, concerning the question of decisions related to budget allocation (e.g.,

for sponsored-search or display advertising), the returns of these decisions may be harmful if

the optimization strategy does not consider an accurate estimation of the time dependence.

Inherent sources of error in this case, such as stationarity, have been known to influence the

reliability of time-dependent metrics (Sculley et al., 2011), and our intention in this study is to

address this issue by introducing an analytical process.

The method is based on an algorithm (Poulos, 2016) that detects the sampling

stability of a time series. The sampling stability is expressed by the discovery of some dominant

periodicity extracted from the change of the stationarity degree within a particular time series


segment. Therefore, the algorithmic contribution of the study could be applied beyond the bounce rate issue. The details of this contribution are discussed in Section 5.1.

3. Analytic formulation

3.1 Preliminaries

The extraction of the stationarity degree is based on previous work (Poulos, 2016;

Sharifdoost, Mahmoodi, & Pasha, 2009), in which it has been defined that a discrete time

stationary process $\{M_i\}$, $i = 1, \ldots, n$, is time reversible for every positive integer $n$ if the following equation is satisfied:

$(M_1, M_2, \ldots, M_n) = (M_n, M_{n-1}, \ldots, M_1)$ (3)

Then, it is considered that a discrete time series $M_i$ with $i = 0, \ldots, n$ produces a mirror time series $N_i$, which can be described as:

$N_i = M_{n-i}, \quad i = 0, \ldots, n$ (4)

Thus, taking into account Equation 4, the degree of stationarity is based on the following formulation:

$\mathrm{error} = d(M, N)$ (5)

If $\mathrm{error} = 0$, then the time series consists of a stationary process, based on the error estimation of the dissimilarity measure $d$ between the discrete time series $M_i$ and the reversible $N_i$. Then, using Euclidean and dynamic time warping (DTW) techniques, the local dissimilarity function $f$ is defined between any pair of elements $M_i \wedge N_j$ with the shortcut:

$d(i, j) = f(M_i, N_j) \ge 0, \quad i, j = 1, \ldots, n$ (6)

Then, if the path $\varphi$ is the lowest-cost path between two series, the corresponding dynamic time warping (DTW) technique (Salvador & Chan, 2007) provides the warping curve $\varphi(k)$, $\forall k = 1, 2, \ldots, T$, as:

$\varphi(k) = \big(\varphi_x(k), \varphi_y(k)\big) \quad \text{with} \quad \varphi_x(k), \varphi_y(k) \in \{1, \ldots, n\}$ (7)

The warping functions $\varphi_x(k) \wedge \varphi_y(k)$ remap the time indices of $M \wedge N$ accordingly. Given $\varphi$ and following Cortez, Rio, Rocha, and Sousa (2012), the average accumulated distortion between the warped time series $M \wedge N$ is calculated as follows:

$d_\varphi(M, N) = \sum_{k=1}^{T} \dfrac{d\big(\varphi_x(k), \varphi_y(k)\big)\, m_\varphi(k)}{M_\varphi}$ (8)

where $m_\varphi(k)$ is a per-step weighting coefficient and $M_\varphi$ is the corresponding normalization constant, which ensures that the accumulated distortions are comparable along different paths.

To ensure reasonable warps, constraints are usually imposed on $\varphi$. The basic idea underlying DTW is to find the optimal alignment $\varphi$ such that:

$D(M_n, N_n) = \min_{\varphi} d_\varphi(M_n, N_n)$ (9)

Therefore, one picks the distortion of the time axes of $M \wedge N$ that brings the pair of time series as near to each other as possible.
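The reversibility-based stationarity degree described above can be sketched as follows. The DTW recursion is the standard textbook dynamic program, standing in for the exact implementation, and the example series are assumed values:

```python
# Sketch: stationarity degree as the DTW distance between a series and its
# mirror (Equations 3-5). Textbook DTW recursion with absolute-difference cost.
def dtw_distance(m, n):
    """Classic dynamic-programming DTW between sequences m and n."""
    inf = float("inf")
    rows, cols = len(m), len(n)
    d = [[inf] * (cols + 1) for _ in range(rows + 1)]
    d[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            cost = abs(m[i - 1] - n[j - 1])            # local dissimilarity f
            d[i][j] = cost + min(d[i - 1][j],          # insertion
                                 d[i][j - 1],          # deletion
                                 d[i - 1][j - 1])      # match
    return d[rows][cols]

def stationarity_error(series):
    """error = DTW distance between the series and its time-reversed mirror."""
    return dtw_distance(series, series[::-1])

flat = [0.4, 0.4, 0.4, 0.4]    # symmetric under reversal, hence error = 0
trend = [0.1, 0.2, 0.3, 0.4]   # trending, hence error > 0
print(stationarity_error(flat), stationarity_error(trend))
```

A perfectly reversible (stationary) window yields a zero error, while a trending window does not, which is the property the sliding-window procedure of the next subsection exploits.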

3.2 Procedural definition

Graphically, this algorithm is described in Figure 1. We provide a more detailed analytical

overview and the algorithmic steps below.

[Insert Figure 1 here]


Step 1. Let us consider the matrix $M$, which contains the hourly bounce rate data set with size $(1 \times R)$, $R \in \mathbb{N}$:

$M_j^i(j, x) = M(x + j : x + j + i), \quad 0 < x < R - i$ (10)

Index $j$ corresponds to the number of repetitions of the algorithm in the same window length each time, with a unit step of sliding. Additionally, the indicator $i$ is the selected size of the investigated window, which is constant for each experiment, and $x$ is the beginning point of the series. Then, the corresponding mirror data set is:

$N_j^i(j, x) = N(x + j + i : x + j), \quad 0 < x < R - i$ (11)

Subsequently, the extraction of the stationarity value according to Equation 7 is depicted in the following square matrix:

$Z_{j,i} = D\big(M_j^i(j, x),\, N_j^i(j, x)\big) = \begin{pmatrix} Z_{1,1} & Z_{1,2} & \cdots & Z_{1,i-1} & Z_{1,i} \\ Z_{2,1} & Z_{2,2} & \cdots & Z_{2,i-1} & Z_{2,i} \\ \vdots & \vdots & & \vdots & \vdots \\ Z_{j-1,1} & Z_{j-1,2} & \cdots & Z_{j-1,i-1} & Z_{j-1,i} \\ Z_{j,1} & Z_{j,2} & \cdots & Z_{j,i-1} & Z_{j,i} \end{pmatrix}$ (12)

Step 2. Then, the matrix $A = \big[D\big(M_j^i(j, x), N_j^i(j, x)\big)\big]_{i=1}^{i}$ is produced, along with a second matrix using the same procedure:

$B = \big[D\big(M_j^i(j, y), N_j^i(j, y)\big)\big]_{i=1}^{n}$ (13)

where $0 < y < R - n$, to construct a correlated pair of matrices.

Step 3. Thereafter, aiming to produce a smoothing procedure on the data of the matrices $[A_i]$ and $[B_i]$, a cumulative moving average (CMA) procedure is applied as follows:

$CMA_{n+1} = CMA_n + \dfrac{A_{n+1} - CMA_n}{n + 1}$ (14)

and

$CMB_{n+1} = CMB_n + \dfrac{B_{n+1} - CMB_n}{n + 1}$ (15)

Then, the new (smoothed) matrices are $MA = [CMA_1, CMA_2, \ldots, CMA_n]$ and, correspondingly, $MB = [CMB_1, CMB_2, \ldots, CMB_n]$.

Step 4. Then, $F = [F_c]_{c=1}^{p}$ and $G = [G_c]_{c=1}^{p}$ are calculated to extract the local maxima points of the graphs corresponding to the matrices $[MA]$ and $[MB]$.

Step 5. Consequently, the differences between adjacent elements of the $[F]$ and $[G]$ matrices are calculated, i.e.,

$T_1 = [F_{c+1} - F_c]_{c=1}^{p-1}, \quad T_2 = [G_{c+1} - G_c]_{c=1}^{p-1}$ (16)

Step 6. Then, the mean values $\overline{T_1} \wedge \overline{T_2}$ of the matrices $[T_1] \wedge [T_2]$ are determined.

Step 7. Then, the matrix $W = \big[\overline{T_1}, \overline{T_2}\big]$ is determined, and the standard error of the mean of the matrix $[W]$ is calculated as follows:

$s_{\mathrm{error}} = \sqrt{\dfrac{\sum_{i=1}^{2} \big(W_i - \overline{W}\big)^2}{2}}$ (17)

Step 8. Finally, using a two-tailed t-test with df = 1 for $\overline{W}$, the equation below is obtained:

$ll = \overline{W} - s_{\mathrm{error}} \cdot t_{\mathrm{value}}, \quad ul = \overline{W} + s_{\mathrm{error}} \cdot t_{\mathrm{value}}$ (18)

where $ll$ and $ul$ are the lower and upper limits, respectively.
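Pulling Steps 1-8 together, a condensed sketch in Python. The synthetic series, the window placement, and the textbook DTW implementation are stand-ins for our data and exact code; the final confidence limits reuse example peak spacings of 28 and 29, in the style of Table 1:

```python
# Sketch of Steps 1-8 on synthetic hourly bounce rates (assumed data).
import math

def dtw(m, n):
    """Textbook DTW (stands in for Equations 6-9)."""
    inf = float("inf")
    d = [[inf] * (len(n) + 1) for _ in range(len(m) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(m) + 1):
        for j in range(1, len(n) + 1):
            cost = abs(m[i - 1] - n[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[-1][-1]

def stationarity_series(series, x, i, reps):
    """Steps 1-2: DTW degree between each sliding window and its mirror."""
    return [dtw(series[x + j : x + j + i], series[x + j : x + j + i][::-1])
            for j in range(reps)]

def cma(values):
    """Step 3: cumulative moving average smoothing (Equations 14-15)."""
    out, acc = [], 0.0
    for k, v in enumerate(values, start=1):
        acc += (v - acc) / k
        out.append(acc)
    return out

def local_maxima(values):
    """Step 4: indices of interior strict local maxima."""
    return [k for k in range(1, len(values) - 1)
            if values[k - 1] < values[k] > values[k + 1]]

def ci_from_peak_spacings(peaks_a, peaks_b, t_value=31.821):
    """Steps 5-8: mean peak spacings, SEM of the pair, t-based limits (df = 1).
    Each peak list must contain at least two positions."""
    t1 = [b - a for a, b in zip(peaks_a, peaks_a[1:])]
    t2 = [b - a for a, b in zip(peaks_b, peaks_b[1:])]
    w1, w2 = sum(t1) / len(t1), sum(t2) / len(t2)
    w_bar = (w1 + w2) / 2.0
    s_error = abs(w1 - w2) / 2.0            # Equation 17 for two values
    return w_bar - s_error * t_value, w_bar + s_error * t_value

# Synthetic series with a planted 24-hour periodicity (illustrative only).
series = [0.4 + 0.1 * math.sin(2 * math.pi * t / 24) for t in range(500)]
smoothed = cma(stationarity_series(series, x=100, i=60, reps=60))
peaks = local_maxima(smoothed)
# Steps 5-8 demonstrated on example peak positions with spacings 28 vs 29:
ll, ul = ci_from_peak_spacings([0, 28, 56], [0, 29, 58])
print(len(peaks), round(ll, 4), round(ul, 4))
```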


4. Experimental part

4.1 Data and methods

The experiment considered a dataset sourced from an online retailer active in the segment of consumer electronics. The retailer’s objective was to evaluate the performance of a search

engine optimization intervention that was carried out to improve the overall bounce rate that

the e-shop exhibits when visitors land on the website by clicking on an organic (non-sponsored)

search result through Google or secondary search providers.

We gained access to the retailer’s Google Analytics account and extracted data from

the main landing page, which listed entry points for the different categories (e.g., digital

cameras, laptops, etc.). Hourly data were obtained using the API provided by the Google

Analytics backend and exported to CSV files for further processing. The resulting input data

matrix corresponded to the click-stream for an approximate two-year period and had a sample

size n=18288 visitor sessions. During this period, the retailer’s website remained unchanged

concerning visual cues and interface characteristics. We used the default computation for the

bounce rates from Google Analytics and performed some preliminary analysis to ensure that

during the period to be analysed, there was no technical failure (downtime) of the website that

would interrupt the continuity of the time series. The graphical representation of the variation

of the bounced sessions in our dataset is shown in Figure 2.

[Insert Figure 2 here]

Having acquired the data and prepared the input data series, we proceed with the

implementation of the analytic procedure as described in Section 3.2. For clarity, we refer to

the points of the time series by their index value, which is set from 1 to the maximum length

of the data matrix (n = 18288). We outline the numerical computation of the steps that we used

in the sections that follow.


[Insert Figure 3 here]

Step 1. For a random value x = 11813 with i = 60 and taking j = 1, 2, 3, ..., 60, a matrix M(60, 11813) is constructed according to Equation 8 (see the blue line in Figure 3 and Table 1). In the same way, the mirror N of matrix M is produced:

$N_j^i(j, x) = N(x + j + i : x + j), \quad 0 < x < R - i$

according to Equation 9 (see the red line in Figure 3 and Table 1). Then, the degree of stationarity $D_{11}$ (see Equation 10) is computed using the dataset: $D_{11} = 0.0209$. Similarly, the other values of the matrix with the corresponding dimensions (60 × 60) are obtained.

Step 2. Thereafter, the matrix $A = \big[D\big(M_j^i(j, x), N_j^i(j, x)\big)\big]$ of size (1 × 60) is obtained. In the same way, the matrix $B = \big[D\big(M_j^i(j, y), N_j^i(j, y)\big)\big]$ of size (1 × 60) is obtained.

Step 3. Using a cumulative procedure with a 5-point (n = 5) moving average, the matrices $[MA_i]$ and $[MB_i]$ are obtained.

Step 4. According to Equation 11, the local maxima points of the graphs of the matrices [MA]

and [MB] are calculated in the new matrices [F] ∧ [G] (see Table 1).

Step 5. Consequently, the differences between the adjacent elements of matrices [F] and [G]

are calculated in the new matrices [T1] ∧ [T2].

Step 6. Then, the mean values of the matrices $[T_1] \wedge [T_2]$ are determined (see Table 1, column: Mean).

Step 7. Then, the matrix $W = \big[\overline{T_1}, \overline{T_2}\big]$ is determined, and the standard error of the mean of the matrix is calculated (see Table 1, column: Error).


Step 8. According to Equation 14, for a 98% confidence interval with t = 31.821 (a p-value of 0.02 at the 2% significance level) and $\overline{W} = 25.2767$ (see Table 1, cell: Mean Error), the limits are obtained as follows:

$\mathrm{lower\_limit} = 25.2767 - 0.0222 \times 31.821 = 24.5703$
$\mathrm{upper\_limit} = 25.2767 + 0.0222 \times 31.821 = 25.9831$ (19)
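The arithmetic of these limits can be checked directly, using the values reported above:

```python
# Re-computing the limits of Equation 19 from the reported values.
w_bar = 25.2767    # mean of the matched-pair means (Table 1)
s_error = 0.0222   # standard error of the mean
t_value = 31.821   # two-tailed t with df = 1 at the 2% level

lower = w_bar - s_error * t_value
upper = w_bar + s_error * t_value
print(round(lower, 4), round(upper, 4))  # 24.5703 25.9831
```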

4.2 Results

According to the experimental procedure and taking into account the results presented in Table

1 and the resulting transformation of the time series depicted in Figure 4, an apparent

periodicity of the applied processing is observed. This observation focuses on the measure of the differentiated positions of the local maxima points and places emphasis on the dominant question, namely, finding the necessary sample size that significantly captures the observed periodicity of the time series.

[Insert Table 1 here]

The results of the experimental procedure (step 8) provide a sample size s = 25, which can be interpreted as follows: the variation of the stationarity degree has periodically stable maxima for a set of 25 bounce rate data points. Considering that our data represent hourly bounce rates, the benchmark data suggest a window of 25 hours for the evaluation of interventions for content optimization.

[Insert Figure 4 here]

In more detail, the above calculations are achieved via Equations 4-18 using the mean

difference between the matched pairs technique. The matched data pairs were obtained in a

random way according to Equation 13 and had a scalable range from 60-200 with the step

increment of 5, that is, 30 matched data pairs were created in total (see Table 1). Therefore,

for each matched pair—for example, the values at length 60, which are depicted between the


data x=[11813, 11873] and y=[5391, 5451]—the local peaks are calculated (see Figure 1, start).

Then, as the mean matched data pairs are determined, the mean value of the distance between

each local peak value is obtained. In the start case, the number of peaks for the pair x and y

data set is two (2), and the mean distances are 28 and 29, respectively. Therefore, the M.D.B.M.P. (mean difference between matched pairs) is calculated (see Equation 17) from the difference between the above means, which, in Table 1, is depicted as the error (e = 0.5). In the same way, the M.D.B.M.P. results are

calculated through the last data set (see Figure 1, end). Additionally, in the Appendix, the

graphical transformations of the above calculations are depicted for the 30 matched data pairs.

5. Discussion

5.1 Theoretical Implications

This study contributes to the expert and intelligent systems literature by demonstrating a robust

method for subsampling time series data, which are critical in decision making. In particular, the application of the stationarity detection algorithm, as demonstrated in previous work

(Poulos, 2016) allows for evaluating more complex problems in business practice such as

measuring website prominence (Papavlasopoulos, 2019; Poulos, Papavlasopoulos,

Kostagiolas, & Kapidakis, 2017) as well as dimensionality reduction in text analytics (Poulos,

2017).

The particular implication in researching patterns in high-frequency time series data

can also be applied in patterns of web queries such as those in Google Trends. This extraction

of the periodical non-stationarity features of time series can complement existing approaches

for novelty detection in scientific literature utilizing the patterns on prominent keywords

appearing in scientific publications (Papavlasopoulos, 2019). While the application in this

context concerns consumer activity, it confirms previous results that proxy a visitor’s activity

using the search queries and the related keywords that have been utilized. This method is

19

implemented via the same algorithm, with the only exception being that parameter M is fed with a multidimensional data structure (see Equation 10). While this study aims to assess when periodicity can distort the outcomes of a marketing intervention, it confirms results similar to those of Papavlasopoulos (2019), who investigates when a keyword time series yields non-stationarity peaks. As such, asserting the condition of a non-stationary categorical time series yields goodness of fit in the prediction problem.

A further implication that can be investigated in future studies comes from the aggregation of individual time series using grouping factors such as product category and brand. Poulos et al. (2017) demonstrated that stationarity of aggregated time series of search keywords from Google Trends can be asserted, using an example of publishing houses and their corresponding publications. In a similar manner, the data type for parameter M (Equation 10) needs to be modified to represent 2-dimensional groupings.
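As a sketch of that modification, parameter M could be re-typed from a single series to a 2-dimensional structure with one row per grouping factor; the group names and values below are hypothetical:

```python
# Hypothetical grouped series, e.g., one per brand or product category
series_by_group = {
    "brand_a": [12.0, 14.5, 13.2, 15.1],
    "brand_b": [30.2, 28.9, 31.4, 29.7],
}

# Parameter M as a plain 2-D structure (rows = groups, columns = time)
M = [series_by_group[g] for g in sorted(series_by_group)]

# Column-wise mean across groups yields one aggregated series, which
# could then be fed to the stationarity procedure like any 1-D input
aggregate = [sum(col) / len(col) for col in zip(*M)]
print(aggregate)
```

A numpy array or DataFrame would serve equally well; the point is only that the rows of M carry the grouping dimension.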

The algorithmic process presented here can also be used in the context of text analytics (Poulos, 2017), where the possible relationship between the syntactic properties of a text sample and the stationary variation of the time series that produces the text can be asserted. This can inform additional dimensions, such as the case of recommendations based on semi-structured data such as online reviews (Korfiatis & Poulos, 2013).

Therefore, applying Equations 1–13 to the data type (M) yields the new modified time series A and B (see Equations 12 and 13), which, in turn, leads to the technique for calculating the periodicity of various types of high-frequency data such as those described above (see Equations 14–18).
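To make the self-correlation step concrete, the sketch below compares a segment with its mirror using a textbook dynamic time warping distance. This is a generic O(n·m) implementation for illustration only, not the exact algorithm of Poulos (2016):

```python
def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],        # step in a
                                 cost[i][j - 1],        # step in b
                                 cost[i - 1][j - 1])    # step in both
    return cost[n][m]

# A time-symmetric segment aligns with its reverse at zero cost, which
# is the reversibility feature the stationarity test relies on
segment = [1, 2, 3, 2, 1]
print(dtw_distance(segment, segment[::-1]))  # 0.0
```

An asymmetric segment such as [1, 2, 3, 4] warps onto its reverse at a strictly positive cost, and it is this cost profile across segments that carries the periodicity information.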

5.2 Implications for Practice

Trust in evidence-based methods for evaluating marketing interventions, such as A/B testing,

is gaining momentum for both managers and marketers. However, the pitfalls associated with

misuse of this decision-making instrument are not well understood by managers and analytics


experts, since the prevalence of software tools provides an out-of-the-box solution, which may not be optimal (Dmitriev, Frasca, Gupta, Kohavi, & Vaz, 2016). Anecdotal examples of negative results induced by Type I and II errors are known in the industry, so careful consideration of the time-dependent properties of marketing metrics (e.g., stationarity) by decision-makers is important. Making a sound choice between alternative interventions guided by customer-driven interactions is an important example of analytical maturity (Davenport & Harris, 2007) and is independent of organizational size. Several examples of

A/B testing scenarios consider interventions on web spaces owned by small- and medium-sized

companies. As such, being able to reliably ascertain the impact of these interventions on

conditioned time series can also give a competitive advantage in capturing consumer attention.

However, as Kohavi et al. (2012) state, experimentation is not a panacea for everyone, and its

assumptions should be well understood when interpreting results of high economic

significance. In this study, an experimental method is proposed to guard against the aforementioned unsafe decision assumptions of the A/B method. To achieve this objective, the data were applied to the algorithm in Equations 4–18 using matched data pairs, as described in Section 4.2, which yields a statistical significance test. In the analysis of the results, a 25-hour sampling period was derived for the data set. Furthermore, this study places a large emphasis on

examining the nature of the time series and the stage from which the data are retrieved. Practical

considerations such as the assumptions that accompany the time series data retrieved at the

initial stage, or the impacts of any demand peaks (e.g., due to marketing campaigns running in

parallel) can be validated through the procedure outlined here.

As discussed in the previous section, data sets from Google Trends and bounce rate

could yield this degree of periodicity. This consideration is based on the assumption that the

nature of the data is depicted in the local peaks of the transformed time series that come from

the stationarity process.


6. Conclusions, Limitations and Future Research

In this paper, the potential to extract periodical stationarity exhibited in a conditioned time

series of bounce rates was investigated and evaluated using a benchmark dataset. Controlling

for stationarity is a significant problem in analytics and forecasting, in which a time series is

analysed for the levels of differences. Using the appropriate transformations with a new

algorithm for calculating the stationary distance, our approach can be useful in the evaluation

of marketing interventions, such as those in A/B testing scenarios. This distance is based on a

novel stationary ergodic process, which rests on the consideration that a stationary series

presents reversible symmetric features and is calculated using the dynamic time warping

(DTW) algorithm in a self-correlation procedure. The results of the benchmark test performed

in the experimental part of this paper demonstrate the clear and consistent periodicity recovered by the discussed method, utilizing the measures of differences in the positions of local maxima points during the segmentation of the conditioned series.

While our approach was operationalized for a conditioned time series, our method does

not take into account causal influences from other time-dependent processes that may affect

the behaviour of the evaluated metric (in our case, bounce rates) or psychological factors related to shopping cart abandonment (Huang et al., 2018). Such a case could arise when transitions

from stages are considered (e.g., bounces after the second click). In this aspect, our analysis is

therefore agnostic to important user characteristics, such as repeated visits and view-through

conversions, which require a higher-order data structure than that considered in this study.

In addition, our analysis places a large emphasis on the issue of finding the necessary

sample size that significantly satisfies the observed periodicity of a time series of bounce rates

in an e-commerce scenario. Future work on other types of conditioned time series represented

in web analytics, such as page views, pages/visits, percentages of new visits, and average times


on sites, is also important, as are demand patterns in supply chains (Zissis et al., 2015). This work will involve studying more sophisticated time series processing and template-matching techniques, as well as understanding the distributional characteristics of these metrics.

Data accessibility

No data are provided together with the manuscript.

Notes

* For this study, the retailer has requested to remain anonymous.


References

Brodersen, K. H., Gallusser, F., Koehler, J., Remy, N., & Scott, S. L. (2015). Inferring causal

impact using Bayesian structural time-series models. The Annals of Applied Statistics, 9,

247–274. doi:10.1214/14-aoas788

Clifton, B. (2012). Advanced web metrics with Google analytics. Indianapolis, Indiana: John

Wiley & Sons.

Cortez, P., Rio, M., Rocha, M., & Sousa, P. (2012). Multi-scale internet traffic forecasting using

neural networks and time series methods. Expert Systems, 29, 143–155.

doi:10.1111/j.1468-0394.2010.00568.x

Danaher, P. J., & Rust, R. T. (1996). Determining the optimal return on investment for an

advertising campaign. European Journal of Operational Research, 95, 511–521.

doi:10.1016/0377-2217(95)00319-3

Davenport, T. H., & Harris, J. G. (2007). Competing on analytics: The new science of winning.

Boston, MA: Harvard Business Press.

Dmitriev, P., Frasca, B., Gupta, S., Kohavi, R., & Vaz, G. (2016). Pitfalls of long-term online

controlled experiments. In 2016 IEEE international conference on big data (big data)

(pp. 1367–1376). Washington, DC, USA: IEEE.

Downing, D. J., Fedorov, V. V., Lawkins, W. F., Morris, M. D., & Ostrouchov, G. (2000). Large

data series: Modeling the usual to identify the unusual. Computational Statistics & Data

Analysis, 32, 245–258. doi:10.1016/s0167-9473(99)00079-1

eCommerce Europe. (2016). E-commerce benchmark and retail report. Retrieved from

https://www.ecommerce-europe.eu/app/uploads/2016/06/Ecommerce-Benchmark-Retail-

Report-2016.pdf


Edvardsson, B., Kristensson, P., Magnusson, P., & Sundström, E. (2012). Customer integration

within service development—A review of methods and an analysis of insitu and exsitu

contributions. Technovation, 32, 419–429. doi:10.1016/j.technovation.2011.04.006

Hamilton, J. D. (1994). Time series analysis. Princeton, NJ: Princeton University Press.

Hoban, P. R., & Bucklin, R. E. (2015). Effects of internet display advertising in the purchase

funnel: Model-based insights from a randomized field experiment. Journal of Marketing

Research, 52, 375–393. doi:10.1509/jmr.13.0277

Huang, G. H., Korfiatis, N., & Chang, C. T. (2018). Mobile shopping cart abandonment: The

roles of conflicts, ambivalence, and hesitation. Journal of Business Research, 85, 165–

174.

Jeziorski, P., & Moorthy, S. (2017). Advertiser prominence effects in search advertising.

Management Science, 64, 1365–1383. doi:10.1287/mnsc.2016.2677

Kohavi, R., Deng, A., Frasca, B., Longbotham, R., Walker, T., & Xu, Y. (2012). Trustworthy

online controlled experiments: Five puzzling outcomes explained. In Proceedings of the

18th ACM SIGKDD international conference on knowledge discovery and data mining

(pp. 786–794), Beijing, China: ACM.

Kohavi, R., Longbotham, R., Sommerfield, D., & Henne, R. M. (2009). Controlled experiments

on the web: Survey and practical guide. Data Mining and Knowledge Discovery, 18(1),

140–181. doi:10.1007/s10618-008-0114-1

Korfiatis, N., & Poulos, M. (2013). Using online consumer reviews as a source for demographic recommendations: A case study using online travel reviews. Expert Systems with Applications, 40, 5507–5515.


Lindgaard, G., Fernandes, G., Dudek, C., & Brown, J. (2006). Attention web designers: You

have 50 milliseconds to make a good first impression! Behaviour & Information

Technology, 25(2), 115–126. doi:10.1080/01449290500330448

Marr, B. (2010). The intelligent company: Five steps to success with evidence-based

management. New York, NY: John Wiley & Sons.

Mortenson, M. J., Doherty, N. F., & Robinson, S. (2015). Operational research from Taylorism

to Terabytes: A research agenda for the analytics age. European Journal of Operational

Research, 241, 583–595. doi:10.1016/j.ejor.2014.08.029

Murthy, P., & Mantrala, M. K. (2005). Allocating a promotion budget between advertising and

sales contest prizes: An integrated marketing communications perspective. Marketing

Letters, 16(1), 19–35. doi:10.1007/s11002-005-1138-6

Papavlasopoulos, S. (2019). Scientometrics analysis in Google trends. Journal of Scientometric

Research, 8(1), 27–37. doi:10.5530/jscires.8.1.5

Pfeffer, J., & Sutton, R. I. (2006). Evidence-based management. Harvard Business Review,

84(1), 62.

Plaza, B. (2011). Google analytics for measuring website performance. Tourism Management,

32, 477–481. doi:10.1016/j.tourman.2010.03.015

Poulos, M. (2016). Determining the stationarity distance via a reversible stochastic process.

PLoS One, 11(10), e0164110. doi:10.1371/journal.pone.0164110

Poulos, M. (2017). Definition text's syntactic feature using stationarity control. In 2017 8th

International conference on information, intelligence, systems & applications (IISA) (pp.

1–5), Larnaca, Cyprus: IEEE.


Poulos, M., Papavlasopoulos, S., Kostagiolas, P., & Kapidakis, S. (2017). Prediction of the

popularity from Google trends using stationary control: The case of STM publishers. In

2017 Fourth international conference on mathematics and computers in sciences and in

industry (MCSI) (pp. 159–163), Corfu, Greece: IEEE.

Salvador, S., & Chan, P. (2007). Toward accurate dynamic time warping in linear time and

space. Intelligent Data Analysis, 11(5), 561–580. doi:10.3233/ida-2007-11508

Sauter, V. L. (2014). Decision support systems for business intelligence. Hoboken, NJ.: John

Wiley & Sons.

Sculley, D., Malkin, R. G., Basu, S., & Bayardo, R. J. (2009). Predicting bounce rates in

sponsored search advertisements. In Proceedings of the 15th ACM SIGKDD international

conference on knowledge discovery and data mining (pp. 1325–1334), Paris, France:

ACM.

Sculley, D., Otey, M. E., Pohl, M., Spitznagel, B., Hainsworth, J., & Zhou, Y. (2011). Detecting

adversarial advertisements in the wild. In Proceedings of the 17th ACM SIGKDD

international conference on knowledge discovery and data mining (pp. 274–282), San

Diego, California, USA: ACM.

Sharifdoost, M., Mahmoodi, S., & Pasha, E. (2009). A statistical test for time reversibility of stationary finite state Markov chains. Applied Mathematical Sciences, 52, 2563–2574.

Varian, H. R. (2014). Big data: New tricks for econometrics. Journal of Economic Perspectives,

28(2), 3–28. doi:10.1257/jep.28.2.3

Varian, H. R. (2016). Causal inference in economics and marketing. Proceedings of the National Academy of Sciences USA, 113, 7310–7315. doi:10.1073/pnas.1510479113


Vaughan, L., & Yang, R. (2013). Web traffic and organization performance measures:

Relationships and data sources examined. Journal of Informetrics, 7, 699–711.

doi:10.1016/j.joi.2013.04.005

Wells, J. D., Valacich, J. S., & Hess, T. J. (2011). What signal are you sending? How website

quality influences perceptions of product quality and purchase intentions. MIS Quarterly,

35, 373–396. doi:10.2307/23044048

Yang, S., & Ghose, A. (2010). Analyzing the relationship between organic and sponsored search

advertising: Positive, negative, or zero interdependence? Marketing Science, 29, 602–

623. doi:10.1287/mksc.1090.0552

Zissis, D., Ioannou, G., & Burnetas, A. (2015). Supply chain coordination under discrete information asymmetries and quantity discounts. Omega, 53, 21–29.


Tables

Table 1. Results of the experimental procedure. l corresponds to the sampling length, N(Xi) and N(Yi) are the numbers of peak values of X and Y, respectively, and μ and e correspond to the mean and standard error, respectively.

| l   | Range of x     | Range of y     | N(Xi) | μ(X)  | N(Yi) | μ(Y)  | e    |
|-----|----------------|----------------|-------|-------|-------|-------|------|
| 60  | [11813, 11873] | [5391, 5451]   | 2     | 29    | 2     | 28    | 0.5  |
| 65  | [16154, 16219] | [586, 651]     | 3     | 26.5  | 3     | 27    | 0.25 |
| 70  | [7459, 7529]   | [6487, 6557]   | 3     | 27    | 3     | 25.5  | 0.75 |
| 75  | [13014, 13089] | [13519, 13594] | 3     | 25    | 3     | 25.5  | 0.25 |
| 80  | [12060, 12140] | [12830, 12910] | 3     | 25.5  | 3     | 26.5  | 0.5  |
| 85  | [4693, 4778]   | [11555, 11640] | 3     | 26    | 3     | 25.5  | 0.25 |
| 90  | [11137, 11227] | [2765, 2855]   | 4     | 25    | 4     | 25.33 | 0.17 |
| 95  | [2023, 2118]   | [8473, 8568]   | 4     | 25.33 | 4     | 25.67 | 0.17 |
| 100 | [16316, 16416] | [5787, 5887]   | 4     | 26.33 | 4     | 25.33 | 0.5  |
| 105 | [9950, 10055]  | [3805, 3910]   | 4     | 25.33 | 4     | 25    | 0.17 |
| 110 | [12772, 12882] | [4337, 4447]   | 4     | 25.67 | 4     | 25.67 | 0    |
| 115 | [8602, 8717]   | [11885, 12000] | 5     | 25    | 5     | 25.75 | 0.38 |
| 120 | [15146, 15266] | [16308, 16428] | 5     | 25.25 | 5     | 25.25 | 0    |
| 125 | [14293, 14418] | [4323, 4448]   | 5     | 25    | 5     | 25    | 0    |
| 130 | [13843, 13973] | [4140, 4270]   | 5     | 24.75 | 5     | 25    | 0.13 |
| 135 | [15798, 15933] | [5950, 6085]   | 5     | 25.5  | 5     | 25.25 | 0.13 |
| 140 | [3343, 3483]   | [4269, 4409]   | 6     | 25.2  | 5     | 24.8  | 0.2  |
| 145 | [10473, 10618] | [8046, 8191]   | 6     | 25    | 6     | 25    | 0    |
| 150 | [5979, 6129]   | [14125, 14275] | 6     | 24.8  | 6     | 24.6  | 0.1  |
| 155 | [9950, 10105]  | [9346, 9501]   | 6     | 24.6  | 6     | 24.8  | 0.1  |
| 160 | [15593, 15753] | [4860, 5020]   | 7     | 24.67 | 7     | 24.83 | 0.08 |
| 165 | [13554, 13719] | [13492, 13657] | 7     | 24.67 | 7     | 24.67 | 0    |
| 170 | [6810, 6980]   | [10165, 10335] | 7     | 24.83 | 7     | 24.67 | 0.08 |
| 175 | [1358, 1533]   | [966, 1141]    | 7     | 24.5  | 7     | 24.67 | 0.08 |
| 180 | [17868, 18048] | [1011, 1191]   | 7     | 24.67 | 7     | 24.67 | 0    |
| 185 | [16719, 16904] | [2326, 2511]   | 8     | 24.57 | 8     | 24.71 | 0.07 |
| 190 | [17000, 17190] | [15200, 15390] | 8     | 24.57 | 8     | 24.57 | 0    |
| 195 | [214, 409]     | [6035, 6230]   | 8     | 24.86 | 8     | 24.57 | 0.14 |
| 200 | [2904, 3104]   | [14218, 14418] | 8     | 24.57 | 8     | 24.57 | 0    |


Index of Figures

Figure 1. Flow of data processing.

Figure 2. Hourly bounces for our dataset. The horizontal axis represents the index ×10².

Figure 3. Graphical depiction of matrix M (blue line) with its mirror N (red line).

Figure 4. Identification of the start (a) and end (b) of decomposition for the experimental dataset. Arrows indicate peak points for the original (x) and the reverse (y) time series.

Figure 1

Figure 2

Figure 3

Figure 4

Appendix: Graphical Transformations for the experimental section

The segmentation of the resulting time series and the identification of the local maxima presented in Table 1 is performed with a step size of s = 5. For each step, the resulting length (l) transforms the data series and its reverse, as depicted in the subsequent panels.
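The sweep over segment lengths shown in the panels can be reproduced with a loop of the following shape; this is a sketch (the per-segment peak analysis itself is omitted, and the series and start index are stand-ins):

```python
# Segment lengths used in Table 1: l = 60, 65, ..., 200 with step s = 5
s = 5
lengths = list(range(60, 201, s))

def segment_pair(series, l, start):
    """One window of length l together with its reverse -- the pair of
    curves drawn in each panel (start index chosen illustratively)."""
    window = series[start:start + l]
    return window, window[::-1]

series = list(range(300))  # stand-in for the transformed bounce-rate series
window, mirror = segment_pair(series, lengths[0], start=10)
print(len(lengths), len(window), mirror[0])  # 29 60 69
```

Each of the 29 lengths yields one matched pair of panels, consistent with the grids that follow.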

Transformation sequence for steps 60 ≤ l ≤ 105

l = 60 l = 65 l = 70 l = 75 l = 80

l = 85 l = 90 l = 95 l = 100 l = 105

Transformation sequence for steps 110 ≤ l ≤ 155

l = 110 l = 115 l = 120 l = 125 l = 130


l = 135 l = 140 l = 145 l = 150 l = 155

Transformation sequence for steps 160 ≤ l ≤ 200

l = 160 l = 165 l = 170 l = 175 l = 180

l = 185 l = 190 l = 195 l = 200
