Advances in Data Analysis and Classification
https://doi.org/10.1007/s11634-020-00410-x
REGULAR ARTICLE
On the use of quantile regression to deal with
heterogeneity: the case of multi-block data
Cristina Davino1 · Rosaria Romano1 · Domenico Vistocco2
Received: 18 July 2019 / Revised: 2 July 2020 / Accepted: 8 July 2020
© The Author(s) 2020
Abstract
The aim of the paper is to propose a quantile regression based strategy to assess
heterogeneity in a multi-block type data structure. Specifically, the paper deals with
a particular data structure where several blocks of variables are observed on the same
units and a structure of relations is assumed between the different blocks. The idea
is that quantile regression complements the results of least squares regression
by evaluating the impact of the regressors on the entire distribution of the dependent
variable, and not exclusively on its expected value. Taking advantage of this,
the proposed approach analyses the relationship between a dependent variable block and
a set of regressor blocks while highlighting possible similarities among the statistical
units. An empirical analysis is provided in the consumer analysis framework with the
aim of clustering consumers according to the similarities in the dependence
structure between their overall liking and the liking for different drivers.
Keywords Quantile regression · Group dependence structure · Individual differences · Consumer analysis
Mathematics Subject Classification 62G08 · 62P20 · 91B42
Cristina Davino
cristina.davino@unina.it
Rosaria Romano
rosaroma@unina.it
Domenico Vistocco
domenico.vistocco@unina.it
1 Department of Economics and Statistics, University of Naples Federico II, Naples, Italy
2 Department of Political Science, University of Naples Federico II, Naples, Italy
1 Introduction
In recent years the growing availability of data of various types, often collected
from different sources, has required changes in the classical statistical methods. Ad hoc
techniques must be tailored for managing data arranged in different types of structures.
In complex domains it is very common that the observed variables are grouped
into homogeneous blocks measuring partial aspects of the phenomenon under investigation.
Such data is usually labelled as multi-block data and the statistical techniques
used for their analysis are called multi-block methods (Smilde et al. 2000). Important
examples of multi-block data come from several application fields (marketing, sociology,
econometrics, sensory analysis, spectroscopy, ecology, image analysis, etc.).
The basic requirement of multi-block methods is that all blocks have one dimension
(mode) in common. The way in which the blocks are connected to each other gives
rise to different data structures. Different sets of variables can be measured on the same
units. Examples of this structure are several: many sets of indicators to describe dif-
ferent aspects of a complex concept, quality of life, for instance (Davino and Romano
2014); different components (i.e., teaching, research, internationalisation) to define
a university ranking (Romano and Davino 2016); different dimensions that affect
the quality of teaching in high schools (motivation, emotions, strategies, teaching)
(Romano and Palumbo 2013). Another possible structure consists of a set of two-way
matrices of the same units and variables (three-way data). Three-way data is widely
used in sensory analysis when the scores of the judges are not averaged so that the three
sources of variation are products, attributes and judges (Romano et al. 2008, 2015;
Bro et al. 2008). This three-way data structure can also be related to a dependent data
set (Romano et al. 2011). Finally, another possible structure comes out when blocks
of different dimensions are connected through a common dimension. Examples in
consumer studies concern the link of product attributes, liking scores and consumers’
characteristics (Martens et al. 2005; Romano et al. 2014; Davino et al. 2015).
Different types of approaches can be used for the analysis of these multi-block
data structures, the choice being based on the type of relationship between the various
blocks (Höskuldsson 2008; Cariou et al. 2018). An exploratory multi-block approach
(Hanafi and Kiers 2006) has to be employed in case no specific causal relationship is
assumed among the different blocks. A supervised approach (Westerhuis et al. 1998)
is instead used when a structure of relationships is assumed between the blocks.
Here, the blocks of predictor variables are generally called input blocks, while the
block of dependent variables is named output block. If there is a chain of relationships
among the blocks, the data follows the typical structure of the structural equations
models (Jöreskog and Wold 1982; Bollen 2014).
The present paper deals with multi-block data where all blocks of variables are
observed on the same units and both input blocks and a single output block are avail-
able. Therefore the approach is supervised since the aim is to explore the dependence
structure between the input blocks and the output one. Consumer analysis is con-
sidered as a field of application, since data is generally collected into multi-block
structures (Næs et al. 2011). Here, the output block generally corresponds to the lik-
ing scores given by a sample of consumers on a predefined set of products. The input
blocks can concern both other sensory variables, defined drivers of liking (or specific
liking), and other additional variables on consumers (demographic, habits, attitudes).
The relationships between the input blocks and the output block can be investigated
through different strategies, each corresponding to a different way of arranging the
data into blocks (Næs et al. 2010). The simplest strategy is to transform each block
into a single vector by stacking the corresponding columns and estimating a multiple
linear regression model. This strategy does not take into account heterogeneity among
consumers, which is a further source of complexity of data coming from consumer
studies. To this end, some approaches propose to estimate separate regression models
for each consumer to be aggregated a posteriori, by a simple arithmetic average or by
clustering procedures, to highlight segments of homogeneous consumers with respect
to the liking model (Menichelli et al. 2013; Asioli et al. 2016).
This paper proposes a multi-step procedure to deal with heterogeneity in multi-block
data. It exploits Quantile Regression (QR) (Koenker and Bassett 1978; Davino
et al. 2013; Furno and Vistocco 2018) to evaluate the effect of the regressors on the
entire distribution of the dependent variable. The idea is to complement the classical
approach based on the least squares regression (LSR) to focus beyond the conditional
mean. In the case of consumer data this complementary information can be crucial
given the typical asymmetric distributions of liking scores. QR has already been used
in consumer studies: for estimating the conditional quantiles of liking when segments
of consumers obtained according to their acceptance pattern are related to additional
consumer characteristics (Davino et al. 2015); for assessing heterogeneity across product
similarities (Davino et al. 2018). The results of the proposed strategy can directly
inform marketing decisions. In particular,
the strategy identifies consumer groups that have
similar liking models, that is, for each group it is possible to identify the effect of
each driver of liking on the overall liking. In addition to the different liking models,
the detailed analysis of each group in terms of liking for the individual products also
provides useful information for product development.
This paper extends a QR multi-step procedure used to assess heterogeneity (Davino
and Vistocco 2018) to multi-block data. The main idea is to explore if and how the
effects of the drivers on the overall liking differ for groups of consumers at the lower
and higher levels of liking. The strategy consists of three main steps. The first step aims
to identify the best model for each consumer, based on the quantile that best represents
each consumer. In the second step, consumer segments are identified according to
similarities in the dependence structure, using cluster analysis. In the final step, a
different model is estimated for each group of consumers identified in the previous
step (a group can also consist of a single consumer). The proposed procedure is tested
on data from a consumer study. The basic aim of the proposal is to extract knowledge from
a complex data structure, where information gathered in several blocks is combined
and analysed in the different steps. The use of QR makes it possible to deal with heterogeneity,
a typical source of variation in many fields of application. Note that this is one of the
main strengths of QR as compared to the classical LSR. This strength is exploited
in the first step, where a different model is selected for each consumer according to his/her
position (quantile) in the distribution of the dependent variable. In addition, QR is used
in the final step of the procedure, where the representative quantiles of the
groups obtained from the cluster procedure are identified. The estimation of different QR models
at predefined quantiles using all the observations of the sample allows statistical
comparisons between the groups with respect both to the entire model and to individual
coefficients.
Fig. 1 Description of different multi-block data-structures
The paper is structured as follows. In Sect. 2 the main notation is introduced and the
quantile regression based strategy is described. Data used for testing the procedure are
described in Sect. 3, while the corresponding results are included in Sect. 4. Finally,
some concluding remarks and directions for future research are described
in Sect. 5.
2 Methodology
2.1 Main notation
Let us consider a particular multi-block data set arranged as a three-way data table, where
an array Z is partitioned in G blocks: $Z = [Z_1, \ldots, Z_G]$. In this paper, each block $Z_g$
($g = 1, \ldots, G$) has dimensions $N \times (P+1)$ because it is a column-partitioned matrix
composed of the vector $y_g$ for the response variable (a single element of the output
block as defined in Sect. 1) and a data matrix $X_g$ for the regressors. The dimension G
can represent any possible stratification variable, even time. In the empirical analysis
provided in Sects. 3 and 4, the N rows represent a set of products, the P+1 variables are
the liking attributes together with the dependent variable (the overall liking), while G is the
number of consumers. Figure 1 shows three different points of view to represent and analyse
this kind of multi-block data. For the rest of the paper we will refer to the structure
represented in Fig. 1c, where the G blocks obtained from the stratification variable
are stacked. However, the proposed approach will not consider a single model on all
stacked blocks simultaneously. A multistep strategy is proposed in which the response
variable and corresponding predictors are related to each other for each individual
block.
The aim of the paper is to model and cluster the data simultaneously. That means
analysing the relationship among the dependent variable and the set of regressors but
highlighting possible similarities in the G levels of the stratification variable. It is a
matter of fact that if two units show a very similar dependence structure, they can be
considered belonging to the same group.
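As a minimal sketch of the stacked arrangement of Fig. 1c, the Python fragment below builds G blocks with N rows and P+1 columns and stacks them row-wise while keeping the block label g. The toy sizes, the random scores and the helper names `make_block` and `stack_blocks` are assumptions made for illustration only; they are not taken from the paper.

```python
import random

def make_block(n_rows, n_regressors, rng):
    """One block Z_g: each row is [y, x_1, ..., x_P] on a 1-9 hedonic scale."""
    return [[rng.randint(1, 9) for _ in range(n_regressors + 1)]
            for _ in range(n_rows)]

def stack_blocks(blocks):
    """Stack the G blocks row-wise, tagging each row with its block index g."""
    stacked = []
    for g, block in enumerate(blocks, start=1):
        for row in block:
            stacked.append((g, row[0], row[1:]))  # (block label, y, regressors)
    return stacked

rng = random.Random(0)
N, P, G = 4, 3, 5  # toy sizes; the tortilla data has N = 11, P = 3, G = 73
blocks = [make_block(N, P, rng) for _ in range(G)]
stacked = stack_blocks(blocks)
print(len(stacked), "rows,", 1 + len(stacked[0][2]), "variables")
```

Keeping the label g on every stacked row is what later allows a separate quantile to be attached to each block even though all rows sit in one table.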
2.2 Quantile regression based strategy
The strategy proposed in this paper consists of three main steps. The first step consists
of identifying the best model for each level g of the stratification variable, based on
the quantile of the response variable that best represents each level. Subsequently,
the G levels of the stratification variable are grouped according to similarities in the
dependence structures. Finally, a different model is estimated for each group identified
in the previous step.
The procedure has been introduced by Davino and Vistocco (2018) and applied in
consumer science to handle products effects by Davino et al. (2018). In the present
paper, the approach is adapted to the case of multi-block data with a large number of
blocks. As discussed in Sect. 1, the aim of this paper is to consider an additional source
of complexity in the data given by heterogeneity. Since this involves a great number
of blocks, the strategy proposed in Davino and Vistocco (2018) is here combined with
clustering techniques.
The main strength of the procedure is the exploitation of QR
in the whole process of analysis. QR makes it possible to estimate the whole conditional
distribution of the response variable through its conditional quantiles, thus replacing the classical estimate
of a single value (conditional mean) with estimates of several values (conditional
quantiles). A typical QR model is formulated as:

$$Q_\theta(\hat{y} \mid X) = X \hat{\beta}(\theta), \qquad (1)$$

where $Q_\theta(\cdot \mid \cdot)$ is the conditional quantile function for the $\theta$-th conditional quantile
with $0 < \theta < 1$. Each $\hat{\beta}_p(\theta)$ coefficient represents the rate of change in the $\theta$-th
conditional quantile of the dependent variable per unit change in the value of the $p$-th
regressor ($p = 1, \ldots, P$), holding the others constant. Although it is theoretically
possible to estimate an infinite number of quantiles, only a finite number of them are numerically
distinct, the so-called quantile process. As for LSR, several functional forms
can be considered for QR. The paper will refer to linear regression models. The interested
reader may refer to the reference literature for methodological details (Koenker and
Bassett 1978; Davino et al. 2013; Furno and Vistocco 2018).
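To make the model in Eq. (1) concrete, the sketch below fits a linear QR with an intercept and a single regressor by minimising the check (pinball) loss, which is the standard QR objective in the reference literature but is not spelled out in this paper. The brute-force search over pairs of observations is an illustrative shortcut that only suits tiny data sets; it relies on the known property that at least one minimiser interpolates as many observations as there are parameters. The function names and the toy data are assumptions.

```python
from itertools import combinations

def check_loss(u, theta):
    """Check (pinball) loss: rho_theta(u) = u * (theta - 1[u < 0])."""
    return u * (theta - (1.0 if u < 0 else 0.0))

def fit_simple_qr(x, y, theta):
    """Brute-force linear QR with an intercept and one regressor.

    Tries every line through two observations with distinct x values and
    keeps the one with the smallest total check loss.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yi - (intercept + slope * xi), theta)
                   for xi, yi in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Toy heteroscedastic data: the spread of y grows with x
x = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
y = [1.0, 2.0, 1.0, 4.0, 1.0, 6.0]
for theta in (0.1, 0.9):
    print(theta, fit_simple_qr(x, y, theta))
```

On these toy data the slope differs between the low and the high quantile, which is exactly the kind of effect a single conditional-mean model would average out.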
The approach to model and cluster the G levels is structured in the three steps detailed
below.
(1) Identification of the best model for each level
In the first step, a representative quantile $\theta_g^{best}$ is identified for each level $g$. It will
be named from now on the best quantile. In particular, computing the empirical
cumulative distribution function $F(\cdot)$ on the overall $y$ variable, the best quantile
representative of each block $g$ is obtained as:

$$\theta_g^{best} = \frac{\sum_{y \in y_g} F(y)}{N}, \qquad (2)$$

where $N$ denotes the number of units of the generic block $g$. By computing the empirical
cumulative distribution function on $y$, we refer to the percentile rank of each
observation, i.e. the location of each $y_i$ ($i = 1, \ldots, N \times G$). For a discussion on the
use of the percentile ranks and the choice of the proper location index to summarise
them, see Davino and Vistocco (2018).
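The best quantile of Eq. (2) can be sketched in a few lines of Python: pool all responses, evaluate the empirical cumulative distribution function at each observation, and average the resulting percentile ranks within each block. The "less than or equal" convention used for the ECDF, the function name `best_quantiles` and the toy scores are assumptions; the paper does not state how ties are handled.

```python
from bisect import bisect_right

def best_quantiles(y_by_block):
    """Average percentile rank of each block's responses in the pooled sample."""
    pooled = sorted(y for block in y_by_block for y in block)
    total = len(pooled)

    def ecdf(value):
        # F(value) = share of pooled observations <= value
        return bisect_right(pooled, value) / total

    return [sum(ecdf(y) for y in block) / len(block) for block in y_by_block]

# Three hypothetical consumers (blocks), four products each
y_by_block = [[2, 3, 3, 4], [5, 6, 6, 7], [8, 8, 9, 9]]
theta_best = best_quantiles(y_by_block)
print([round(t, 3) for t in theta_best])
```

A consumer who systematically gives low scores ends up with a small best quantile, one who gives high scores with a large one.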
(2) Identification of the group dependence structure
QR is then carried out on data arranged as in Fig. 1c (for each single block), using the
representative quantiles, that is, the $G$ quantiles $\theta_g^{best}$ previously identified. Each model
provides a set of coefficients, one for each level $g$ and for each regressor: $\hat{\beta}_p(\theta_g^{best})$
($p = 1, \ldots, P$). Such coefficients can be arranged into a matrix $\hat{B}_{\theta^{best}}$ of dimensions $G \times (P+1)$,
where the additional column refers to the intercept.
The aim of the second step is to identify whether there are similar dependence structures
among the $G$ levels. For this purpose, a hierarchical cluster analysis (CA) is performed
on the $\hat{B}_{\theta^{best}}$ matrix and a partition of the $G$ levels into $K$ groups is identified ($k = 1, \ldots, K$).
Each group is then characterised by a different quantile, obtained as the average
of the $\theta^{best}$ associated with the units assigned to the group ($\bar{\theta}^{best}$).
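The second step can be illustrated with a small agglomerative clustering of the rows of the coefficient matrix, followed by the average best quantile of each group. This is only a sketch: the paper does not specify the distance or the linkage criterion, so Euclidean distance and complete linkage are assumptions, as are the function names and the numbers in the toy matrix.

```python
def euclidean(a, b):
    """Euclidean distance between two coefficient vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def complete_linkage(rows, n_clusters):
    """Agglomerative clustering; returns clusters as lists of row indices."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # complete linkage: distance between the two farthest members
                d = max(euclidean(rows[i], rows[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical coefficient matrix: one row per level g, columns = (intercept, slopes)
B = [[1.2, 0.2, 0.7], [1.3, 0.2, 0.7], [0.1, 0.2, 0.8], [0.0, 0.1, 0.8]]
theta_best = [0.20, 0.24, 0.48, 0.55]  # hypothetical best quantiles

groups = complete_linkage(B, n_clusters=2)
theta_bar = [sum(theta_best[i] for i in grp) / len(grp) for grp in groups]
print(groups, [round(t, 3) for t in theta_bar])
```

The clustering acts on the estimated coefficients, not on the raw scores, so two levels fall in the same group when their dependence structures are similar.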
(3) Estimation of the group dependence structure
In the final step, QR is carried out again on data arranged as in Fig. 1c using the
representative quantiles, that is, the $K$ quantiles assigned to the $K$ groups in the previous
step. Each of the $K$ estimated models provides a set of coefficients, one for each
regressor; differences among the coefficients highlight differences in the group dependence
structure. It is worth noting that the coefficients can be compared because all of
them are estimated on the whole sample ($N \times G$). A testing procedure is implemented
to evaluate the significance of the differences among the coefficients related to each
cluster, exploiting the classical inferential tools available in the QR framework. Two
models estimated at two different quantiles can be compared using a joint test on all
slope parameters or separate tests on each slope parameter. The hypothesis of
interest is that the slope coefficients of the two models are identical, and the test statistic
is a variant of the Wald test described in Koenker and Bassett (1982).
Let us consider the comparison between the coefficients related to the $p$-th
regressor and estimated at two different quantiles, $\theta_k^{best}$ and $\theta_{k'}^{best}$. The null hypothesis is
$H_0: \beta_p(\theta_k^{best}) = \beta_p(\theta_{k'}^{best})$, and the test statistic is:

$$T = \frac{\left(\hat{\beta}_p(\theta_k^{best}) - \hat{\beta}_p(\theta_{k'}^{best})\right)^2}{\widehat{\mathrm{var}}\left(\hat{\beta}_p(\theta_k^{best}) - \hat{\beta}_p(\theta_{k'}^{best})\right)}, \qquad (3)$$

where $p = 1, \ldots, P$ and $k, k' \in [1, K]$. Under the null hypothesis, the test statistic
has an approximate $\chi^2$ distribution with one degree of freedom.
Fig. 2 Distributions for the overall liking: marginal (top) and product specific (from bottom to second to
last)
Such a test statistic can be used for pairwise comparisons of coefficients. An
extension of it is used as a global test on all the slopes. The standard errors, used to evaluate
the statistical significance of the coefficients, can be estimated using resampling
methods (Parzen et al. 1994).
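The pairwise test of Eq. (3) reduces to simple arithmetic once the two estimates and the variance of their difference are available. The sketch below takes that variance as given (in the paper it would come from the resampling step) and uses the closed form of the chi-square tail probability for one degree of freedom. The function name and all numbers are hypothetical.

```python
import math

def wald_equal_slopes(beta_k, beta_k2, var_diff):
    """Statistic of Eq. (3) and its chi-square(1) p-value.

    var_diff is the estimated variance of the difference between the two
    coefficient estimates; here it is simply taken as given.
    """
    t_stat = (beta_k - beta_k2) ** 2 / var_diff
    # For one degree of freedom, P(chi2 > t) = erfc(sqrt(t / 2))
    p_value = math.erfc(math.sqrt(t_stat / 2.0))
    return t_stat, p_value

# Hypothetical estimates of the same slope at two quantiles
t_stat, p_value = wald_equal_slopes(0.80, 0.60, var_diff=0.01)
print(round(t_stat, 2), round(p_value, 4))
```

Because both models are estimated on the whole sample, the two estimates are generally correlated, which is why the denominator is the variance of the difference and not the sum of two separate variances.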
3 Data description
The empirical analysis is based on data from a consumer test on 11 tortilla chips,
in which 73 consumers expressed their overall liking for each product on a 9-point
hedonic scale (from 1 = dislike extremely to 9 = like extremely) (Meullenet et al.
2008). Furthermore, the same consumers rated some drivers
(appearance, flavor, texture) using the same 9-point hedonic scale. Considering the
notation introduced in Sect. 2, the tortilla dataset consists of $N = 11$
products, $P + 1 = 4$ liking variables and $G = 73$ consumers (levels).
Figure 2 shows the distribution of the overall liking scores on the whole sample of
consumers (marginal) and for each single product. All products show a left-skewed
distribution, even if some of them present a higher variability and a less pronounced
skewness (MIS, OAK, MED, GUY, GMG).
A multivariate analysis of consumers' liking is carried out using a principal component
analysis (PCA) on the output block (products × consumers) (data arranged
as in Fig. 1b). Results demonstrate that consumers show preferences for different
products. In Fig. 3 consumer vectors are mainly concentrated in the positive direction
of the first dimension, even if there are a few consumers also lying along the second
dimension. Individual differences among consumers in the liking of specific products
can be further understood from Fig. 4, which represents the products arranged
on all four quadrants. Specifically, the most liked products are SAN, TOB and TOR,
along the main PCA dimension, followed by MIS, MIT and TOM on the second
dimension.
Fig. 3 Loading plot by principal component analysis on the overall liking
Fig. 4 Score plot by principal component analysis on the overall liking
Fig. 5 Histogram of the rank percentiles
4 Results
Heterogeneity among consumers described through the PCA in Sect. 3 can be further
investigated by linking the overall liking of the consumers to their specific likings
(appearance, flavor, texture), following the QR based strategy described in Sect. 2.2.
In the first step, a representative quantile is identified for each consumer through the
average rank percentile of the overall liking she/he has expressed on the considered
set of products. The histogram in Fig. 5 shows the distribution of the rank percentiles
on the sample of consumers. It is worth noting that the $\theta^{best}$ values vary across consumers,
which indicates heterogeneity in the liking.
QR is then carried out on data arranged as in Fig. 1c using the representative
quantiles, that is, the $G$ quantiles $\theta_g^{best}$ assigned to each consumer. Each model
provides a set of coefficients, one for each driver, that can be arranged in a
consumers × coefficients matrix. The information gathered into such a matrix is crucial for
highlighting the individual differences/similarities among consumers in the way they
weight the drivers linked to the overall liking. To this end, the second step consists
of identifying consumers' segments by a CA on the different dependence structures
estimated by the QR at the $G$ quantiles $\theta_g^{best}$.
On the tortilla dataset, the elbow rule identifies as best partition the one with $K = 3$
groups of consumers, homogeneous with respect to the liking models. Table 1 describes
each cluster through the following information: size, minimum, maximum and average
of the $\theta_g^{best}$ of the consumers assigned to the cluster. The last four columns of the table
report the average values of the original variables. Note that for the rest of the paper
$\bar{\theta}^{best}$ will be considered as the quantile representative of each group. Results show
that each cluster is characterised by a different position in the ranking of the overall
liking. Specifically, the first cluster corresponds to the consumers' segment with lower
$\theta^{best}$, i.e. to consumers scoring products lower than the others. The last two clusters,
which are the most interesting because of their size, behave in the opposite way. Note
that the third cluster shows a wider range of $\theta^{best}$, as highlighted by the minimum
and maximum values in Table 1. This emphasises a higher degree of heterogeneity
in this cluster as compared to the others. Further relevant information from Table 1
comes from the analysis of the average values of both liking and drivers. On the one hand,
the first cluster has a very small size, with a low degree of liking that can hardly be
modified. On the other hand, clusters 2 and 3 show higher overall liking values that
can be further improved by acting on the specific likings, in particular on the flavor,
which shows the lowest averages inside each cluster.

Table 1 Description of clusters exploiting summaries of $\theta^{best}$ and original variables (average)
Size      $\theta_{min}$   $\theta_{max}$   $\bar{\theta}^{best}$   Overall   Appearance   Flavor   Texture
n1 = 6    0.19   0.24   0.22   4.89   5.38   4.85   5.44
n2 = 34   0.30   0.41   0.35   5.88   6.32   5.77   6.31
n3 = 33   0.42   0.76   0.51   6.94   7.13   6.71   7.02

Table 2 QR coefficients for the three representative quantiles of the three clusters
              $\bar{\theta}_1^{best} = 0.22$   $\bar{\theta}_2^{best} = 0.35$   $\bar{\theta}_3^{best} = 0.51$
(Intercept)   1.24   0.48   0.00
Appearance    0.23   0.16   0.17
Flavor        0.70   0.77   0.67
Texture       0.18   0.12   0.17
In the final step, QR is carried out again on data arranged as in Fig. 1c using the
representative quantiles assigned to each cluster ($\bar{\theta}^{best}$). Coefficients in Table 2 are
all significant at α = 0.05 except the intercept and the texture coefficient in group 2, which are
significant at α = 0.10. Moreover, the intercept in group 3 is not significant at all.
Combining the information in Tables 1 and 2, it is possible to argue that flavor is
the most interesting driver to improve the overall liking of the less satisfied consumers
(clusters 1 and 2). On the one hand, flavor has the highest QR coefficients; on the other
hand, consumers score this driver lower than the others, with averages close to the
threshold of sufficiency on a scale ranging from 1 to 9.
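A short worked example may help to read the coefficients of Table 2 as linear models for the conditional quantile of the overall liking. The sketch below evaluates the cluster 2 model at the cluster's average driver scores from Table 1 and then at the same profile with flavor raised by one point. Evaluating the model at the averages, the one-point scenario and the function name `fitted_quantile` are illustrative assumptions, not an analysis reported in the paper.

```python
# Coefficients of Table 2: (intercept, appearance, flavor, texture)
COEFFS = {
    1: (1.24, 0.23, 0.70, 0.18),  # cluster 1, quantile 0.22
    2: (0.48, 0.16, 0.77, 0.12),  # cluster 2, quantile 0.35
    3: (0.00, 0.17, 0.67, 0.17),  # cluster 3, quantile 0.51
}

def fitted_quantile(cluster, appearance, flavor, texture):
    """Fitted conditional quantile of the overall liking for one cluster model."""
    b0, b_app, b_fla, b_tex = COEFFS[cluster]
    return b0 + b_app * appearance + b_fla * flavor + b_tex * texture

# Cluster 2 evaluated at its average driver scores from Table 1
base = fitted_quantile(2, appearance=6.32, flavor=5.77, texture=6.31)
# Same profile with flavor raised by one point (a hypothetical scenario)
improved = fitted_quantile(2, appearance=6.32, flavor=6.77, texture=6.31)
print(round(base, 2), round(improved - base, 2))
```

The difference between the two fitted values equals the flavor coefficient, which is the "rate of change per unit change" interpretation given for the QR coefficients.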
Results in Table 3 complement the characterisation of the groups by testing their
differences in terms of both the whole model (joint test) and the specific coefficients.
The classical Bonferroni correction (Shaffer 1995) has been applied. The p-values
included in the table show that the three models for the three clusters are significantly
different. Focusing on the differences between clusters 2 and 3, the p-values emphasise that
even if the sizes of the flavor and texture coefficients in the two clusters are quite
similar, they can be considered different from an inferential perspective.
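For readers unfamiliar with the Bonferroni correction, the sketch below shows the usual adjustment: each raw p-value is multiplied by the number of comparisons and, by common convention, capped at 1. How many comparisons enter the correction in the paper, and whether the cap is applied, is not stated; three comparisons and the raw p-values below are assumptions. Entries above 1 in Table 3 would be consistent with uncapped products, but this is only a guess.

```python
def bonferroni(p_values, cap=True):
    """Multiply each raw p-value by the number of comparisons."""
    m = len(p_values)
    adjusted = [p * m for p in p_values]
    return [min(1.0, p) for p in adjusted] if cap else adjusted

raw = [0.006, 0.003, 0.765]  # hypothetical unadjusted p-values, three comparisons
print(bonferroni(raw))             # adjusted values capped at 1
print(bonferroni(raw, cap=False))  # uncapped products
```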
Starting from the results of the testing procedure, which show that flavor and texture are
the most discriminating predictors for clusters 2 and 3, a deeper analysis of the differences
among products in each cluster would be useful from a product development perspective.
To this end, Fig. 6 visualizes the averages of the variables (each single panel) in the
three clusters (different colors and symbols), partitioned by products (labels on the left
side). The most relevant information from the figure is that the differences between
clusters 2 and 3 are evident for the flavor and the texture, where it is possible to
highlight, among the most liked products, those presenting the widest range between
the averages in the two clusters. Specifically, the liking of TOR, TOM and BYW, which
present lower averages in cluster 2, will be more affected by an improvement in the
flavor. Such information in terms of the liking for products inside each cluster can be
used to suggest appropriate decisions both for marketing and product development
departments.

Table 3 p-values derived from pairwise comparisons on the whole model and on single coefficients
                             Joint test   Appearance   Flavor   Texture
Cluster 1 versus cluster 2   0.018        0.024        0.042    0.141
Cluster 1 versus cluster 3   0.009        0.111        0.540    2.295
Cluster 2 versus cluster 3   0.000        2.598        0.000    0.000

Fig. 6 Description of the clusters according to the original variables partitioned by products
5 Concluding remarks and further developments
The paper has shown how to treat an additional source of complexity given by hetero-
geneity in fields of applications where data follows multi-block structures. Focus has
been on one specific data structure, within the supervised framework, where blocks
of variables (a single output block and several input blocks) are observed on the same
statistical units. A QR based strategy has been proposed as a multi-step procedure
able to model and cluster units according to similar dependence structures. The proposal
originates from alternative approaches that combine the estimation of separate
dependence relations for each single unit with a clustering of the model results. This is the case,
for instance, in conjoint studies (Gustafsson et al. 2003), where a preference model is estimated for
each consumer and the results from the estimated models are then synthesised
by simple averages or clustering procedures. The proposed strategy can be traced back to
classic approaches only in some aspects. For example, the logic of obtaining
different models for individual consumers and then synthesizing them is shared.
The peculiarity of the proposal lies in the selection of the model, which is based
on the quantile that best represents each consumer. This aspect makes
the proposal complementary and not alternative to the classic approaches, which are
limited to the study of the effects on average. Any comparison with classical methods
would lead to the classical conclusions deriving from a comparison between QR and
LSR: the two models provide similar results when the homoscedasticity assumption
is satisfied. This means that the different blocks (consumers) present the same
dependence structure linking the overall to the specific liking.
The innovative contribution of the proposed approach consists of the following
aspects:
• The use of QR as an alternative to the classical LSR to model the whole distribution
of the dependent variable.
• The identification of different models for each unit, obtained by defining the quantile
that best represents the position of the unit in the distribution of the dependent
variable.
• The estimation of the model characterizing each cluster on all the units
and not only on the ones belonging to the cluster. This is a relevant aspect since the
estimation of the models using all units allows comparisons among the clusters
both for the whole models and for the specific coefficients.
• The description of each cluster according to the corresponding specific quantile,
which provides information on the location of the response conditional distribution
mainly affected by the units of the cluster.
Further developments are in the direction of the selection of the best clustering parti-
tion (Bruzzese and Vistocco 2015; Tibshirani and Walther 2001) and on the assessment
of results in terms of stability. The second aspect is even more relevant in application
fields where the number of statistical units (products) is low, like in consumer analysis.
Acknowledgements Open access funding provided by Università degli Studi di Napoli Federico II within
the CRUI-CARE Agreement.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included
in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If
material is not included in the article’s Creative Commons licence and your intended use is not permitted
by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the
copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Asioli D, Næs T, Øvrum A, Almli VL (2016) Comparison of rating-based and choice-based conjoint analysis
models. A case study based on preferences for iced coffee in Norway. Food Qual Prefer 48:174–184
Bollen KA (2014) Structural equations with latent variables, vol 210. Wiley, New York
Bro R, Qannari EM, Kiers HA, Næs T, Frøst MB (2008) Multi-way models for sensory profiling data. J
Chemom 22(1):36–45
Bruzzese D, Vistocco D (2015) DESPOTA: DEndrogram slicing through a permutation test approach. J
Classif 32(2):285–304
Cariou V, Qannari EM, Rutledge DN, Vigneau E (2018) ComDim: from multiblock data analysis to path
modeling. Food Qual Prefer 67:27–34
Davino C, Romano R (2014) Assessment of composite indicators using the ANOVA model combined with
multivariate methods. Soc Indic Res 119(2):627–646
Davino C, Vistocco D (2018) Handling heterogeneity among units in quantile regression. Investigating the
impact of students’ features on university outcome. Stat Interface 11(3):541–556
Davino C, Furno M, Vistocco D (2013) Quantile regression. Theory and applications. Wiley series in
probability and statistics. Wiley, UK
Davino C, Romano R, Næs T (2015) The use of quantile regression in consumer studies. Food Qual Prefer
40(A):230–239
Davino C, Romano R, Vistocco D (2018) Modelling drivers of consumer liking handling consumer and
product effects. Ital J Appl Stat 30:359–372
Furno M, Vistocco D (2018) Quantile regression. Estimation and simulation, vol 216. Wiley, New York
Gustafsson A, Herrmann A, Huber F (2003) Conjoint measurement: methods and applications. Springer,
Berlin
Hanafi M, Kiers HA (2006) Analysis of K sets of data, with differential emphasis on agreement between
and within sets. Comput Stat Data Anal 51(3):1491–1508
Höskuldsson A (2008) Multiblock and path modelling procedures. J Chemom 22:571–579
Jöreskog KG, Wold HO (1982) Systems under indirect observation: causality, structure, prediction, vol 139.
North Holland, Amsterdam
Koenker R, Bassett GW (1978) Regression quantiles. Econometrica 46(1):33–50
Koenker R, Bassett G (1982) Tests of linear hypotheses and L1 estimation. Econometrica 50(6):1577–1584
Martens H, Anderssen E, Flatberg A, Gidskehaug LH, Høy M, Westad F, Martens M (2005) Regression of
a data matrix on descriptors of both its rows and of its columns via latent variables: L-PLSR. Comput
Stat Data Anal 48(1):103–123
Menichelli E, Kraggerud H, Olsen NV, Næs T (2013) Analysing relations between specific and total liking
scores. Food Qual Prefer 28(2):429–440
Meullenet JF, Xiong R, Findlay CJ (2008) Multivariate and probabilistic analyses of sensory science
problems, vol 25. Wiley, New York
Næs T, Lengard V, Johansen SB, Hersleth M (2010) Alternative methods for combining design variables
and consumer preference with information about attitudes and demographics in conjoint analysis.
Food Qual Prefer 21(4):368–378
Næs T, Brockhoff PB, Tomic O (2011) Statistics for sensory and consumer science. Wiley, New York
Parzen MI, Wei LJ, Ying Z (1994) A resampling method based on pivotal estimating functions. Biometrika
81(2):341–350
Romano R, Davino C (2016) Assessing scientific research activity evaluation models using multivariate
analysis. Stat Interface 9(3):303–313
Romano R, Palumbo F (2013) Partial possibilistic regression path modeling for subjective measurement. J
Methodol Appl Stat 15:177–190
Romano R, Brockhoff PB, Hersleth M, Tomic O, Næs T (2008) Correcting for different use of the scale
and the need for further analysis of individual differences in sensory analysis. Food Qual Prefer
19(2):197–209
Romano R, Vestergaard JS, Kompany-Zareh M, Bredie WL (2011) Monitoring panel performance within
and between sensory experiments by multi-way analysis. In: Classification and multivariate analysis
for complex data structures. Springer, Berlin, pp 335–342
Romano R, Davino C, Næs T (2014) Classification trees in consumer studies for combining both product
attributes and consumer preferences with additional consumer characteristics. Food Qual Prefer
33:27–36
Romano R, Næs T, Brockhoff PB (2015) Combining analysis of variance and three-way factor analysis
methods for studying additive and multiplicative effects in sensory panel data. J Chemom 29(1):29–37
Shaffer JP (1995) Multiple hypothesis testing. Ann Rev Psychol 46:561–584
Smilde AK, Westerhuis JA, Boque R (2000) Multiway multiblock component and covariates regression
models. J Chemom 14(3):301–331
Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic.
J R Stat Soc Ser B (Stat Methodol) 63(2):411–423
Westerhuis JA, Kourti T, MacGregor JF (1998) Analysis of multiblock and hierarchical PCA and PLS
models. J Chemom 12:301–321
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps
and institutional affiliations.