Available via license: CC BY 4.0


Advances in Data Analysis and Classiﬁcation

https://doi.org/10.1007/s11634-020-00410-x

REGULAR ARTICLE

On the use of quantile regression to deal with

heterogeneity: the case of multi-block data

Cristina Davino¹ · Rosaria Romano¹ · Domenico Vistocco²

Received: 18 July 2019 / Revised: 2 July 2020 / Accepted: 8 July 2020

© The Author(s) 2020

Abstract

The aim of the paper is to propose a quantile regression based strategy to assess heterogeneity in a multi-block type data structure. Specifically, the paper deals with a particular data structure where several blocks of variables are observed on the same units and a structure of relations is assumed between the different blocks. The idea is that quantile regression complements the results of the least squares regression by evaluating the impact of regressors on the entire distribution of the dependent variable, and not only on the expected value. By taking advantage of this, the proposed approach analyses the relationship between a dependent variable block and a set of regressor blocks while highlighting possible similarities among the statistical units. An empirical analysis is provided in the consumer analysis framework with the aim of clustering groups of consumers according to the similarities in the dependence structure between their overall liking and the liking for different drivers.

Keywords Quantile regression · Group dependence structure · Individual differences · Consumer analysis

Mathematics Subject Classification 62G08 · 62P20 · 91B42

1 Introduction

In recent years the growing availability of data of various types, often collected from different sources, has required changes in the classical statistical methods. Ad hoc techniques must be tailored for managing data arranged in different types of structures. In complex domains it is very common that the observed variables are grouped into homogeneous blocks measuring partial aspects of the phenomenon under investigation. Such data is usually labelled as multi-block data and the statistical techniques used for their analysis are called multi-block methods (Smilde et al. 2000). Important examples of multi-block data come from several application fields (marketing, sociology, econometrics, sensory analysis, spectroscopy, ecology, image analysis, etc.). The basic requirement of multi-block methods is that all blocks have one dimension (mode) in common. The way in which the blocks are connected to each other gives rise to different data structures. Different sets of variables can be measured on the same units. There are several examples of this structure: many sets of indicators to describe different aspects of a complex concept, quality of life, for instance (Davino and Romano 2014); different components (i.e., teaching, research, internationalisation) to define a university ranking (Romano and Davino 2016); different dimensions that affect the quality of teaching in high schools (motivation, emotions, strategies, teaching) (Romano and Palumbo 2013). Another possible structure consists of a set of two-way matrices of the same units and variables (three-way data). Three-way data is widely used in sensory analysis when the scores of the judges are not averaged so that the three sources of variation are products, attributes and judges (Romano et al. 2008, 2015; Bro et al. 2008). This three-way data structure can also be related to a dependent data set (Romano et al. 2011). Finally, another possible structure arises when blocks of different dimensions are connected through a common dimension. Examples in consumer studies concern the link of product attributes, liking scores and consumers’ characteristics (Martens et al. 2005; Romano et al. 2014; Davino et al. 2015).

Cristina Davino
cristina.davino@unina.it

Rosaria Romano
rosaroma@unina.it

Domenico Vistocco
domenico.vistocco@unina.it

¹ Department of Economics and Statistics, University of Naples Federico II, Naples, Italy
² Department of Political Science, University of Naples Federico II, Naples, Italy

Different types of approaches can be used for the analysis of these multi-block data structures, the choice being based on the type of relationship between the various blocks (Höskuldsson 2008; Cariou et al. 2018). An exploratory multi-block approach (Hanafi and Kiers 2006) has to be employed when no specific causal relationship is assumed among the different blocks. A supervised approach (Westerhuis et al. 1998) is instead used when a structure of relationships is assumed between the blocks. Here, the blocks of predictor variables are generally called input blocks, while the block of dependent variables is named output block. If there is a chain of relationships among the blocks, the data follows the typical structure of structural equation models (Jöreskog and Wold 1982; Bollen 2014).

The present paper deals with multi-block data where all blocks of variables are observed on the same units and both input blocks and a single output block are available. Therefore the approach is supervised since the aim is to explore the dependence structure between the input blocks and the output one. Consumer analysis is considered as a field of application, since data is generally collected into multi-block structures (Næs et al. 2011). Here, the output block generally corresponds to the liking scores given by a sample of consumers on a predefined set of products. The input blocks can concern both other sensory variables, termed drivers of liking (or specific liking), and other additional variables on consumers (demographic, habits, attitudes).

The relationships between the input blocks and the output block can be investigated through different strategies, each corresponding to a different way of arranging the data into blocks (Næs et al. 2010). The simplest strategy is to transform each block into a single vector by stacking the corresponding columns and estimating a multiple linear regression model. This strategy does not take into account heterogeneity among consumers, which is a further source of complexity of data coming from consumer studies. To address this, some approaches estimate a separate regression model for each consumer; the models are then aggregated a posteriori, by a simple arithmetic average or by clustering procedures, to highlight segments of homogeneous consumers with respect to the liking model (Menichelli et al. 2013; Asioli et al. 2016).

This paper proposes a multi-step procedure to deal with heterogeneity in multi-block data. It exploits Quantile Regression (QR) (Koenker and Bassett 1978; Davino et al. 2013; Furno and Vistocco 2018) to evaluate the effect of the regressors on the entire distribution of the dependent variable. The idea is to complement the classical approach based on least squares regression (LSR) by looking beyond the conditional mean. In the case of consumer data this complementary information can be crucial given the typically asymmetric distributions of liking scores. QR has already been used in consumer studies: for estimating the conditional quantiles of liking when segments of consumers obtained according to their acceptance pattern are related to additional consumer characteristics (Davino et al. 2015), and for assessing heterogeneity across product similarities (Davino et al. 2018). The proposed strategy has practical implications, since its results can be used to adopt appropriate marketing strategies. In particular, the information obtained concerns the identification of consumer groups that have similar liking models, that is, for each group it is possible to identify the effect of each driver of liking on the overall liking. In addition to the different liking models, the detailed analysis of each group in terms of liking for the individual products also provides useful information for product development.

This paper extends a QR multi-step procedure used to assess heterogeneity (Davino and Vistocco 2018) to multi-block data. The main idea is to explore if and how the effects of the drivers on the overall liking differ for groups of consumers at the lower and higher levels of liking. The strategy consists of three main steps. The first step aims to identify the best model for each consumer, based on the quantile that best represents each consumer. In the second step, consumer segments are identified according to similarities in the dependence structure, using cluster analysis. In the final step, a different model is estimated for each group of consumers identified in the previous step (a group can also consist of a single consumer). The proposed procedure is tested on data from a consumer study. The basic aim of the proposal is to extract knowledge from a complex data structure, where information gathered in several blocks is combined and analysed in the different steps. The use of QR makes it possible to deal with heterogeneity, a typical source of variation in many fields of application. Note that this is one of the main strengths of QR as compared to the classical LSR. This aspect is exploited when selecting a different model for each consumer according to his/her position (quantile) in the distribution of the dependent variable. In addition, QR is used in the final step of the procedure, at the representative quantiles of the groups obtained from the cluster procedure. The estimation of different QR models on predefined quantiles using all the observations of the sample allows statistical comparisons between the groups both with respect to the entire model and to individual coefficients.


Fig. 1 Description of different multi-block data-structures

The paper is structured as follows. In Sect. 2 the main notation is introduced and the quantile regression based strategy is described. Data used for testing the procedure are described in Sect. 3, while the corresponding results are included in Sect. 4. Finally, some concluding remarks and directions for future research are described in Sect. 5.

2 Methodology

2.1 Main notation

Let us consider a particular multi-block data set arranged as a three-way data table, where an array $Z$ is partitioned in $G$ blocks: $Z = [Z_1, \ldots, Z_G]$. In this paper, each block $Z_g$ ($g = 1, \ldots, G$) has dimensions $N \times (P+1)$ because it is a column partitioned matrix composed of the vector $y_g$ for the response variable (a single element of the output block as defined in Sect. 1) and a data matrix $X_g$ for the regressors. The dimension $G$ can represent any possible stratification variable, even time. In the empirical analysis provided in Sects. 3 and 4, the $N$ rows represent a set of products, the $P+1$ variables are the liking attributes together with the dependent variable (the overall liking), while $G$ is the set of consumers. Figure 1 shows three different points of view to represent and analyse this kind of multi-block data. For the rest of the paper we will refer to the structure represented in Fig. 1c, where the $G$ blocks obtained from the stratification variable are stacked. However, the proposed approach will not consider a single model on all stacked blocks simultaneously. A multistep strategy is proposed in which the response variable and corresponding predictors are related to each other for each individual block.
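As a rough illustration of the stacked arrangement referred to as Fig. 1c, the following Python sketch stacks the $G$ blocks row-wise while keeping the block label of every row, so that a model can later be fitted either block by block or on the whole stacked sample. The dimensions, the values and the function name `stack_blocks` are hypothetical and are not taken from the paper.

```python
# Hypothetical toy dimensions (not from the paper):
# G blocks (e.g. consumers), N units per block (e.g. products), P regressors.
G, N, P = 3, 4, 2

# blocks[g][i] = [y, x_1, ..., x_P] for unit i in block g (made-up values).
blocks = [[[float(g + i + p) for p in range(P + 1)] for i in range(N)]
          for g in range(G)]


def stack_blocks(blocks):
    """Stack the blocks row-wise, keeping the block label of every row."""
    rows = []
    for g, block in enumerate(blocks):
        for i, row in enumerate(block):
            rows.append({"block": g, "unit": i, "y": row[0], "x": row[1:]})
    return rows


stacked = stack_blocks(blocks)
# The stacked table has N*G rows and P+1 variables (response plus regressors).
print(len(stacked), 1 + len(stacked[0]["x"]))
```

Keeping the block label in each row is what allows the later steps to switch between per-block models and models estimated on all $N \times G$ observations.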

The aim of the paper is to model and cluster the data simultaneously. That means analysing the relationship between the dependent variable and the set of regressors while highlighting possible similarities among the $G$ levels of the stratification variable. Indeed, if two units show a very similar dependence structure, they can be considered as belonging to the same group.

2.2 Quantile regression based strategy

The strategy proposed in this paper consists of three main steps. The first step consists of identifying the best model for each level $g$ of the stratification variable, based on the quantile of the response variable that best represents each level. Subsequently, the $G$ levels of the stratification variable are grouped according to similarities in the dependence structures. Finally, a different model is estimated for each group identified in the previous step.

The procedure was introduced by Davino and Vistocco (2018) and applied in consumer science to handle product effects by Davino et al. (2018). In the present paper, the approach is adapted to the case of multi-block data with a large number of blocks. As discussed in Sect. 1, the aim of this paper is to consider an additional source of complexity in the data given by heterogeneity. Since this involves a great number of blocks, the strategy proposed in Davino and Vistocco (2018) is here combined with clustering techniques.

The main strength of the procedure is the exploitation of QR in the whole process of analysis. QR makes it possible to estimate the whole distribution of the conditional quantiles of the response variable, thus replacing the classical estimate of a single value (conditional mean) with estimates of several values (conditional quantiles). A typical QR model is formulated as:

$$Q_\theta(\hat{y} \mid X) = X \hat{\beta}(\theta), \tag{1}$$

where $Q_\theta(\cdot \mid \cdot)$ is the conditional quantile function for the $\theta$-th conditional quantile with $0 < \theta < 1$. Each $\hat{\beta}_p(\theta)$ coefficient represents the rate of change in the $\theta$-th conditional quantile of the dependent variable per unit change in the value of the $p$-th regressor ($p = 1, \ldots, P$), holding the others constant. Although it is theoretically possible to estimate an infinite number of quantiles, only a finite number is numerically distinct, the so-called quantile process. For QR too, several functional forms can be considered. The paper will refer to linear regression models. The interested reader may refer to the reference literature for methodological details (Koenker and Bassett 1978; Davino et al. 2013; Furno and Vistocco 2018).
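To make Eq. (1) concrete, the following Python sketch fits a linear QR with an intercept and one regressor on a tiny made-up sample. It minimises the check (pinball) loss of Koenker and Bassett (1978) by brute force, relying on the known property that some minimiser interpolates as many observations as there are parameters. This is only a didactic device for very small samples and is not the estimation routine used in the paper; the data and function names are hypothetical.

```python
from itertools import combinations


def check_loss(u, theta):
    """Check function rho_theta(u) = u * (theta - 1[u < 0])."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def fit_simple_qr(x, y, theta):
    """Exact fit of Q_theta(y | x) = b0 + b1 * x on a small sample.

    Every line through two observations with distinct x is tried and the
    one with the smallest total check loss is returned.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b1 = (y[j] - y[i]) / (x[j] - x[i])
        b0 = y[i] - b1 * x[i]
        loss = sum(check_loss(yk - b0 - b1 * xk, theta) for xk, yk in zip(x, y))
        if best is None or loss < best[0] - 1e-12:
            best = (loss, b0, b1)
    return best[1], best[2]


# Made-up fan-shaped data: the spread of y grows with x.
x = [1, 1, 2, 2, 3, 3]
y = [1.0, 1.2, 1.0, 2.0, 1.0, 3.0]

b0_low, b1_low = fit_simple_qr(x, y, theta=0.1)
b0_high, b1_high = fit_simple_qr(x, y, theta=0.9)
print(b1_low, b1_high)
```

On this toy sample the slope at $\theta = 0.1$ is flat while the slope at $\theta = 0.9$ is clearly positive, which is the kind of difference across quantiles that a single conditional-mean model cannot show.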

The approach to model and cluster the $G$ levels is structured in the three steps detailed below.

(1) Identiﬁcation of the best model for each level

In the first step, a representative quantile $\theta_g^{best}$ is identified for each level $g$. It will be named from now on as the best quantile. In particular, computing the empirical cumulative distribution function $F(\cdot)$ on the overall $y$ variable, the best quantile representative of each block $g$ will be obtained as:

$$\theta_g^{best} = \frac{\sum_{y \in y_g} F(y)}{N}, \tag{2}$$

where $N$ denotes the number of units of the generic block $g$. By computing the empirical cumulative distribution function on $y$, we refer to the percentile rank of each observation, i.e. the location of each $y_i$ ($i = 1, \ldots, N \times G$). For a discussion on the use of the percentile ranks and the choice of the proper location index to summarise them, see Davino and Vistocco (2018).
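Equation (2) can be sketched in a few lines of Python: the empirical distribution function is computed on the pooled response, and the percentile ranks are then averaged within each block. The "less than or equal" convention for the empirical distribution function, the toy scores and the function name `best_quantiles` are assumptions made for illustration only.

```python
from bisect import bisect_right


def best_quantiles(y_blocks):
    """Average percentile rank of each block's responses in the pooled sample."""
    pooled = sorted(value for block in y_blocks for value in block)
    total = len(pooled)

    def ecdf(value):
        # F(value): share of pooled observations that are <= value (assumed convention)
        return bisect_right(pooled, value) / total

    return [sum(ecdf(value) for value in block) / len(block) for block in y_blocks]


# Made-up liking scores for three blocks (e.g. three consumers, three products each).
y_blocks = [[2, 3, 4], [5, 6, 7], [7, 8, 9]]
theta_best = best_quantiles(y_blocks)
print([round(t, 3) for t in theta_best])
```

With this convention a block made only of the largest pooled values would receive a value of 1, which is not a valid quantile level; a real implementation would need to handle that edge case.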

(2) Identiﬁcation of the group dependence structure

QR is then carried out on data arranged as in Fig. 1c (for each single block), using the representative quantiles, that is, the $G$ quantiles $\theta_g^{best}$ previously identified. Each model provides a set of coefficients, one for each level $g$ and for each regressor: $\hat{\beta}_p(\theta_g^{best})$ ($p = 1, \ldots, P$). Such coefficients can be arranged into a matrix $\hat{B}_{\theta^{best}}$ of dimensions $G \times (P+1)$, where the additional column refers to the intercept.

The aim of the second step is to identify whether there are similar dependence structures among the $G$ levels. For this purpose, a hierarchical cluster analysis (CA) is performed on the $\hat{B}_{\theta^{best}}$ matrix and a partition of the $G$ levels in $K$ groups is identified ($k = 1, \ldots, K$). Each group will then be characterised by a different quantile, obtained as the average of the $\theta^{best}$ associated with the units assigned to the group ($\bar{\theta}^{best}$).
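The second step can be sketched as follows: the rows of the coefficient matrix are merged agglomeratively until $K$ groups remain, and the best quantiles are then averaged within each group. The paper does not state the linkage or the distance used, so the average linkage with Euclidean distance below is an assumption, as are the coefficient values, the quantiles and the function name `agglomerate`.

```python
from math import dist


def agglomerate(rows, k):
    """Average-linkage agglomerative clustering of `rows` into k groups."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(a, b):
        # mean Euclidean distance between all pairs of members
        return sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # find and merge the closest pair of clusters
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] += clusters.pop(b)
    return [sorted(c) for c in clusters]


# Hypothetical coefficient matrix: one row per level (intercept + 3 slopes).
B_hat = [
    [-1.2, 0.20, 0.70, 0.20],
    [-1.1, 0.25, 0.70, 0.15],
    [0.0, 0.15, 0.65, 0.20],
    [0.1, 0.20, 0.70, 0.15],
    [-0.5, 0.15, 0.80, 0.10],
]
theta_best = [0.20, 0.24, 0.50, 0.55, 0.35]  # hypothetical best quantiles

groups = agglomerate(B_hat, k=3)
theta_bar = [sum(theta_best[i] for i in grp) / len(grp) for grp in groups]
print(groups, theta_bar)
```

Note that a group can contain a single level, as in the last cluster of this toy example.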

(3) Estimation of the group dependence structure

In the final step, QR is carried out again on data arranged as in Fig. 1c using the representative quantiles, that is, the $K$ quantiles assigned to the $K$ groups in the previous step. Each of the $K$ estimated models provides a set of coefficients, one for each regressor; differences among the coefficients highlight differences in the group dependence structure. It is worth noting that coefficients can be compared because all of them are estimated on the whole sample ($N \times G$). A testing procedure is implemented to evaluate the significance of the differences among the coefficients related to each cluster, exploiting the classical inferential tools available in the QR framework. Two models estimated at two different quantiles can be compared using a joint test on all slope parameters or separate tests on each slope parameter. The hypothesis of interest is that the slope coefficients of two models are identical and the test statistic is a variant of the Wald test described in Koenker and Bassett (1982).

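Let us consider the case of the comparison between the coefficients related to the $p$-th regressor and estimated at two different quantiles, $\theta_k^{best}$ and $\theta_{k'}^{best}$. The null hypothesis is $H_0: \beta_p(\theta_k^{best}) = \beta_p(\theta_{k'}^{best})$, and the test statistic is:

$$T = \frac{\left[\hat{\beta}_p(\theta_k^{best}) - \hat{\beta}_p(\theta_{k'}^{best})\right]^2}{\widehat{\mathrm{var}}\left[\hat{\beta}_p(\theta_k^{best}) - \hat{\beta}_p(\theta_{k'}^{best})\right]}, \tag{3}$$

where $p = 1, \ldots, P$ and $k, k' \in [1, K]$. Under the null hypothesis, the test statistic has an approximate $\chi^2$ distribution with one degree of freedom.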
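A minimal Python sketch of the statistic in Eq. (3) is given below. It assumes that the two coefficient estimates, their variances and their covariance are already available (for instance from a resampling scheme); all numbers are hypothetical. The p-value uses the identity that the upper tail of a $\chi^2$ distribution with one degree of freedom at $t$ equals `erfc(sqrt(t / 2))`.

```python
from math import erfc, sqrt


def wald_equal_slopes(b_k, b_kprime, var_k, var_kprime, cov=0.0):
    """Statistic of Eq. (3) and its approximate chi-square(1) p-value.

    var(b_k - b_k') = var_k + var_k' - 2 * cov. Estimates obtained on the
    same sample at two quantiles are generally correlated, so `cov` should
    be supplied rather than left at zero.
    """
    var_diff = var_k + var_kprime - 2.0 * cov
    t = (b_k - b_kprime) ** 2 / var_diff
    p_value = erfc(sqrt(t / 2.0))  # upper tail of chi-square with 1 df
    return t, p_value


# Hypothetical slope estimates and (co)variances for two group quantiles.
t_stat, p_val = wald_equal_slopes(0.80, 0.66, var_k=0.004, var_kprime=0.003, cov=0.001)
print(round(t_stat, 3), round(p_val, 4))
```

The joint test on all slopes mentioned in the text would replace the scalar difference with a vector and the variance with a covariance matrix; that extension is not shown here.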


Fig. 2 Distributions for the overall liking: marginal (top) and product speciﬁc (from bottom to second to

last)

Such a test statistic can be exploited for pairwise comparisons of coefficients. An extension of it is used as a global test on all the slopes. The standard errors, used to evaluate the statistical significance of the coefficients, can be estimated using resampling methods (Parzen et al. 1994).

3 Data description

The empirical analysis is based on data from a consumer test on 11 tortilla chips, in which 73 consumers expressed their overall liking for each product on a 9-point hedonic scale (from 1 = dislike extremely to 9 = like extremely) (Meullenet et al. 2008). Furthermore, the same consumers provided a judgment on some drivers (appearance, flavor, texture) using the same 9-point hedonic scale. Considering the notation introduced in Sect. 2, the tortilla dataset consists of $N = 11$ products, $P + 1 = 4$ liking variables and $G = 73$ consumers (levels).

Figure 2 shows the distribution of the overall liking scores on the whole sample of consumers (marginal) and for each single product. All products show a left-skewed distribution, even if some of them present a higher variability and a less pronounced skewness (MIS, OAK, MED, GUY, GMG).

A multivariate analysis of consumers’ liking is carried out using a principal component analysis (PCA) on the output block (products × consumers) (data arranged as in Fig. 1b). Results demonstrate that consumers show preferences for different products. In Fig. 3 consumer vectors are mainly concentrated in the positive direction of the first dimension, even if there are a few consumers also lying along the second dimension. Individual differences among consumers for the likings of specific products can be further understood from Fig. 4, which represents the products arranged on all four quadrants. Specifically, the most liked products are SAN, TOB and TOR, along the main PCA dimension, followed by MIS, MIT and TOM on the second dimension.

Fig. 3 Loading plot by principal component analysis on the overall liking

Fig. 4 Score plot by principal component analysis on the overall liking


Fig. 5 Histogram of the rank percentiles

4 Results

Heterogeneity among consumers described through the PCA in Sect. 3 can be further investigated by linking the overall liking of the consumers to their specific likings (appearance, flavor, texture) following the QR based strategy described in Sect. 2.2. In the first step, a representative quantile is identified for each consumer through the average rank percentile of the overall liking she/he has expressed on the considered set of products. The histogram in Fig. 5 shows the distribution of the rank percentiles on the sample of consumers. It is worth noting that the $\theta^{best}$ values vary across consumers, which indicates heterogeneity in the liking.

QR is then carried out on data arranged as in Fig. 1c using the representative quantiles, that is, the $G$ quantiles $\theta_g^{best}$ assigned to each consumer. Each model provides a set of coefficients, one for each driver, that can be arranged in a consumers × coefficients matrix. The information gathered into such a matrix is crucial for highlighting the individual differences/similarities among consumers in the way they weight the drivers linked to the overall liking. To this end, the second step consists of identifying consumers’ segments by a CA on the different dependence structures estimated by the QR on the $G$ quantiles $\theta_g^{best}$.

On the tortilla dataset, the elbow rule identifies as best partition the one with $K = 3$ groups of consumers, homogeneous with respect to the liking models. Table 1 describes each cluster through the following information: size, minimum, maximum and average of the $\theta_g^{best}$ of the consumers assigned to the cluster. The last four columns of the table describe the average values of the original variables. Note that for the rest of the paper the $\bar{\theta}^{best}$ will be considered as the quantile representative of each group. Results show that each cluster is characterised by a different position in the ranking of the overall liking. Specifically, the first cluster corresponds to the consumers’ segment with lower $\theta^{best}$, i.e. to consumers scoring products lower than the others. The last two clusters, which are the most interesting because of their size, behave in the opposite way. Note that the third cluster shows a wider range of the $\theta^{best}$, as highlighted by the minimum and maximum values in Table 1. This emphasises a higher degree of heterogeneity in this cluster as compared to the others. Another relevant piece of information from Table 1 comes from the analysis of the average values both on liking and drivers. On the one hand, the first cluster has a very small size, with a low degree of liking that can hardly be modified. On the other hand, clusters 2 and 3 show higher overall liking values that can be further improved by acting on the specific likings, in particular on the flavor, which has the lowest average inside each cluster.

Table 1 Description of clusters exploiting summaries of $\theta^{best}$ and original variables (average)

Size        $\theta_{min}$   $\theta_{max}$   $\bar{\theta}^{best}$   Overall   Appearance   Flavor   Texture
n1 = 6      0.19             0.24             0.22                    4.89      5.38         4.85     5.44
n2 = 34     0.30             0.41             0.35                    5.88      6.32         5.77     6.31
n3 = 33     0.42             0.76             0.51                    6.94      7.13         6.71     7.02

Table 2 QR coefficients for the three representative quantiles of the three clusters

              $\bar{\theta}_1^{best} = 0.22$   $\bar{\theta}_2^{best} = 0.35$   $\bar{\theta}_3^{best} = 0.51$
(Intercept)   −1.24                            −0.48                            0.00
Appearance    0.23                             0.16                             0.17
Flavor        0.70                             0.77                             0.67
Texture       0.18                             0.12                             0.17

In the final step, QR is carried out again on data arranged as in Fig. 1c using the representative quantiles assigned to each cluster ($\bar{\theta}^{best}$). Coefficients in Table 2 are all significant at $\alpha \le 0.05$, except the intercept and the texture coefficient in group 2, which are significant at $\alpha \le 0.10$, and the intercept in group 3, which is not significant at all.

Combining the information in Tables 1 and 2 it is possible to argue that flavor is the most interesting driver to improve the overall liking of the less satisfied consumers (clusters 1 and 2). On the one hand, flavor has the highest QR coefficients; on the other hand, consumers score this driver lower than the others, with averages close to the threshold of sufficiency on a scale ranging from 1 to 9.
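The reading of the coefficients can be illustrated with a short Python sketch that plugs the driver averages of Table 1 into the coefficients of Table 2. Evaluating each model at its own cluster averages is an illustrative choice, not a computation reported in the paper; the function name `predicted_quantile` is hypothetical.

```python
# Coefficients copied from Table 2: (intercept, appearance, flavor, texture).
coef = {
    1: (-1.24, 0.23, 0.70, 0.18),  # representative quantile 0.22
    2: (-0.48, 0.16, 0.77, 0.12),  # representative quantile 0.35
    3: (0.00, 0.17, 0.67, 0.17),   # representative quantile 0.51
}
# Cluster averages of the drivers copied from Table 1: (appearance, flavor, texture).
drivers = {1: (5.38, 4.85, 5.44), 2: (6.32, 5.77, 6.31), 3: (7.13, 6.71, 7.02)}


def predicted_quantile(cluster, x):
    """Fitted conditional quantile of the overall liking for driver scores x."""
    b0, *slopes = coef[cluster]
    return b0 + sum(b * v for b, v in zip(slopes, x))


for k in coef:
    appearance, flavor, texture = drivers[k]
    base = predicted_quantile(k, drivers[k])
    gain = predicted_quantile(k, (appearance, flavor + 1, texture)) - base
    print(k, round(base, 2), round(gain, 2))
```

By construction, a one-point increase in the flavor score moves the fitted quantile by the flavor coefficient of that cluster, which is the largest slope in all three models.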

Results in Table 3 complement the characterisation of the groups by testing their difference in terms of both the whole model (joint test) and the specific coefficients. The classical Bonferroni correction (Shaffer 1995) has been applied. The p-values included in the table show that the three models for the three clusters are significantly different. Focusing on differences between clusters 2 and 3, the p-values emphasise that even if the size of the flavor and the texture coefficients in the two clusters is quite similar, they can be considered different from an inferential perspective.
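A Bonferroni adjustment multiplies each raw p-value by the number of comparisons and is conventionally capped at 1. The Python sketch below shows both variants. The values above 1 reported in Table 3 are compatible with an uncapped multiplication by the three pairwise comparisons; this reading is an inference from the reported numbers and is not stated in the paper, and the raw p-values below are back-computed under that assumption.

```python
def bonferroni(p_values, cap=True):
    """Bonferroni-adjusted p-values: m * p, optionally capped at 1."""
    m = len(p_values)
    adjusted = [m * p for p in p_values]
    return [min(1.0, p) for p in adjusted] if cap else adjusted


# Texture column of Table 3 (cluster 1 vs 2, 1 vs 3, 2 vs 3), as reported.
reported = [0.141, 2.295, 0.000]
# Hypothetical raw p-values, assuming the reported ones are 3 * raw.
raw = [p / 3 for p in reported]

capped = bonferroni(raw)
uncapped = bonferroni(raw, cap=False)
print(capped, uncapped)
```

Capping does not change any conclusion here, since an adjusted value above 1 simply means that the difference is not significant at any level.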

Starting from the results of the testing procedure showing that flavor and texture are the most discriminating predictors for clusters 2 and 3, a deeper analysis of differences among products in each cluster would be useful from a product development perspective. To this end, Fig. 6 visualizes the averages of the variables (each single panel) in the three clusters (different colors and symbols), partitioned by products (labels on the left side). The most relevant information from the figure is that the differences between these two clusters are evident for the flavor and the texture, where it is possible to highlight, among the most liked products, those presenting the widest range between the averages in the two clusters. Specifically, the liking of TOR, TOM and BYW, which present lower averages in cluster 2, will be more affected by an improvement in the flavor. Such information in terms of the liking for products inside each cluster can be used to suggest appropriate decisions both for marketing and product development departments.

Table 3 p-values derived from pairwise comparisons on the whole model and on single coefficients

                             Joint test   Appearance   Flavor   Texture
Cluster 1 versus cluster 2   0.018        0.024        0.042    0.141
Cluster 1 versus cluster 3   0.009        0.111        0.540    2.295
Cluster 2 versus cluster 3   0.000        2.598        0.000    0.000

Fig. 6 Description of the clusters according to the original variables partitioned by products


5 Concluding remarks and further developments

The paper has shown how to treat an additional source of complexity given by heterogeneity in fields of application where data follows multi-block structures. Focus has been on one specific data structure, within the supervised framework, where blocks of variables (a single output block and several input blocks) are observed on the same statistical units. A QR based strategy has been proposed as a multi-step procedure able to model and cluster units according to similar dependence structures. The proposal originates from alternative approaches that combine the estimation of separate dependence relations for each single unit with a clustering on the model results. This happens, for instance, in conjoint studies (Gustafsson et al. 2003), where a preference model for each consumer is estimated and then results from the estimated models are synthesised by simple averages or clustering procedures. The proposed strategy can be traced back to classic approaches only in some respects. For example, the logic of obtaining different models for individual consumers and then synthesizing them is in common. The peculiarity of the proposal lies already in the selection of the model, which is based on the selection of the quantile that best represents each consumer. This aspect makes the proposal complementary and not alternative to the classic approaches, which are limited to the study of the effects on average. Any comparison with classical methods would lead to the classical conclusions deriving from a comparison between QR and LSR: the two models provide similar results when the homoscedasticity assumption is satisfied. This means that the different blocks (consumers) present the same dependence structure linking the overall to the specific liking.

The innovative contribution of the proposed approach consists of the following aspects:

– The use of QR as an alternative to the classical LSR to model the whole distribution of the dependent variable.

– The identification of different models for each unit obtained by defining the quantile that best represents the position of the unit in the distribution of the dependent variable.

– The estimation of the model characterizing each cluster on all the units and not only on the ones belonging to the cluster. This is a relevant aspect since the estimation of the models using all units allows comparisons among the clusters both for the whole models and for the specific coefficients.

– The description of each cluster according to the corresponding specific quantile, which provides information on the location of the response conditional distribution mainly affected by the units of the cluster.

Further developments concern the selection of the best clustering partition (Bruzzese and Vistocco 2015; Tibshirani et al. 2001) and the assessment of results in terms of stability. The second aspect is even more relevant in application fields where the number of statistical units (products) is low, as in consumer analysis.

Acknowledgements Open access funding provided by Università degli Studi di Napoli Federico II within

the CRUI-CARE Agreement.


Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which

permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give

appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,

and indicate if changes were made. The images or other third party material in this article are included

in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If

material is not included in the article’s Creative Commons licence and your intended use is not permitted

by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the

copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Asioli D, Næs T, Øvrum A, Almli VL (2016) Comparison of rating-based and choice-based conjoint analysis models. A case study based on preferences for iced coffee in Norway. Food Qual Prefer 48:174–184

Bollen KA (2014) Structural equations with latent variables, vol 210. Wiley, New York

Bro R, Qannari EM, Kiers HA, Næs T, Frøst MB (2008) Multi-way models for sensory profiling data. J Chemom 22(1):36–45

Bruzzese D, Vistocco D (2015) DESPOTA: DEndrogram slicing through a permutation test approach. J

Classif 32(2):285–304

Cariou V, Qannari EM, Rutledge DN, Vigneau E (2018) ComDim: from multiblock data analysis to path

modeling. Food Qual Prefer 67:27–34

Davino C, Romano R (2014) Assessment of composite indicators using the ANOVA model combined with

multivariate methods. Soc Indic Res 119(2):627–646

Davino C, Vistocco D (2018) Handling heterogeneity among units in quantile regression. Investigating the

impact of students’ features on university outcome. Stat Interface 11(3):541–556

Davino C, Furno M, Vistocco D (2013) Quantile regression. Theory and applications. Wiley, series in

probability and statistics. Wiley, UK

Davino C, Romano R, Næs T (2015) The use of quantile regression in consumer studies. Food Qual Prefer

40(A):230–239

Davino C, Romano R, Vistocco D (2018) Modelling drivers of consumer liking handling consumer and

product effects. Ital J Appl Stat 30:359–372

Furno M, Vistocco D (2018) Quantile regression. Estimation and simulation, vol 216. Wiley, New York

Gustafsson A, Herrmann A, Huber F (2003) Conjoint measurement: methods and applications. Springer,

Berlin

Hanaﬁ M, Kiers HA (2006) Analysis of K sets of data, with differential emphasis on agreement between

and within sets. Comput Stat Data Anal 51(3):1491–1508

Höskuldsson A (2008) Multiblock and path modelling procedures. J Chemom 22:571–579

Jöreskog KG, Wold HO (1982) Systems under indirect observation: causality, structure, prediction, vol 139.

North Holland, Amsterdam

Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46(1):33–50

Koenker R, Bassett G (1982) Tests of linear hypotheses and L1 estimation. Econometrica 50(6):1577–1584

Martens H, Anderssen E, Flatberg A, Gidskehaug LH, Høy M, Westad F, Martens M (2005) Regression of

a data matrix on descriptors of both its rows and of its columns via latent variables: L-PLSR. Comput

Stat Data Anal 48(1):103–123

Menichelli E, Kraggerud H, Olsen NV, Næs T (2013) Analysing relations between speciﬁc and total liking

scores. Food Qual Prefer 28(2):429–440

Meullenet JF, Xiong R, Findlay CJ (2008) Multivariate and probabilistic analyses of sensory science problems, vol 25. Wiley, New York

Næs T, Lengard V, Johansen SB, Hersleth M (2010) Alternative methods for combining design variables

and consumer preference with information about attitudes and demographics in conjoint analysis.

Food Qual Prefer 21(4):368–378

Næs T, Brockhoff PB, Tomic O (2011) Statistics for sensory and consumer science. Wiley, New York

Parzen MI, Wei LJ, Ying Z (1994) A resampling method based on pivotal estimating functions. Biometrika

81(2):341–350

Romano R, Davino C (2016) Assessing scientiﬁc research activity evaluation models using multivariate

analysis. Stat Interface 9(3):303–313


Romano R, Palumbo F (2013) Partial possibilistic regression path modeling for subjective measurement. J

Methodol Appl Stat 15:177–190

Romano R, Brockhoff PB, Hersleth M, Tomic O, Næs T (2008) Correcting for different use of the scale

and the need for further analysis of individual differences in sensory analysis. Food Qual Prefer

19(2):197–209

Romano R, Vestergaard JS, Kompany-Zareh M, Bredie WL (2011) Monitoring panel performance within

and between sensory experiments by multi-way analysis, in Classiﬁcation and Multivariate Analysis

for Complex Data Structures, 335–342. Springer, Berlin

Romano R, Davino C, Næs T (2014) Classiﬁcation trees in consumer studies for combining both product

attributes and consumer preferences with additional consumer characteristics. Food Qual Prefer 33:27–

36

Romano R, Næs T, Brockhoff PB (2015) Combining analysis of variance and three-way factor analysis

methods for studying additive and multiplicative effects in sensory panel data. J Chemom 29(1):29–37

Shaffer JP (1995) Multiple hypothesis testing. Ann Rev Psychol 46:561–584

Smilde AK, Westerhuis JA, Boque R (2000) Multiway multiblock component and covariates regression

models. J Chemom 14(3):301–331

Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters in a data set via the gap statistic.

J R Stat Soc Ser B (Stat Methodol) 63(2):411–423

Westerhuis JA, Kourti T, MacGregor JF (1998) Analysis of multiblock and hierarchical PCA and PLS

models. J Chemom 12:301–321

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps

and institutional afﬁliations.
