# Multivariate prototype approach for authentication of food products

**ABSTRACT** Authentication basically consists in deciding whether or not a given unknown product belongs to a group of interest, defined by producers or regulators. Most often, in order to demonstrate the authentication ability of a given instrumental analysis, several other groups are arbitrarily chosen. Then a Factorial or Linear Discriminant Analysis (FDA or LDA) or a Partial Least Squares Discriminant Analysis (PLS-DA) is usually performed; the model therefore depends on the nature of all observed groups of the study. The aim of this paper was to investigate an approach, named "prototype approach", based on a model built up using only the group of products of interest. Such an approach has the advantage of not depending on the whole complementary data of the study. The prototype approach is inspired by Multivariate Statistical Process Control and the Hotelling $T^2$ statistic and consists in building up the assignment model according to the group of interest. Then, the authentication step of new data is performed. The prototype approach and FDA were compared on a case study (authentication of Beaujolais red wines using their polyphenolic composition). False negative (#FN) and false positive (#FP) numbers were estimated by bootstrapping procedures for both methods. Compared to FDA, the prototype approach gave higher #FP with larger variability and lower #FN with lower variability. Wines produced with the same grape variety as AOC Beaujolais but in other regions were poorly authenticated. The prototype approach appears to be more flexible than FDA. The user can adjust the theoretical α risk in relation to their strategy, making that decision tool an alternative to discriminant analyses for authentication.


S. Preysᵃ,⁎, E. Vigneauᵇ, G. Mazerollesᵃ, V. Cheynierᵃ, D. Bertrandᵇ

ᵃ UMR Sciences pour l'Oenologie, INRA, 2 Place Viala, 34060 Montpellier, France

ᵇ Unité de Sensométrie et de Chimiométrie, ENITIAA/INRA, La Géraudière, BP 82225, 44322 Nantes Cedex, France

Received 28 July 2006; received in revised form 17 January 2007; accepted 22 January 2007


© 2007 Elsevier B.V. All rights reserved.

Keywords: Authentication; Multivariate Statistical Process Control (MSPC); Prototype; Factorial/Linear Discriminant Analysis (FDA/LDA); Food; Wine; Polyphenols

1. Introduction

Authentication is the ability to assign an unknown product to a known class of products by means of its physico-chemical or even sensorial characterization and a learning model. In the food industry, the authentication of products is an important need in the scope of traceability, food safety and quality control [1,2]. Authentication tools can also be used for marketing purposes, especially in order to build commercial brands including very well differentiated products for consumers. In this scope, authenticating quality marks such as ‘AOC’ (Appellation d'Origine Contrôlée = Protected Denomination of Origin) is of prime interest. This is more and more often achieved by characterizing products in a multivariate way, rather than by analyzing one or a few markers independently [3].

Many studies have dealt with differentiation or authentication of food products such as wines. The wines were differentiated in relation to their variety using markers such as volatile compounds [4,5], to their vintage by analyzing stable isotopes of minerals [6], to their geographical origin by means of trace element measurements [7,8,9], or to the wine-making process by analyzing amino-acids [10,11]. Some authors explored the discriminative potential of some polyphenolic compounds, which are secondary metabolites of the grape berry mainly responsible for wine color and astringency. Anthocyanin composition was used to differentiate red wines made from different grape varieties in various regions [12–15]. Some other polyphenolic compounds, i.e. flavonols [16,17] or phenolic acids [17,18], were analyzed to discriminate wines from various varieties, regions and technologies.

Chemometrics and Intelligent Laboratory Systems xx (2007) xxx–xxx


www.elsevier.com/locate/chemolab

⁎Corresponding author. Fax: +33 4 99 61 26 83.

E-mail address: spreys@ondalys.fr (S. Preys).

0169-7439/$ - see front matter © 2007 Elsevier B.V. All rights reserved.

doi:10.1016/j.chemolab.2007.01.003


Please cite this article as: S. Preys et al., Multivariate prototype approach for authentication of food products, Chemometrics and Intelligent Laboratory Systems

(2007), doi:10.1016/j.chemolab.2007.01.003


Discriminant analyses are commonly carried out in such studies. Most of the time, the authors made use of FDA or LDA (Factorial or Linear Discriminant Analysis) [5,6,8,10,15,18], and more rarely of SIMCA (Soft Independent Modelling of Class Analogy) [7]. More recently, PLS-DA (Partial Least Squares Discriminant Analysis) [19,20,21] appeared to be an interesting tool. Non-parametric methods, e.g. k-NN (k-Nearest Neighbors) [7,22], and neural networks [22] were also used.

In many situations, the very purpose of authentication studies is to separate a single “group of interest” from other groups. When using discriminant analyses, it is thus necessary to build up groups of observations representing the group of interest and also complementary groups, including products which do not belong to the group of interest. As it is almost impossible to study all the existing groups of products, the resulting model depends on the nature of these complementary groups. Moreover, if a group has a particular importance, it seems reasonable to make principal use of it in building up the model.

The objectives of this work were (i) to investigate an authentication approach, named the “prototype approach”, where only the knowledge of the group of interest, called the “reference group”, is used to build up a set of decision rules; and (ii) to compare this prototype approach to FDA, which is usually used in authentication problems. The performances of the two methods will be discussed on an illustrative example, dealing with the authentication of AOC commercial red wines using their polyphenolic composition.

2. Statistical methods

2.1. Prototype approach

The prototype approach only requires that the reference group has been well defined previously. The presented methodology was inspired by Multivariate Statistical Process Control (MSPC) [23–26]. The main difference between the proposed prototype approach and MSPC methodology lies in the fact that the notion of time-series in MSPC, with observations repeated at every time point of a continuous process, is no longer appropriate in authentication studies. However, the rationale of the method is the same: once a model giving a description of the reference products has been defined, new observations are considered and assessed to be compatible, or not, with the reference.

In MSPC, when p multinormal variables are measured on each observation, discrepancy from the in-control or reference situation is evaluated by using the Hotelling $T^2$ statistic [27]:

$$T^2 = (x - \hat{\mu})^{\top}\, \hat{\Sigma}^{-1}\, (x - \hat{\mu}) \qquad (1)$$

where x is the (p×1) measurement vector for one particular observation (or a sample of size one) and $\hat{\mu}$ and $\hat{\Sigma}$ are respectively the mean vector and variance–covariance matrix estimated under the in-control situation. This $T^2$ statistic is actually the squared Mahalanobis distance between each multidimensional observation and the centroid of all observations involved in the estimation of the parameters μ and Σ [28]. The training set of n observations, used for estimating μ and Σ, is supposed to be representative of the reference situation.
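As an illustration of Eq. (1), the following minimal Python sketch computes $T^2$ as the squared Mahalanobis distance of one observation to the centroid of a reference set. It assumes NumPy is available; the function name `hotelling_t2` and the two-variable toy data are hypothetical and are not taken from the programs used in this study.

```python
import numpy as np

def hotelling_t2(x, reference):
    """Squared Mahalanobis distance of x to the centroid of `reference`.

    reference: (n, p) array of training observations (in-control group).
    x: (p,) measurement vector for one observation.
    """
    reference = np.asarray(reference, dtype=float)
    x = np.asarray(x, dtype=float)
    mu_hat = reference.mean(axis=0)               # estimated mean vector
    sigma_hat = np.cov(reference, rowvar=False)   # estimated variance-covariance matrix
    diff = x - mu_hat
    # Solve sigma_hat @ z = diff instead of inverting the matrix explicitly.
    return float(diff @ np.linalg.solve(sigma_hat, diff))

# Hypothetical two-variable reference group (illustrative values only).
reference = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
t2_centroid = hotelling_t2(reference.mean(axis=0), reference)  # zero at the centroid
t2_far = hotelling_t2([10.0, 0.0], reference)                  # large for an atypical point
```

Solving the linear system rather than inverting the covariance matrix is only a numerical convenience; it does not help when the matrix is nearly singular.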

In rather common cases, the number p of measured variables is large, and the variables may present a high level of colinearity. This results in a variance–covariance matrix Σ that is nearly singular. A procedure for reducing the dimensionality of the variable space is to use Principal Components (PC) or PLS components [26]. If we consider the PC $t_k$, obtained from a Principal Component Analysis (PCA) and organized in decreasing order of their variance $\lambda_k$ (for k = 1, …, min(n−1, p)), the $T^2$ statistic can be expressed as:

$$T^2 = \sum_{k=1}^{\min(n-1,\,p)} \frac{t_k^2}{\lambda_k} = T_A^2 + \tilde{T}^2 \qquad (2)$$

where

$$T_A^2 = \sum_{k=1}^{A} \frac{t_k^2}{\lambda_k} \qquad (3)$$

$T_A^2$ is the $T^2$ value estimated from the A first PC. $\tilde{T}^2$ is thus a residual value, representing the deviation from the PCA model.

From Eq. (2), it clearly appears that the last PC, associated with the smallest eigenvalues λ, can play a main role in the statistic value. Thus, an alternative is to consider only the A first PC and to make use of the statistic $T_A^2$ (Eq. (3)). Nevertheless, as the $T_A^2$ values do not take into account possible new phenomena, which are not expressed in the principal space formed by the A first PC of the training set, another statistic is also considered. This statistic, denoted RSPE (Root Squared Prediction Error), is the square root of the variance of the residuals, obtained after projection into the principal space.

For each of these statistics, a decision rule is built up, the null hypothesis being associated with the reference situation. For authentication purposes, the parameters of the model under the null hypothesis are estimated on the basis of a training set of reference observations. The critical values of the tests, named Upper Control Limit (UCL) in MSPC, are defined [24,29] as follows:

$$\mathrm{UCL}_{T_A^2} = \frac{A\,(n-1)(n+1)}{n\,(n-A)}\; F_{1-\alpha;\,A;\,n-A} \qquad (4)$$

$$\mathrm{UCL}_{\mathrm{RSPE}} = \sqrt{\frac{\nu}{2\eta}\; \chi^2_{1-\alpha;\,2\eta^2/\nu}} \qquad (5)$$

where n is the size of the training reference set and A the number of PC retained. η and ν are respectively the mean and the variance of the SPE (Squared Prediction Error), i.e. the variance of residuals, obtained for the training set. F and χ² stand for the Fisher and Chi-squared distributions, and α is the chosen significance level.
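As a worked illustration of Eq. (4), the multiplicative factor can be evaluated for the sizes used in the case study of this paper (n = 38 reference wines, A = 4 retained PC). Only the factor is computed here; the Fisher quantile depends on the chosen α and has to be read from tables or statistical software.

$$\frac{A\,(n-1)(n+1)}{n\,(n-A)} = \frac{4 \times 37 \times 39}{38 \times 34} = \frac{5772}{1292} \approx 4.47, \qquad \mathrm{UCL}_{T_A^2} \approx 4.47 \; F_{1-\alpha;\,4;\,34}$$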


The whole methodology of the prototype approach includes different stages.

2.1.1. Model construction

2.1.1.1. Choice of the number of significant PC. After having performed a PCA on the training reference set, a permutation test may be used as a tool for choosing the A significant PC [24,30]. This approach is aimed at identifying PC whose variance is higher than the variance of noise. Nevertheless, we suggest confirming this selection according to an interpretability criterion, e.g. chemical meaning. The first PC representing the main interpretable underlying phenomena in the data structure (e.g. color) are then chosen.

2.1.1.2. ‘Cleaning’ and analysis of the reference training set. This stage consists in analyzing the reference data using the $T_A^2$ and RSPE tests in order to define a homogeneous reference group from the observations at hand. It should be noticed that, at this stage, the critical value UCL of the test based on the $T^2$ statistic depends on the beta distribution rather than the Fisher distribution [29]. Indeed, the $T^2$ values and the estimations of μ and Σ are evaluated using the same data. If some observations are rejected by at least one of the two tests, they are excluded from the reference set and the first step of the procedure is iterated. It should also be noticed that the usual estimators of the parameters μ and Σ are known to lack robustness, i.e. to be sensitive to the presence of several outliers. Vargas compared four methods for calculating more robust estimators [31]. The MVE (Minimum Volume Ellipsoid) method, developed by Rousseeuw et al. [32,33], seemed to be the most efficient, whatever the number of outliers in the data set. It is based on a clustering algorithm that determines the smallest volume containing at least 50% of the samples. This method was actually used in this stage of our prototype approach.

2.1.2. Authentication (analysis of new data)

This step corresponds to the final objective of the procedure and consists in analyzing any new observation for authentication. If the $T_A^2$ and RSPE values for the new observation are lower than $\mathrm{UCL}_{T_A^2}$ and $\mathrm{UCL}_{\mathrm{RSPE}}$, respectively (Eqs. (4) and (5)), then this observation is declared to be compatible with the reference group.
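The decision rule of this authentication step can be sketched in Python as follows. This is a minimal illustration assuming NumPy, not the Matlab programs of this study; the function names, the simulated data and the control limits passed to `is_compatible` are hypothetical. RSPE is taken here as the root of the mean squared residual of one observation, which is an assumption about the exact normalization.

```python
import numpy as np

def fit_prototype(reference, n_components):
    """PCA model of the reference group (non-standardized: centring only)."""
    reference = np.asarray(reference, dtype=float)
    mean = reference.mean(axis=0)
    centred = reference - mean
    # SVD of the centred data: the rows of vt are the PC loadings.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    eigvals = s**2 / (reference.shape[0] - 1)   # variances of the PC scores
    return mean, vt[:n_components], eigvals[:n_components]

def t2a_and_rspe(x, model):
    """T_A^2 in the retained PC space and RSPE of the residual part."""
    mean, loadings, eigvals = model
    centred = np.asarray(x, dtype=float) - mean
    scores = loadings @ centred                  # t_k for k = 1..A
    t2a = float(np.sum(scores**2 / eigvals))
    residual = centred - loadings.T @ scores     # part not explained by the A PC
    # Assumption: RSPE is the root of the mean squared residual.
    rspe = float(np.sqrt(np.mean(residual**2)))
    return t2a, rspe

def is_compatible(x, model, ucl_t2a, ucl_rspe):
    """Accept x only if both statistics fall below their control limits."""
    t2a, rspe = t2a_and_rspe(x, model)
    return t2a < ucl_t2a and rspe < ucl_rspe

# Hypothetical usage on simulated data; the limits below are placeholders,
# not values computed from Eqs. (4) and (5).
rng = np.random.default_rng(0)
reference_group = rng.normal(size=(30, 5))
model = fit_prototype(reference_group, n_components=2)
accepted = is_compatible(reference_group.mean(axis=0), model, ucl_t2a=10.0, ucl_rspe=1.0)
```

When all PC are retained, `t2a_and_rspe` returns the full $T^2$ of Eq. (2) and a residual of zero, which is a convenient sanity check of the decomposition.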

2.2. Factorial Discriminant Analysis (FDA)

When using discriminant analyses for authentication purposes, one needs to describe the reference group but also the complementary groups. The user thus has two or more groups of data in the collection. In order to decide whether or not a new observation belongs to the reference group, FDA or LDA is most commonly used. Their classification rule is defined according to the minimum of the Mahalanobis distances computed between the new observation and the centroid of each group.

Comparatively, the $T^2$ statistic of the prototype approach depends only on the Mahalanobis distance of a new observation to the reference group. Moreover, FDA is based on the hypothesis that the variance–covariance matrices of the different groups are the same, which leads to estimating a pooled variance–covariance matrix. But this hypothesis is not necessarily true, especially for authentication analysis, where the reference group is quite homogeneous relative to the universe of the other observations. Such a hypothesis is not required for the prototype approach. Finally, when the variables are highly colinear, it is recommended, as previously, to define the classification rule by considering only a subset of PC.
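The classification rule described above (nearest centroid in Mahalanobis distance, with a pooled variance–covariance matrix) can be sketched as follows. This is only an illustration assuming NumPy: the function names and toy groups are hypothetical, and the prior projection onto a subset of PC and the computation of discriminant factors are left out.

```python
import numpy as np

def pooled_covariance(groups):
    """Within-group covariance pooled over all groups (equal-covariance hypothesis)."""
    n_total = sum(len(g) for g in groups)
    scatter = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    return scatter / (n_total - len(groups))

def assign_group(x, groups):
    """Index of the group whose centroid is nearest to x in Mahalanobis distance."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    sigma = pooled_covariance(groups)
    distances = []
    for g in groups:
        diff = np.asarray(x, dtype=float) - g.mean(axis=0)
        distances.append(float(diff @ np.linalg.solve(sigma, diff)))
    return int(np.argmin(distances))

# Hypothetical groups: index 0 plays the role of the reference group.
toy_reference = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
toy_outsiders = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0]]
near_reference = assign_group([1.5, 2.0], [toy_reference, toy_outsiders])
near_outsiders = assign_group([5.0, 4.5], [toy_reference, toy_outsiders])
```

Unlike the prototype rule, every observation is assigned to one of the groups at hand, so the outcome depends on which complementary groups were collected.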

2.3. Criteria for the comparison of prototype approach and FDA

In order to compare the performances of the prototype approach and FDA for authentication, various criteria were considered. The first one was the number (or percentage when specified) of false negatives (#FN), i.e. the number of observations really belonging to the reference group that were rejected. Besides, we considered the number (or percentage when specified) of false positives (#FP), which is the number of outsiders wrongly allocated to the reference group.
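The two criteria can be counted with a few lines of Python. The sketch below is illustrative only: the function name `count_errors` and the labels in the example are hypothetical.

```python
def count_errors(is_reference, accepted):
    """Count false negatives (#FN) and false positives (#FP).

    is_reference: list of bool, True if the product really belongs to the reference group.
    accepted: list of bool, True if the decision rule allocated it to the reference group.
    """
    pairs = list(zip(is_reference, accepted))
    n_fn = sum(1 for truth, acc in pairs if truth and not acc)   # rejected reference products
    n_fp = sum(1 for truth, acc in pairs if not truth and acc)   # accepted outsiders
    return n_fn, n_fp

# Hypothetical decisions for three reference products and four outsiders.
truth = [True, True, True, False, False, False, False]
decision = [True, False, True, True, False, False, True]
n_fn, n_fp = count_errors(truth, decision)
fp_percentage = 100.0 * n_fp / truth.count(False)
```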

The difficulty encountered with the prototype approach was to correctly estimate #FN. As a matter of fact, each observation of the reference group was submitted to the decision rule, but each of them was also used for the estimation of the parameters of the reference model. The risk was then to underestimate the false alarm rate according to the $T^2$ statistic. For this reason, a Leave-One-Out (LOO) procedure was performed. At each of the n steps of the procedure, an observation of the reference group was discarded, and the model was constructed on the basis of the (n−1) other observations. The left-out observation was submitted to the decision rule of the current step. This procedure was strongly advocated by Ramaker et al. [34], who showed that the LOO procedure for building the control charts partly solved the problem of the false alarm rate estimation, especially when the size of the training set was small. The LOO procedure was also performed to evaluate #FN and #FP with FDA. In this case, LOO was carried out (n+m) times (with n the number of observations in the reference group and m the number of observations in the other groups, i.e. the number of outsiders).

Table 1
Structure of the sample set according to variety, region, vintage and year of analysis

| Variety | Region | Year of analysis | Group | Total |
|---|---|---|---|---|
| Gamay (G) | Beaujolais (GB) | 2003, 2004 | Reference group | 38 |
| Gamay (G) | Other (GO) | 2003, 2004 | Outsider 1 group | 12 |
| Other (O) | Other (OO) | 2003, 2004 | Outsider 2 group | 10 |
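The LOO estimation of #FN can be sketched generically in Python. In the sketch below, `fit` and `accepts` stand for any model-building and decision functions; the univariate interval rule is a deliberately simple, hypothetical stand-in for the $T_A^2$ and RSPE tests, chosen so that the example only needs the standard library.

```python
import statistics

def loo_false_negatives(reference, fit, accepts):
    """Leave-one-out count of reference observations rejected by the rule."""
    rejected = 0
    for i, left_out in enumerate(reference):
        training = reference[:i] + reference[i + 1:]   # the (n - 1) other observations
        model = fit(training)                           # the model never sees left_out
        if not accepts(model, left_out):
            rejected += 1
    return rejected

# Hypothetical univariate stand-in for the multivariate rule:
# accept a value lying within mean +/- k standard deviations.
def fit_interval(values, k=3.0):
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return centre - k * spread, centre + k * spread

def in_interval(model, value):
    low, high = model
    return low <= value <= high

toy_reference = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 12.5]
n_fn_loo = loo_false_negatives(toy_reference, fit_interval, in_interval)
# Resubstitution: the same data are used to fit the model and to test it.
full_model = fit_interval(toy_reference)
n_fn_resub = sum(1 for v in toy_reference if not in_interval(full_model, v))
```

On these toy values, the atypical observation inflates the limits when it is part of the training data, so resubstitution rejects nothing whereas LOO rejects it; this is the underestimation of the false alarm rate mentioned above.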

Moreover, it is clear that the obtained estimations of #FN and #FP intrinsically depend on the available observations. One way of assessing the variability of these results with respect to sampling variability was to apply a resampling procedure called the bootstrap [35,36]. This approach consists in selecting, at random and with replacement, n observations among the n observations of the initial reference set, and in proceeding similarly for each of the other groups. By performing this resampling scheme many times (B=1000 times for instance), it is possible to obtain a bootstrap estimator of the standard error of the parameters of interest. The same bootstrap data sets were submitted to both the prototype approach and FDA. The previously described LOO procedure was actually performed on each bootstrap sample for both methods.
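A minimal sketch of this resampling scheme, using only the Python standard library, is given below. The function name, the group names and the toy criterion are hypothetical; in the study itself the criterion would be #FN or #FP obtained by LOO on each bootstrap sample.

```python
import random
import statistics

def bootstrap_mean_and_se(groups, criterion, n_boot=1000, seed=0):
    """Bootstrap mean (m_B) and standard error (s_B) of a criterion.

    groups: dict mapping a group name to its list of observations.
    criterion: function taking a resampled `groups` dict and returning a number.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        # Resample each group separately, with replacement, keeping its size.
        resampled = {name: rng.choices(obs, k=len(obs)) for name, obs in groups.items()}
        values.append(criterion(resampled))
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical criterion: number of outsiders not exceeding the largest reference value.
def n_accepted_outsiders(groups):
    limit = max(groups["reference"])
    return sum(1 for value in groups["outsiders"] if value <= limit)

toy_groups = {
    "reference": [1.0, 1.2, 0.9, 1.1, 1.3],
    "outsiders": [1.25, 2.0, 0.95, 3.1],
}
m_b, s_b = bootstrap_mean_and_se(toy_groups, n_accepted_outsiders, n_boot=200)
```

Resampling each group separately keeps the group sizes fixed across bootstrap samples, as in the scheme described above.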

All programs were developed using Matlab software (version 6.5, The Mathworks, Inc., MA).

3. Case study

The illustrative application concerns the authentication of AOC red wines from the Beaujolais region in France using their polyphenolic composition.

3.1. Materials and analytical methods

3.1.1. Wine samples

The whole database included sixty French commercial red wines: 38 of them were AOC wines from the Beaujolais region produced with the Gamay grape variety. They defined the reference group. Twelve Gamay wines from other regions were assigned to a group called “outsiders 1”, and 10 wines produced from other grape varieties in other regions were called “outsiders 2”. Wines corresponded to vintages between 1999 and 2003, and were analyzed in 2003 or 2004. The structure of the data set according to variety, region, vintage and year of analysis is shown in Table 1.

In the following, wine samples are identified by five-character codes:

• the first character codes for variety (G: Gamay, O: others),
• the second character codes for region (B: Beaujolais, O: others),
• the third and fourth characters form arbitrary numbers for wine samples (from 1 to 30 for year 2003 analysis, and from 31 to 62 for year 2004 analysis),
• the fifth character codes for vintage (9: 1999, 0: 2000, 1: 2001, 2: 2002, 3: 2003).
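The coding scheme above can be decoded mechanically. The following short Python sketch is only an illustration; the function name `decode_sample` and the dictionary keys are hypothetical.

```python
def decode_sample(code):
    """Decode a five-character wine code such as 'GB232'."""
    if len(code) != 5:
        raise ValueError("expected a five-character code")
    varieties = {"G": "Gamay", "O": "other"}
    regions = {"B": "Beaujolais", "O": "other"}
    vintages = {"9": 1999, "0": 2000, "1": 2001, "2": 2002, "3": 2003}
    number = int(code[2:4])   # arbitrary sample number
    return {
        "variety": varieties[code[0]],
        "region": regions[code[1]],
        "number": number,
        # Samples 1 to 30 were analyzed in 2003, samples 31 to 62 in 2004.
        "analysis_year": 2003 if number <= 30 else 2004,
        "vintage": vintages[code[4]],
    }

sample = decode_sample("GB232")
```

For instance, the code GB232 denotes a Gamay wine from Beaujolais, sample number 23, analyzed in 2003, from the 2002 vintage.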

3.1.2. Analysis of polyphenolic composition

The polyphenolic analysis of these wines is the result of a previous study [37]. A total of 30 phenolic compounds, including a few unknown ones, were analyzed using an HPLC–DAD–MS (High Performance Liquid Chromatography–Diode Array Detection–Mass Spectrometry) method. Namely, 4 phenolic acids, 7 flavonols, and 19 native and derived red pigments were quantified. Tannins (proanthocyanidins), i.e. the total quantity as well as the quantities of each constitutive unit, were analyzed using HPLC–DAD after thiolysis of a methanol extract [38]. All variable codes are detailed in Table 2. Data were row-wise normalized, in order to obtain the composition profile of each wine (each compound quantity in relation to the total amount of polyphenols).
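Row-wise normalization can be sketched as follows in plain Python. The sketch assumes that the total is simply the sum of the quantities in the row, which may differ from the exact total used in this study; the function name and values are hypothetical.

```python
def normalize_rows(table):
    """Divide each compound quantity by its row total (one row per wine)."""
    normalized = []
    for row in table:
        total = sum(row)
        if total == 0:
            raise ValueError("a row with zero total cannot be normalized")
        normalized.append([value / total for value in row])
    return normalized

# Hypothetical quantities for two wines and three compounds.
profiles = normalize_rows([[2.0, 6.0, 2.0], [1.0, 1.0, 2.0]])
```

After this step each row sums to one, so wines are compared on their composition profile rather than on their overall polyphenol content.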

3.2. Results and discussion

3.2.1. Model construction using the prototype approach

A non-standardized PCA using the 37 polyphenolic variables was performed with the observations of the reference group as active data and the observations of the outsider 1 and outsider 2 groups as illustrative data. Fig. 1 shows the score (a) and the loading (b) plots for the two first PC. The four first PC, which reflected 97.6% of the total variance, were clearly interpretable and thus considered as significant. The polyphenolic variables that best characterized AOC Beaujolais were the percentages of caftaric acid (%Acaft), malvidin-3-O-glucoside (%Pmv), epicatechin in upper position (%Tepi-up), epicatechin (%Tepi-low) and catechin (%Tcat-low) in lower positions on the two first PC (Fig. 1), and tannins (%Ttot) and unseparated pigments (%Punsep) on PC3 and PC4 (data not shown).

According to the $T_A^2$ and RSPE tests, with an α risk set at 1%, performed on the observations of the reference group (Fig. 2), no reference wine was discarded (all the observed statistic values were lower than the UCL). Using the MVE method for robust estimations of the parameters μ and Σ confirmed the absence of evident outliers (data not shown). The reference training set including the 38 AOC Beaujolais was thus considered to be homogeneous.

Table 2
Codes of polyphenolic variables

| Variable code | Phenolic compound |
|---|---|
| **Phenolic acids** | |
| %Agall | Gallic acid |
| %Acaft | t-caftaric acid |
| %Acout | t-coutaric acid |
| %Acaff | Caffeic acid |
| **Flavonols** | |
| %Fmgco | Myricetin-3-O-glucuronide |
| %Fmglc | Myricetin-3-O-glucoside |
| %F555 | Unknown compound with m/z=555 |
| %F493 | Unknown compound with m/z=493 |
| %Fm | Myricetin |
| %F507 | Unknown compound with m/z=507 |
| %Fq | Quercetin |
| **Red pigments** | |
| %P797 | Unknown compound with m/z=797 |
| %Pmvcat | Malvidin-3-O-glucoside-catechin |
| %Pdp | Delphinidin-3-O-glucoside |
| %Pcy | Cyanidin-3-O-glucoside |
| %Ppt | Petunidin-3-O-glucoside |
| %Ppn | Peonidin-3-O-glucoside |
| %Pmv | Malvidin-3-O-glucoside |
| %PvitA | Vitisin A + delphinidin-3-O-acetylglucoside |
| %Pptac | Petunidin-3-O-acetylglucoside + catechin-ethyl-malvidin-3-O-glucoside |
| %Ppnac | Peonidin-3-O-acetylglucoside + malvidin-3-O-p-coumarylglucoside-pyruvic acid |
| %Pmvac | Malvidin-3-O-acetylglucoside + delphinidin-3-O-p-coumarylglucoside |
| %Pcycou | Cyanidin-3-O-p-coumarylglucoside |
| %Pptcou | Petunidin-3-O-p-coumarylglucoside |
| %Ppncou | Peonidin-3-O-p-coumarylglucoside |
| %Pmvcou | Malvidin-3-O-p-coumarylglucoside |
| %P609 | Unknown compound with m/z=609 |
| %P?1 | Unknown compound 1 |
| %P?2 | Unknown compound 2 |
| %Punsep | Unseparated pigment derivatives |
| **Tannins** | |
| %Ttot | Total tannins |
| %Tcat-low | Catechin in lower position |
| %Tepi-low | Epicatechin in lower position |
| %TEGC | Epigallocatechin in upper position |
| %Tcat-up | Catechin in upper position |
| %Tepi-up | Epicatechin in upper position |
| %Tgall | Epicatechin 3-gallate in upper position |

3.2.2. Authentication using the prototype approach

Fig. 3 shows the $T_A^2$ and RSPE results of the wines belonging to the complementary data sets, i.e. the outsider 1 and outsider 2 groups. As a wine is declared similar to an AOC Beaujolais if both its $T_A^2$ and RSPE statistics are lower than the UCL levels, nine observations of the outsider 1 group (GO061, GO071, GO120, GO222, GO362, GO372, GO393, GO422, GO523) and two of the outsider 2 group (OO081, OO382) were allocated to the reference group. The observed numbers of false positives were then, respectively for the two complementary data sets, #FP1=9 (75%) and #FP2=2 (20%) (Table 3(a)).

The number of false positives, #FP1, for Gamay wines from other regions (outsider 1 group) appears to be high. Four of them (GO071, GO222, GO372 and GO523) were actually made in regions that are close to the Beaujolais area (Côteaux du Lyonnais and Rhône valley) using the same technology as AOC Beaujolais (carbonic maceration with whole uncrushed grapes). All other wines (the other wines of the outsider 1 group and all the wines of the outsider 2 group) were processed by classic skin maceration. Moreover, it seems that the grape variety (Gamay) is an important factor explaining the proximity of outsider 1 wines to reference wines, according to their polyphenolic profiles.

The number of false positives, #FP2, for wines from other grape varieties and other regions (outsider 2 group) was much lower. The non-rejected wines were OO081 and OO382, which were Syrah wines produced in a close region (Rhône valley).

The number of false negatives was assessed by performing a LOO procedure on the basis of the reference data set. This led to an estimation of #FN=1 (Table 3(a)). As a matter of fact, the observation GB232 was rejected on the basis of its RSPE criterion, with a value of 0.212, higher than the UCL (α=1%) equal to 0.206.

Fig. 1. PCA of the reference wines on the two first PC. Outsider wines are plotted as illustrative observations. (a) Score plot. Reference wines are in black and outsider wines in gray characters. (b) Loading plot.

Fig. 2. (a) $T_A^2$ and (b) RSPE tests with an α risk=1% on reference observations.

The estimations of #FN, #FP1 and #FP2 given above were obtained for a particular selection of wines of different categories, corresponding to the initial data set. In order to estimate the sampling variability of these point estimations, bootstrapping procedures were used. Table 3(a) gives the mean (mB) and the standard error (sB) evaluated after B=1000 bootstrap estimations of the criteria for three α risk levels (1, 5 and 10%). The following observations can be drawn from this table. First, one can verify, for α=1%, that the error estimations obtained with the initial data set and the means of the bootstrap estimations are very close. Secondly, the values and variability of #FP1 and #FP2 estimated by the prototype approach are quite large. Finally, the choice of the α risk level makes it possible to modulate and to find a compromise between #FN and #FP. But the wines produced with the same variety as AOC Beaujolais in other regions were, in this study, often assigned to the AOC Beaujolais group. By comparison, the authentication performance using highly informative data, e.g. descriptive sensory scores (including visual, olfactory and gustatory attributes), instead of polyphenolic profiles, was very similar (data not shown herein). This emphasizes the good potential of polyphenolic information to differentiate wines.

3.2.3. Comparison with FDA

FDA was also performed in order to determine the group membership of wines according to their polyphenolic profiles. The construction of the discriminant space was achieved with the whole set of data, i.e. the 60 wines, characterized by the same three a priori groups. A subset of PC was considered again. We selected the 4 first PC considered as significant. These PC were submitted to a stepwise procedure (in decreasing order of discriminant ability) as follows: PC2, PC3, PC4 and PC1.

Each of the performance criteria was estimated using a LOO procedure. Namely, each observation iteratively discarded from the reference group was involved in the evaluation of #FN. Similarly, each observation belonging to the outsider 1 group and the outsider 2 group, once excluded from the data set, was used to estimate #FP1 and #FP2 respectively. The results were #FN=7 (18%), #FP1=5 (42%) and #FP2=0 (Table 3(b)).

The bootstrap selection, which was the same as the one applied for the prototype approach, led to estimations that, on average, were similar to those obtained with the initial data (Table 3(b)). It appears that the wine samples of the outsider 2 group were never assimilated to AOC Beaujolais, whereas the number of errors made by assigning a sample of the outsider 1 group to the reference group was much lower than with the prototype approach. In return, the number of AOC Beaujolais incorrectly assigned by FDA was much greater than with the prototype approach.

Performances of the two approaches were thus very different. The choice of the method depends on the question and the strategy which are considered. If the objective of the user is to avoid selecting outsider observations, even if it may lead to rejecting many observations of interest (reference group), then FDA is more efficient in our study. On the contrary, if the objective is to avoid rejecting reference observations, then the prototype approach is better suited. The two approaches differed in their balance of error risks (type I and type II errors). An advantage of the prototype approach lies in its flexibility: the level of theoretical α risk can be seen as a “tuning parameter”, as shown in Table 3(a). It should be chosen small to lower #FN, and set higher to lower #FP. In our study, the closest performances concerning #FN between the prototype approach and FDA were obtained when α was set around 10% (Tables 3(a) and (b)).

Fig. 3. (a) $T_A^2$ and (b) RSPE tests with an α risk = 1% on observations of the outsider 1 (in gray) and outsider 2 (in black) groups. Sample OO443 does not appear on the charts, with a $T_A^2$ value of 285 and an RSPE of 0.60.

Table 3a
Mean (mB) and standard deviation (sB) of 1000 bootstrap estimations of the number of false negatives (#FN) and the numbers of false positives (#FP1 and #FP2) according to the prototype approach for three α risk levels

| | n | Initial data set (α = 1%) | Bootstrap (α = 1%) | Bootstrap (α = 5%) | Bootstrap (α = 10%) |
|---|---|---|---|---|---|
| #FN | 38 | 1 | mB = 0.90, sB = 0.78 | mB = 4.13, sB = 1.41 | mB = 7.21, sB = 1.91 |
| #FP1 | 12 | 9 | mB = 8.49, sB = 1.87 | mB = 6.81, sB = 2.04 | mB = 5.81, sB = 1.99 |
| #FP2 | 10 | 2 | mB = 2.27, sB = 1.62 | mB = 1.40, sB = 1.39 | mB = 0.87, sB = 1.20 |

Table 3b
Mean (mB) and standard deviation (sB) of 1000 bootstrap estimations of the number of false negatives (#FN) and the numbers of false positives (#FP1 and #FP2) according to the FDA method

| | n | Initial data set | Bootstrap |
|---|---|---|---|
| #FN | 38 | 7 | mB = 8.01, sB = 3.07 |
| #FP1 | 12 | 5 | mB = 3.80, sB = 1.54 |
| #FP2 | 10 | 0 | mB = 0.00, sB = 0.00 |
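The role of α as a tuning parameter can be illustrated with a small sketch on simulated distances. This is only an illustration under stated assumptions: the acceptance limit is taken here as the empirical (1 − α) quantile of the reference distances, which stands in for the theoretical limit used in the study, and the distance values (`d_ref`, `d_out`) are simulated rather than taken from the wine data. The function name `error_counts` is hypothetical.

```python
import numpy as np

def error_counts(d_ref, d_out, alpha):
    """Return the acceptance limit and the (#FN, #FP) counts for one
    alpha level. The limit is an empirical quantile of the reference
    distances (an assumption, not the theoretical limit of the study)."""
    limit = np.quantile(d_ref, 1.0 - alpha)
    n_fn = int(np.sum(d_ref > limit))    # reference samples rejected
    n_fp = int(np.sum(d_out <= limit))   # outsider samples accepted
    return limit, n_fn, n_fp

rng = np.random.default_rng(0)
d_ref = rng.chisquare(df=4, size=38)         # simulated reference distances
d_out = rng.chisquare(df=4, size=12) + 6.0   # simulated, shifted outsiders

alphas = (0.01, 0.05, 0.10)
results = [error_counts(d_ref, d_out, a) for a in alphas]
for a, (limit, n_fn, n_fp) in zip(alphas, results):
    print(f"alpha={a:.2f}  limit={limit:.2f}  #FN={n_fn}  #FP={n_fp}")
```

Raising α lowers the limit, so #FN can only stay equal or increase while #FP can only stay equal or decrease, which is the compromise discussed above.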

4. Conclusion

An original tool, alternative to FDA or LDA, was investigated for authentication. It is based on a model that depends only on the reference group and is therefore named the prototype approach. This approach can be considered as an adaptation of Multivariate Statistical Process Control, using the Hotelling $T^2$ and RSPE statistics. It basically involves reducing the dimensionality of the data space, then evaluating the Mahalanobis distance within this reduced space as well as the residual variability. An application to the authentication of French AOC Beaujolais wines using polyphenolic information showed how difficult it is to distinguish wines according to their geographical origin. Nevertheless, this methodology made it possible to emphasize the different directions of variability due to some identified factors (grape variety or wine-making technology).
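The three ingredients named above (dimensionality reduction, a Mahalanobis-type distance in the reduced space, and a residual term) can be sketched as follows. This is a minimal sketch under assumptions, not the authors' code: PCA is fitted on the reference group only, the distance is the Hotelling $T^2$ computed on the retained scores, and the residual term is the sum of squared residuals, which may differ from the exact RSPE definition used in the paper. The function names, the simulated data and the choice of two components are all hypothetical.

```python
import numpy as np

def fit_prototype(X_ref, n_components):
    """Fit a PCA model using the reference group only."""
    mean = X_ref.mean(axis=0)
    centred = X_ref - mean
    # Rows of vt are the principal directions of the centred data.
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    loadings = vt[:n_components].T                        # shape (p, A)
    score_var = s[:n_components] ** 2 / (X_ref.shape[0] - 1)
    return mean, loadings, score_var

def distances(model, X):
    """Return (T2, SPE) for each row of X with respect to the model."""
    mean, loadings, score_var = model
    centred = np.atleast_2d(X) - mean
    scores = centred @ loadings
    # Squared Mahalanobis distance in the space of the retained PCs.
    t2 = np.sum(scores ** 2 / score_var, axis=1)
    # Part of each sample that the retained PCs do not reproduce.
    residual = centred - scores @ loadings.T
    spe = np.sum(residual ** 2, axis=1)
    return t2, spe

rng = np.random.default_rng(1)
X_ref = rng.normal(size=(38, 6))          # simulated reference group
model = fit_prototype(X_ref, n_components=2)
t2_ref, spe_ref = distances(model, X_ref)

# A hypothetical sample lying far along the first principal direction:
# large T2, but (almost) no residual.
mean, loadings, score_var = model
x_far = mean + 50.0 * np.sqrt(score_var[0]) * loadings[:, 0]
t2_far, spe_far = distances(model, x_far)
print(t2_far[0], spe_far[0])
```

A new sample would then be accepted as belonging to the reference group only if both statistics fall below limits chosen for the desired α risk; how those limits are derived is left out of this sketch.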

One notable advantage of the prototype approach, compared to Discriminant Analysis, lies in the possibility of setting the theoretical α risk in relation to the practical question and the strategy of the user. The resulting numbers of false negatives and false positives can thus be adjusted as a compromise. The prototype approach is thus versatile and can be an interesting alternative to classic discriminant analyses as an operational decision tool for authentication purposes.

Acknowledgement

The authors thank the EU Commission (TYPIC QLK1-CT-2002-02225) for the financial support. It does not reflect its views and in no way anticipates the Commission's future policy in this area.


8 S. Preys et al. / Chemometrics and Intelligent Laboratory Systems xx (2007) xxx–xxx

ARTICLE IN PRESS

Please cite this article as: S. Preys et al., Multivariate prototype approach for authentication of food products, Chemometrics and Intelligent Laboratory Systems

(2007), doi:10.1016/j.chemolab.2007.01.003