
Application of Chemometrics in Separation Science

A. C. Duarte

Departamento de Química, Universidade de Aveiro, Aveiro, Portugal

S. Capelo

Departamento de Ecologia, Universidade de Évora, Évora, Portugal

Abstract: Chemometrics aims at extracting maximum information through the application of statistics and mathematics to problems of a chemical nature. Over the years, chemometrics has become an important chemical discipline with a significant impact on analytical chemistry, contributing significant improvements in the design and selection of optimal experimental procedures, the calibration of analytical instrumentation, and advanced methods for the analysis of chemical data.

The application of chemometric methods to separation science, mainly chromatography and capillary electrophoresis, has followed the same increasing trend as in any other field of analytical chemistry. However, reviews on the application of chemometrics in separation science have been very scarce. Therefore, this paper presents the development of chemometrics in chromatography and capillary electrophoresis, with a view of the current state of the art and the prospects for the future.

Keywords: Chemometrics, Chromatography, HPLC, Capillary electrophoresis, Principal component analysis

INTRODUCTION

The major areas of chemometrics chosen for review in a recent paper[1] on the evolution of chemometrics in relation to analytical chemistry included multivariate calibration, pattern recognition, and mathematical mixture resolution. The same division seems the most appropriate to follow in this review of applications of chemometrics to separation science, for two main reasons: (a) it makes it easy to compare and follow the evolution of different studies within a specific area; and (b) for teaching purposes, it is useful to have a review on this subject structured around well-known areas.

Address correspondence to A. C. Duarte, Departamento de Química, Universidade de Aveiro, 3810-193 Aveiro, Portugal. E-mail: aduarte@dq.ua.pt

Journal of Liquid Chromatography & Related Technologies®, 29: 1143–1176, 2006

Copyright © Taylor & Francis Group, LLC

ISSN 1082-6076 print/1520-572X online

DOI: 10.1080/10826070600574929

In another review paper,[2] specific to optimization methods in chromatography and capillary electrophoresis, a distinction is made between chemometric methods, for which no explicit models are required, and hard modelling, used when models are available, in which case optimization can be performed by regression methods. Hard modelling cannot be left completely out of this review of the application of chemometrics to separation science: both the large variety of chromatographic parameters and the vast number of relationships between them lead to the conclusion that combining chemometric methods with hard modelling will certainly bring some improvements to the prediction of retention,[2,3] one of the most specific applications of chemometrics to separation science. Moreover, if one considers the general definition of chemometrics given at the beginning of this review, there is no practical reason for distinguishing between the application of chemometrics and/or hard modelling to separation science. Both aim at extracting maximal chemical information through the application of statistical and mathematical methods.

Various textbooks on the major areas of chemometrics, as well as dedicated journals, are available. As a first approach, the review paper on the evolution of chemometrics[1] can be considered an excellent source of references to start with, besides containing short introductions to several relevant methods for multivariate calibration, pattern recognition, and mixture resolution. Another recent review paper,[4] on chemometrics in Hungary over the last 10 years, also deserves special mention as a source of important information on the work of chemometricians, although it focuses only on Hungarian work.

The specificities of separation science bring into play subjects such as the analysis of peak asymmetry, peak overlapping, and quantitative structure–retention relationships, which are not usually highlighted in reviews of chemometrics. Although the generally accepted classification of the major areas of chemometrics will be used in this review, one must bear in mind that the same method (such as artificial neural networks) can be used in different applications (such as multivariate calibration or prediction of retention). Therefore, this review attempts to strike a balance between emphasizing the applications of chemometric methods and keeping sight of the analytical specificities of separation science. As shown in Figure 1, principal components analysis (PCA) is the most used chemometric technique in separation science, followed by factorial design (FD), cluster analysis (CA), artificial neural networks (ANN), and partial least squares (PLS). Because only techniques cited more than five times are highlighted individually, a plethora of other techniques are grouped under the heading “others”.

The trend in the application of chemometric techniques in separation science can be observed in Figure 2, where the number of papers cited in this review is plotted against the year of publication. Although the number of citations per year is not remarkable, an increasing trend is apparent, mainly due to the application of chemometrics to distinguish or characterize samples of different origin or nature. The older publications use chemometrics to solve separation-related problems, such as peak definition or peak resolution. Figure 3, which presents the number of cited papers for the most cited chemometric technique (PCA) versus year, shows that most citations appear in 2004.

Figure 1. Chemometric techniques versus number of times of appearance in the article.

Figure 2. Number of appearances of all chemometric techniques versus year.

Peak Asymmetry and Overlapping of Chromatographic Peaks

Peak asymmetry and overlapping of chromatographic peaks are often dealt with in separation science, and chemometricians have been developing methods for improving our knowledge of these two subjects for some time now. Asymmetry or skewness of chromatographic peaks has long been recognized as extremely important in signal processing, since the deviation of the real peak shape from a symmetrical peak has a significant impact on the estimation of many chromatographic figures of merit, such as the retention time, the peak area, the peak width at half height, and even the degree of peak overlapping.

To approach the problem, it is fundamental to examine all the possible reasons why real chromatographic peaks are not symmetrical: first by examining its physicochemical nature in depth, second by fitting mathematical functions to characterize the asymmetry factor (skew) of real chromatographic peaks, and finally by assessing the usefulness of the proposed methodology in real cases. The principal reasons for deviation from the ideal symmetrical peak are column overload, non-homogeneity of the stationary phase, non-homogeneity of the column packing, slow mass transfer, the kinetics of adsorption-desorption, and dispersion phenomena affecting the band profile. Besides the phenomena occurring in the column, other instrumental and operational causes can play an important role in the occurrence of asymmetrical peaks and the broadening effects observed in separation science. Instrumental details such as connections, tubing, and fittings within the chromatographic apparatus, as well as the injection port and detector characteristics, may increase the width and asymmetry of the peak. Operational details, such as an injection port temperature that is not high enough in gas chromatography, may cause peaks with a significant degree of tailing.

Figure 3. Principal components analysis (PCA) (as the most cited technique) versus year.

The approach mentioned above has generally been followed and, consequently, many methods can be found in the literature for the determination of chromatographic peak parameters and the calculation of factors related to peak shape asymmetry. These methods have been revisited and reviewed, leading to the proposal of a new method for the determination of peak shape asymmetry in chromatography.[5] First, the real chromatographic peak is fitted to a mathematical function,[6] and then the width of a symmetrical curve with the same peak maximum position and height as the fitted asymmetrical peak is fitted to the measured data. The difference between the asymmetrical and symmetrical peaks, and the areas of the difference curve on the ascending and descending halves of the peak, characterize the real asymmetry of the measured peak.[5] The application of this concept to the separation of phenol derivatives by HPLC as a function of flow velocity not only allowed the determination of asymmetry factors, but also suggested that the examination of asymmetry can give insight into the physicochemical processes occurring in the column.[5] The effect of intraparticle mass transfer resistances on the chromatographic peak shape was studied both theoretically and experimentally in the chromatography of proteins,[7] where very broad and highly asymmetrical peaks are usually observed.
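The difference-curve idea can be sketched numerically. This is a minimal sketch, not the procedure of ref. [5]: the asymmetrical peak is assumed to be already available as sampled values on a uniform time grid (generated here with a bi-Gaussian test shape, an assumption), the symmetrical reference is assumed to be a Gaussian, and its width is chosen by a simple grid search on the least-squares error. The function names are hypothetical.

```python
import numpy as np

def bi_gaussian(t, t_max, height, sigma_left, sigma_right):
    """Skewed test peak: two Gaussian halves with different widths (assumed model)."""
    sigma = np.where(t < t_max, sigma_left, sigma_right)
    return height * np.exp(-0.5 * ((t - t_max) / sigma) ** 2)

def asymmetry_areas(t, y, sigma_grid=None):
    """Fit a Gaussian with the same apex position and height as the measured
    peak, then integrate the difference curve on the ascending and
    descending halves. Assumes uniform sampling of t."""
    i_max = int(np.argmax(y))
    t_max, height = t[i_max], y[i_max]
    if sigma_grid is None:
        span = t[-1] - t[0]
        sigma_grid = np.linspace(span / 1000, span / 2, 2000)
    # Only the width is free: pick the grid value with the smallest squared error.
    sse = [np.sum((y - height * np.exp(-0.5 * ((t - t_max) / s) ** 2)) ** 2)
           for s in sigma_grid]
    sigma = sigma_grid[int(np.argmin(sse))]
    symmetric = height * np.exp(-0.5 * ((t - t_max) / sigma) ** 2)
    diff = y - symmetric
    dt = t[1] - t[0]
    ascending = float(np.sum(diff[t < t_max]) * dt)    # leading-half area
    descending = float(np.sum(diff[t > t_max]) * dt)   # trailing-half area
    return sigma, ascending, descending
```

For a tailing peak the fitted Gaussian is wider than the leading half and narrower than the trailing half, so the ascending area comes out negative and the descending area positive; for a symmetrical peak both are close to zero.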

Besides skewed peaks, it is also usual to observe overlapped (and skewed) peaks, which further increases the difficulty of assessing the chromatographic figures of merit. Cai and Wu[8] developed a method for calculating the statistical moments of overlapped peaks by using a parameter determined from the leading half of the peak to indicate the peak asymmetry, and the method was successfully applied to the deconvolution of a series of overlapping peaks. Pápai and Pap[9] suggested a new mathematical equation, the product of a Gaussian function and a polynomial, to fit symmetric, asymmetric, and strongly fronting peaks, suitable for the determination of chromatographic peak parameters by nonlinear curve fitting using statistical moments; furthermore, the function could be used, with acceptable results, for the resolution of overlapped peaks, correction of baseline drift, and noise filtering. Fourier transformation (FT) has been applied to multicomponent chromatograms to characterize retention patterns,[10] to improve the signal-to-noise ratio of chromatographic peaks,[11] and, at a further stage, to unravel the superposition of more complex retention patterns.[12] A combination of a stochastic model with mobile phase dispersion has been proposed by Felinger et al.[13] as the stochastic-dispersive theory of chromatography, which includes the effects of slow mass transfer or adsorption-desorption kinetics and of axial dispersion of the band profile in the full description of the efficiency of separations, as well as of the chromatographic peak features.

Pietrogrande et al.[14] reported a general method for evaluating the statistical degree of peak overlapping in a multicomponent chromatogram, once the retention pattern has been determined in advance. The same basic principles were also used for the deconvolution of overlapping skewed peaks[15] and for the determination of the critical degree of peak overlap[16] in multicomponent chromatograms. In this case, as the heights of adjacent peaks may differ widely, no single value of critical peak resolution can be determined in multicomponent chromatograms; only a distribution of critical peak resolutions can be determined, by using the peak height distribution. Felinger[16] derived a mathematical function for the critical peak resolution, from which the average or maximum probable peak resolution can be obtained by integration.
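The statistical moments referred to above can be computed directly from a sampled, baseline-corrected peak. The sketch below assumes uniform sampling and rectangle-rule summation; it does not reproduce the leading-half asymmetry parameter of Cai and Wu[8] or any deconvolution step, and `peak_moments` is a hypothetical name.

```python
import numpy as np

def peak_moments(t, y):
    """Zeroth moment (area), first moment (centroid), second central moment
    (variance) and skewness of a single peak sampled on a uniform grid."""
    dt = t[1] - t[0]
    m0 = np.sum(y) * dt                          # peak area
    m1 = np.sum(t * y) * dt / m0                 # mean retention time
    m2 = np.sum((t - m1) ** 2 * y) * dt / m0     # variance
    m3 = np.sum((t - m1) ** 3 * y) * dt / m0     # third central moment
    skew = m3 / m2 ** 1.5                        # dimensionless skewness
    return m0, m1, m2, skew
```

A positive skewness indicates tailing and a negative one fronting; the centroid of a tailing peak lies after its apex, which is one reason asymmetry biases retention time estimates.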

Besides further examples of the use of the Fourier transform[17,18] for decoding complex multicomponent chromatograms, the extended Kalman filter in the frequency domain[19] has also been a useful method for parameter estimation on the basis of the FT of the signal, and has been used for the evaluation of overlapping signals, mainly for noise filtering and for deconvolution by peak sharpening.
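A crude illustration of Fourier-domain noise filtering: transform the chromatogram, zero the components above a cutoff frequency, and transform back. This ideal low-pass filter is an assumption chosen for simplicity (the cited works may use other filters, and the Kalman filter approach is not shown); the cutoff must lie above the frequencies carried by the narrowest peak of interest. `fourier_lowpass` is a hypothetical helper built on NumPy's `rfft`/`irfft`.

```python
import numpy as np

def fourier_lowpass(signal, dt, cutoff):
    """Zero all Fourier components above `cutoff` (in 1/time units) and
    return the back-transformed, smoothed signal."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(signal.size, d=dt)
    spectrum[freqs > cutoff] = 0.0
    return np.fft.irfft(spectrum, n=signal.size)
```

Because a Gaussian peak of standard deviation sigma has a Gaussian spectrum, most of its information lies at frequencies below roughly 1/(2 pi sigma) times a small factor, whereas white noise is spread over the whole band; discarding the upper band therefore removes much more noise than signal.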

Liu and Davis[20] evaluated the predictions of statistical overlap theory against the observed peak overlap of 38 neutral compounds in micellar electrokinetic chromatography (MEKC), based on the assumption of an inhomogeneous Poisson distribution of migration times. In addition, the probability distributions and statistical moments of the separation were computed by Monte Carlo simulation.
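The Monte Carlo side of statistical overlap theory can be illustrated with a deliberately simplified model. The sketch assumes uniformly random component positions on a unit axis (a homogeneous case, whereas Liu and Davis[20] used an inhomogeneous Poisson distribution) and assumes that neighbours closer than a fixed spacing x0 merge into one observed peak. The simulated mean is compared with the estimate p = m exp(-m x0) that follows from a Poisson model on a unit axis. Function names are hypothetical.

```python
import numpy as np

def simulate_observed_peaks(m, x0, n_trials=2000, seed=0):
    """Mean and standard deviation of the number of observed peaks when m
    components are placed at random on [0, 1] and neighbours closer than
    x0 are not resolved."""
    rng = np.random.default_rng(seed)
    counts = np.empty(n_trials)
    for k in range(n_trials):
        positions = np.sort(rng.uniform(0.0, 1.0, size=m))
        gaps = np.diff(positions)
        # Every gap of at least x0 starts a new observed peak.
        counts[k] = 1 + np.count_nonzero(gaps >= x0)
    return counts.mean(), counts.std()

def sot_prediction(m, x0):
    """Poisson-model estimate of the number of observed peaks on a unit axis."""
    return m * np.exp(-m * x0)
```

With m = 50 and x0 = 0.01 the simulated mean and the estimate agree to within about one peak; the small offset arises because a fixed number of uniform positions is not exactly a Poisson process.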

Zhang et al.[21] reported the successful application of multilayer perceptron artificial neural networks (ANN), based on genetic input selection, for the quantification of unresolved peaks in micellar electrokinetic capillary chromatography with diode array detection (DAD).

Gao et al.[22] modified non-negative matrix factorization (NMF) by introducing characteristics of chemical signals, such as smoothness of spectra, unimodality of chromatograms, and sparseness of mass spectra, for resolving overlapping spectra, thereby making the traditional NMF algorithm suitable as a resolution method for complex samples.
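For reference, a plain (unconstrained) NMF of the traditional kind can be written in a few lines with the Lee-Seung multiplicative updates, factorizing a non-negative data matrix X (retention times by spectral channels) as X ≈ C S. The sketch implements only this plain algorithm; the smoothness, unimodality, and sparseness constraints of Gao et al.[22] are not included, and the function name is hypothetical.

```python
import numpy as np

def nmf(X, rank, n_iter=1000, seed=0, eps=1e-12):
    """Plain NMF, X ~ C @ S, minimizing the Frobenius norm with Lee-Seung
    multiplicative updates. C: elution profiles, S: spectra."""
    rng = np.random.default_rng(seed)
    n_times, n_channels = X.shape
    C = rng.uniform(0.1, 1.0, size=(n_times, rank))
    S = rng.uniform(0.1, 1.0, size=(rank, n_channels))
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        S *= (C.T @ X) / (C.T @ C @ S + eps)
        C *= (X @ S.T) / (C @ S @ S.T + eps)
    return C, S
```

Plain NMF is not unique, so the recovered profiles may be scaled or mixed versions of the true ones; this ambiguity is what chemically motivated constraints such as unimodality are meant to reduce.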

Hu et al.[23] proposed a singular value ratio (SVR) for resolving peaks in HPLC-DAD and detecting the presence of impurities in complex mixtures. This simple chemometric approach is based on moving fixed-size windows along the retention time axis and extracting information about selectivity from the evolving singular value profiles, in order to determine peak homogeneity in HPLC-DAD more effectively.
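The moving-window idea can be sketched with an SVD of each window of a time-by-wavelength matrix. The exact SVR definition of Hu et al.[23] is not given here, so the sketch assumes it to be the ratio of the second to the first singular value in each window: values near zero suggest a single component (a homogeneous peak region), while larger values flag a second co-eluting species. The function name and window size are hypothetical.

```python
import numpy as np

def singular_value_ratio(X, window=7):
    """Slide a fixed-size window along the retention-time axis (rows of X)
    and return s2/s1 for each window position."""
    ratios = []
    for start in range(X.shape[0] - window + 1):
        s = np.linalg.svd(X[start:start + window], compute_uv=False)
        ratios.append(s[1] / s[0])
    return np.array(ratios)
```

With real data, noise keeps s2 above zero everywhere, so in practice the ratio profile would be compared with its value in a known pure or baseline region rather than with zero.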

Quantitative Structure–(Chromatographic) Retention Relationships (QSRR)

Reubsaet and Jinno,[24] reviewing and discussing the interactions between an analyte and the stationary phase in relation to the physicochemical properties of both, concluded that a chemometric approach provides insight into the combinations of these interactions that dominate the retention behaviour of analytes in RP-HPLC.

Héberger,[25] examining the QSRR of 20 hydrocarbons (mainly alkylbenzenes) on silicone oil 550 (moderately polar phase) or bentone 34 (polar phase) as stationary phase, established empirical correlations between gas chromatographic retention data and physical or topological properties of the solute molecules, and showed the superiority of a nonlinear fitting equation describing retention time in terms of the boiling point (BP) and another topological descriptor. Héberger[26] later demonstrated the excellent predictive ability of this nonlinear model for apolar and slightly polar stationary phases, using alkylbenzenes as model compounds. Finally, Héberger and Kowalska[27] studied the thermodynamics of Kováts retention index–boiling point correlations for alkylbenzenes in gas chromatography, which allowed the prediction of retention indices, the standard chemical potential of partitioning of one methylene group of n-alkanes, and the average heat of vaporization/solution for low and moderately polar stationary phases.
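Since several of these studies model the Kováts retention index, its standard isothermal definition is worth stating: the index interpolates the logarithm of the analyte's adjusted retention time between those of the two n-alkanes that bracket it, I = 100[n + (N - n)(log t'x - log t'n)/(log t'N - log t'n)]. The sketch below codes that formula; the argument names are hypothetical, and the temperature-programmed variant is not covered.

```python
import math

def kovats_index(t_x, t_n, t_big_n, n, big_n):
    """Isothermal Kovats retention index from adjusted retention times.
    t_n and t_big_n belong to the n-alkanes with n and big_n carbon atoms
    that bracket the analyte (t_n <= t_x <= t_big_n)."""
    frac = (math.log10(t_x) - math.log10(t_n)) / (math.log10(t_big_n) - math.log10(t_n))
    return 100.0 * (n + (big_n - n) * frac)
```

By construction an n-alkane with n carbon atoms has an index of 100n, and an analyte eluting at the geometric mean of the adjusted retention times of two consecutive n-alkanes falls exactly halfway between them.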

Kaliszan[28] reviewed the available data on the combined application of affinity CD and HPLC with chemometrics (mainly MLR, PCA, and cluster analysis (CA)) to evaluate the strength of drug-protein interactions and to provide information of relevance to molecular pharmacology and drug design. QSRR equations derived for test series of drug analytes are interpreted in terms of the structural specificities of binding sites in biomacromolecules. In another review, on the role of chromatography and capillary electrophoresis in modelling the basic processes of drug action, Kaliszan[29] emphasized the combination of separation science and chemometrics (such as PCA) for increasing the speed and efficiency of establishing quantitative relationships between the chemical structure of drugs and their ability to participate in intermolecular interactions with the components of living systems, besides reducing the costs of new drug research and the use of laboratory animals. The principles and methodology of QSRR for biologically active chemicals are discussed by Kaliszan[29] in the context of using chromatographically measured parameters to describe the interactions of drugs with biomolecules, in order to predict quantitative differences in specific bioactivity parameters within a given pharmacological family, i.e., establishing quantitative structure–activity relationships (QSAR) with retention parameters serving as structural parameters.

The retention characteristics of a newly designed cholesterol-bonded silica gel stationary phase for RP-HPLC were revealed by QSRR analysis; the QSRR were derived by multiple regression analysis (MRA), using three groups of structural descriptors of the analytes and the log kw data determined on the new stationary phase.[30]
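At its core, a QSRR of this kind is an ordinary least-squares fit of log k (or log kw) against a set of descriptors. The sketch below assumes a generic descriptor matrix with one row per analyte and one column per descriptor; it does not use the descriptors or data of ref. [30], and the names are hypothetical. It returns the coefficients (intercept first) and the coefficient of determination.

```python
import numpy as np

def fit_mlr(descriptors, log_k):
    """Multiple linear regression QSRR: log k = b0 + sum_j b_j * x_j."""
    X = np.column_stack([np.ones(len(log_k)), descriptors])
    coef, *_ = np.linalg.lstsq(X, log_k, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((log_k - fitted) ** 2)
    ss_tot = np.sum((log_k - np.mean(log_k)) ** 2)
    return coef, 1.0 - ss_res / ss_tot
```

In practice the fit would be judged on predictive statistics (e.g., cross-validation) rather than on R² alone, since R² never decreases when descriptors are added.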

Baggiani et al.[31] used PCA combined with principal component regression (PCR) to search for correlations between the selectivity of a pentachlorophenol (PCP)-imprinted polymer packed in an HPLC column and the molecular descriptors of 52 PCP-related phenols. The QSRR procedures were applied to correlate the chromatographic behaviour of several phenols, mainly characterized by their retention time, with structural parameters determined by molecular mechanics or semiempirical quantum chemical techniques. The magnitude of the multivariate model's parameters shows that selectivity is strongly influenced by molecular descriptors of a structural character, while the effect of molecular descriptors of an electronic character is much less marked.

Bolliet and Poole[3] used statistical mixture design techniques together with the solvation parameter model to predict retention time in reversed-phase liquid chromatography with ternary mobile phase compositions. The solvation model is used to obtain system constants characterizing the capability of the chromatographic system for defined solvent-solvent and solute-solvent interactions. The mixture design was used to build system surfaces for each system constant over all mobile phase compositions, which could then be used in models to predict the retention of a wide group of 36 solutes on a porous polymer sorbent at a methanol-acetonitrile-water composition not included in the data used to construct the system surfaces.
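The mixture-design side of this approach can be illustrated with a {q, m} simplex-lattice design, in which every component proportion is a multiple of 1/m and the proportions sum to one. The particular design used by Bolliet and Poole[3] is not stated here; the {3, 2} lattice for a methanol-acetonitrile-water system is only an illustrative assumption, and the function name is hypothetical.

```python
import itertools
from fractions import Fraction

def simplex_lattice(q, m):
    """{q, m} simplex-lattice design: all q-component mixtures whose
    proportions are multiples of 1/m and sum to 1."""
    points = []
    for combo in itertools.product(range(m + 1), repeat=q):
        if sum(combo) == m:
            points.append(tuple(Fraction(c, m) for c in combo))
    return points
```

For three solvents and m = 2 this gives six runs: the three pure solvents and the three 1:1 binary mixtures, enough to fit a quadratic (Scheffé-type) surface for each system constant.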

Quantum chemical calculations provide acceptable descriptors for the characterization of molecular properties in QSRR. Körtvélyesi et al.[32,33] established correlations between the Kováts retention index and quantum chemical descriptors for aliphatic ketones and aldehydes on stationary phases of different polarity, and also calculated quantum chemical descriptors (i.e., van der Waals surface areas and volumes), highlighting their good correlation with the Kováts retention index for alkanes, alkenes, and azo compounds. Körtvélyesi et al.[32] used multiple linear regression (MLR), cluster analysis (CA), and principal component analysis (PCA) for the statistical evaluation of the relationships between Kováts retention indices and the calculated molecular descriptors. Unusual retention relations were observed by Kivály et al.[34] in the gas chromatography of N,N′-dialkylhydrazones (DAHs), where different correlations are necessary to calculate retention indices for aldehyde and ketone DAHs, because of the different resonance structures in the aldehyde and ketone derivatives.

A PLS model was built by Héberger et al.[35] to estimate retention data of 35 aliphatic ketones and aldehydes at different temperatures (50, 70, 90, and 110 °C) on various stationary phases of different polarity, and also to discriminate between ketones and aldehydes on the basis of their retention data and physical properties (structural descriptors). Multiple linear regression (MLR) was also used by Markuszewski et al.[36] to derive QSRR (relating retention parameters to hydrophobicity parameters, hydrogen bond descriptors, and structural parameters from molecular modelling) to evaluate the partition mechanism of environmentally significant compounds (mainly nitroaromatics and their transformation products found around former ammunition plants) on new stationary phase materials. Furthermore, the chromatographic retention data, once subjected to PCA, allowed the assignment of individual pollutants to defined metabolic routes.

Canonical correlation analysis (CCA) has been used in QSRR by Forgács and Cserháti[37] to relate the retention characteristics of barbituric acid derivatives on a porous graphitized carbon column, using water-tetrahydrofuran mixtures as solvents, to their physicochemical parameters; they concluded that electronic parameters exerted the highest impact on retention, while the effect of hydrophobicity was negligible.

Valkó et al.[38] established relationships between the chromatographic hydrophobicity indices (CHI) and solute descriptors obtained using several reversed-phase, diol, nitrile, cyclodextrin, and immobilized artificial membrane (IAM) bonded HPLC columns. The CHI data obtained on the various HPLC stationary phases were subjected to multiple regression analysis (MRA) and PCA.

Since the partitioning of drugs is often ruled mainly by their hydrophobic character, fast screening of new drug candidates is performed by correlating the logarithm of the retention factor (log k) from classical RP-HPLC with hydrophobicity expressed by the partition coefficient, log P. New methods, which include stationary phases that model biological membranes and micelles in the mobile phase (i.e., micellar liquid chromatography (MLC) and micellar electrokinetic capillary chromatography (MECC)), are alternatives for providing QSRR (as conventional RP-HPLC does), as well as quantitative retention–activity relationships (QRAR) using biological parameters and, ideally, mimicking biological partitioning.[39] Detroyer et al.[39] used PCA score plots to differentiate between pharmacological classes based on their retention characteristics on MLC, MECC, and IAM, as well as on conventional methods, and concluded that including these new methods would allow a whole range of retention mechanisms besides hydrophobicity to be covered, which is very useful for modelling QRAR. In another work, Detroyer et al.[40] also explored the potential of PCA, CA, weighted holistic invariant molecular (WHIM) descriptors (a variant of PCA), and sequential projection pursuit (SPP) (another variant of PCA) for the classification of 83 substances from 5 pharmacological families, based on chromatographic data on 8 HPLC systems.

Artificial neural networks (ANN) were successfully developed by Jalali-Heravi and Garkani-Nejad[41] for the modelling and prediction of the electrophoretic mobility of a series of sulfonamides, positively charged at low pH and negatively charged at high pH, in capillary zone electrophoresis (CZE), using for modelling purposes part of a set of 47 descriptors divided into four groups of topological, geometric, and physicochemical parameters. The results obtained using ANN agreed very well with the experimental values and showed some superiority over those obtained by MLR techniques.

Cserháti and Forgács[42] applied PCA to show that the hydrophobicity parameters determined by RP-HPLC and RP thin-layer chromatography (TLC) are slightly different, and that herbicides and fungicides could not be discriminated according to their hydrophobicity parameters. The significance of this finding for the application of QSAR in agrochemical research is that differences in the biological activity of these pesticides cannot be attributed to either their hydrophobicity or their specific hydrophobic surface area alone.[42]

Farkas et al.[43] compared retention index (RI) prediction for three groups of saturated O-, N-, and S-containing heterocycles, using variable subset selection (VSS) and PLS methods to select the most appropriate independent descriptors from a large data set. After the appropriate independent variables had been chosen, predictive models were built and compared using MLR and PLS methods. PLS failed to select boiling point (BP) or molecular mass as descriptors of the RI, but after the selected descriptors were supplemented with BPs, PLS provided proper models for RI prediction. The main conclusion was that combining MLR for variable selection with PLS for model building should be a useful methodology for RI prediction.
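The PLS model-building step of such a strategy can be sketched with the NIPALS algorithm for a single response (PLS1). This is a minimal, generic implementation on mean-centred data; the MLR variable selection and the heterocycle descriptors of Farkas et al.[43] are not reproduced, and the function names are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by NIPALS on mean-centred data. Returns regression coefficients
    in the original descriptor space and the centring means."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = (yk @ t) / tt               # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)
    return B, x_mean, y_mean

def pls1_predict(X, B, x_mean, y_mean):
    return (X - x_mean) @ B + y_mean
```

With as many components as there are linearly independent descriptors, PLS1 reproduces the ordinary least-squares fit; using fewer components gives a more parsimonious model, which is the usual reason for choosing PLS when descriptors are numerous and correlated.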

The support vector machine (SVM) and the heuristic method (HM) were used by Luan et al.[44] to develop nonlinear and linear models, respectively, relating the retention time (RT) of 149 volatile organic compounds (VOCs) to five molecular descriptors. The HM was used for the prediction of RT from calculated structural descriptors; even after the heuristic reduction, the pool still contained 150 descriptors. To avoid overparameterization, five descriptors seemed to be enough for a successful linear model. From the physical meaning of the descriptors in the linear model, it can be suggested that polar interactions and relative reactivity are likely to be the two major factors controlling the retention behaviour of VOCs on the non-polar DB-1 stationary phase. The SVM was then used to build a nonlinear model based on the same set of descriptors selected by the HM, and proved to have better predictive ability than the HM.

SVM was also used, as a nonlinear regression method, by Liu et al.[45] to develop a nonlinear quantitative structure–mobility relationship model of peptides based on structural descriptors. MLR and SVM were used to select the descriptors responsible for the electrophoretic mobility of peptides and to develop linear and nonlinear models, respectively, for the prediction of the electrophoretic mobilities of peptides. The SVM model performed better than the MLR model, which indicates that the nonlinear model better describes the relationship between the structural descriptors and the electrophoretic mobilities of peptides.

Classification of Stationary Phases and Polarity Indicators

Column selection is not a straightforward process, and stationary phases can only be appropriately selected for a given separation when there is sound knowledge of the physical meaning and classification of the phases. Valkó et al.[38] suggested the use of a high-throughput chromatographic hydrophobicity index (CHI) for characterizing the different selectivities of stationary phases according to the solute-stationary phase interaction properties of each specific phase. Héberger[46] evaluated polarity indicators and stationary phases in gas-liquid chromatography using PCA, and suggested that no single polarity variable can be used on its own. Furthermore, a physical meaning could be associated with the most influential principal components: PC1 with polarity as usually defined; PC2 with hydrogen-donating and hydrogen-accepting ability, with opposite signs; and PC3 with dipole interactions. Héberger et al.[47] also used PCA for assessing polarity indicators and solute-solvent interaction parameters in order to classify ketones, their oximes, and mixtures as stationary phases in inverse gas chromatography. Forlay-Frick et al.[48] used PCA, as an unsupervised pattern recognition technique, to unravel patterns and test the possible replacement of HPLC systems, based on plate numbers and symmetry factors measured for three solutes (benzoic acid, N,N′-dimethylaniline, and vancomycin) in various chromatographic systems (stationary phases and mobile phase compositions). Although the theoretical plate number (column efficiency) is negatively correlated with the symmetry factor, Forlay-Frick et al.[48] concluded that both are necessary for proper classification and characterization of stationary phases. Iványi et al.[49] evaluated whether PCA could reduce the number of chromatographic test parameters while maintaining the classification of RP-HPLC stationary phases. Fewer parameters, only three or four, could be enough to preserve column clustering and column differentiation without much loss of information and, in some cases, even provided more detailed results. Poole and Poole[50] also recognized the role of PCA and CA in classifying stationary phases by their similarity for specific intermolecular interactions, in their study of the chemometric classification of the solvent properties (selectivity) of commonly used GC stationary phases.
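The PCA used in these classification studies can be sketched as an SVD of the autoscaled data matrix, with rows as stationary phases and columns as test parameters or polarity indicators. Autoscaling is an assumption here (it is common when variables have different units), and the names are hypothetical. Scores are used to cluster the phases, and loadings to interpret the components, as in the PC1 to PC3 assignments above.

```python
import numpy as np

def pca(data, n_components=3):
    """PCA of an autoscaled matrix (rows: phases, columns: indicators)
    computed with the singular value decomposition."""
    Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # phase coordinates
    loadings = Vt[:n_components].T                     # indicator weights
    explained = s ** 2 / np.sum(s ** 2)                # variance fractions
    return scores, loadings, explained[:n_components]
```

When several indicators measure essentially the same property, they load together on one component and the first explained-variance fraction is large, which is how PCA reveals that a smaller set of test parameters may suffice.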

Gyseghem et al.[51] measured chromatographic parameters representing hydrophobicity, steric selectivity, efficiency, silanol activity, H-bonding capacity, and ion-exchange capacity, and compared approaches for selecting RP-HPLC columns with diverse selectivity for the potential separation of impurities in drugs. The initial selection was based on visual inspection and personal experience, and a re-evaluation was then performed using the Pareto-optimality method, PCA, and Derringer's desirability function approach. The selection by the chemometric approaches was found to be fairly comparable with the initial selection.
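Two of the multicriteria tools mentioned above can be sketched compactly: a one-sided (larger-is-better) Derringer desirability combined by a geometric mean, and a Pareto-optimality filter that keeps the candidates not dominated on every criterion. The limits, the exponent s, and the assumption that every criterion is to be maximized are illustrative choices, not those of ref. [51]; the function names are hypothetical.

```python
import numpy as np

def desirability_larger_is_better(y, low, high, s=1.0):
    """Derringer one-sided desirability: 0 at or below `low`,
    1 at or above `high`, ((y - low)/(high - low))**s in between."""
    d = (np.asarray(y, dtype=float) - low) / (high - low)
    return np.clip(d, 0.0, 1.0) ** s

def overall_desirability(d_matrix):
    """Geometric mean of the individual desirabilities (one column per criterion)."""
    d = np.asarray(d_matrix, dtype=float)
    return np.prod(d, axis=1) ** (1.0 / d.shape[1])

def pareto_optimal(responses):
    """Boolean mask of non-dominated rows when every column is maximized."""
    r = np.asarray(responses, dtype=float)
    mask = np.ones(len(r), dtype=bool)
    for i in range(len(r)):
        dominated_by = np.all(r >= r[i], axis=1) & np.any(r > r[i], axis=1)
        mask[i] = not dominated_by.any()
    return mask
```

The geometric mean makes the overall desirability zero whenever any single criterion is unacceptable, whereas the Pareto filter returns a set of trade-off candidates and leaves the final choice to the analyst.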

Forlay-Frick et al.,[52] to select orthogonal or similar chromatographic systems using retention factor data of drugs, applied the generalized pairwise correlation method (GPCM) with different statistical tests (Williams' t, conditional Fisher's, McNemar's, and chi-square tests), and compared and validated the results against those obtained with correlation coefficients (Pearson's product-moment correlation coefficient, Spearman's rho, and Kendall's tau). The vast majority of chromatographic retention data are not normally distributed, and the classification of the systems depended strongly on the method applied; a comparison based on the orthogonality ratios calculated with the different techniques, which provides information about the discrepancies in the results, allowed Forlay-Frick et al.[52] to use the method even when the data are not normally distributed.
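The three correlation coefficients used as the reference in this comparison can be computed as follows. The sketch covers only Pearson's r, Spearman's rho (Pearson's r on average ranks), and Kendall's tau in its simple tau-a form without tie correction (an assumption; a tie-corrected tau-b is often preferred). The GPCM itself is not implemented, and the function names are hypothetical.

```python
import numpy as np

def rank(x):
    """Ranks 1..n, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    return pearson(rank(x), rank(y))

def kendall_tau_a(x, y):
    """(concordant - discordant pairs) / total pairs, no tie correction."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    s = 0.0
    for i in range(n - 1):
        s += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return float(s / (n * (n - 1) / 2))
```

Rank-based coefficients reach 1 for any monotonic relationship between the retention data of two systems, while Pearson's r stays below 1 when that relationship is curved, which is one source of the method-dependent classifications noted above.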

Retention data of newly synthesized stationary phases and structural descriptors of nucleosides and cyclic nucleotides derived by molecular modelling were subjected to PCA, allowing Turowski et al.[53] to conclude that, of the 11 new materials, 8 hydrocarbon-bonded silica phases had separation properties for nucleosides and nucleotides similar to those of a standard octylsilica phase. Furthermore, PCA of the structural descriptors of solutes facilitates the identification of structural features that affect retention, while PCA of retention data provides a method for comparing the characteristics of HPLC systems and classifying them.

Optimization of Operational Conditions Associated with Chromatographic Processes

The optimization methods applied to parameters of interest in either chromatography or capillary electrophoresis have been reviewed by Siouffi and Phan-Tan-Luu.[2] The review[2] distinguishes between optimization methods that use a chemometric approach and methods based on models. It concludes that chemometric tools such as the simplex algorithm, overlapping resolution maps, factorial designs, response surface methodology, and neural networks have the great advantage of not needing an explicit equation for the objective function that expresses the desired criteria. However, chemometric methods may require a large number of experiments, and the boundaries of the experimental domain are not easy to define. When models are available, on the other hand, optimization by regression methods becomes straightforward. The review[2] also concludes that, among the chemometric methods, the use of the simplex and of overlapping resolution maps is declining, while the use of artificial neural networks is increasing. In addition, factorial and central composite designs are becoming more popular in capillary electrophoresis, since the number of parameters to optimize is much larger than in GC or LC.
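
The principle behind overlapping resolution maps is to compute the resolution of every peak pair over the experimental domain and to keep the region where the worst pair is still acceptable. The sketch below shows the one-factor version of that computation (essentially a window diagram). It assumes hypothetical linear-solvent-strength retention models (log10 k = LOG_KW - S·phi), an assumed plate number, and the Purnell form of the resolution equation. None of these values come from the review.

```python
import numpy as np

# Hypothetical linear-solvent-strength parameters for four solutes: log10 k = LOG_KW - S * phi.
LOG_KW = np.array([2.1, 2.4, 2.9, 3.3])
S = np.array([3.0, 3.6, 4.1, 4.9])
N_PLATES = 10_000  # assumed plate number of the column


def min_resolution(phi: float) -> float:
    """Worst (smallest) resolution over adjacent peak pairs at organic fraction phi,
    using the Purnell equation Rs = sqrt(N)/4 * (alpha - 1)/alpha * k2/(1 + k2)."""
    k = np.sort(10 ** (LOG_KW - S * phi))  # elution order can change with phi
    k1, k2 = k[:-1], k[1:]
    alpha = k2 / k1
    rs = np.sqrt(N_PLATES) / 4 * (alpha - 1) / alpha * k2 / (1 + k2)
    return float(rs.min())


grid = np.linspace(0.3, 0.8, 51)
worst = np.array([min_resolution(phi) for phi in grid])
phi_opt = grid[np.argmax(worst)]
print(f"best phi = {phi_opt:.2f}, worst-pair Rs = {worst.max():.2f}")
```

With two or more factors, the same minimum over pairs would be evaluated on a multidimensional grid, which is where the number of required experiments or model evaluations grows quickly.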

The development and validation of a capillary electrophoresis (CE) method by Wynia et al.[54] involved several chemometric methods, such as a Box-Behnken design, PLS, and MLR, to assess the suitability of CE for the determination of the drug substance mirtazapine and five structurally related substances. Peyrin and Guillaume[55] assessed the capillary efficiency and the overall quality of the separation of benzodiazepines by CE using a Box-Behnken design followed by simplex optimization, with a new response function that was maximal when both efficient separation conditions and a minimum analysis time were met.
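
A coded Box-Behnken design can be generated directly. The sketch below uses the pairwise construction: each pair of factors is run at the four ±1 combinations while the other factors stay at their centre level, plus replicated centre points. This gives the standard designs for three to five factors. The number of centre points is an assumed default, and mapping the coded levels to real CE conditions is left out.

```python
from itertools import combinations, product


def box_behnken(n_factors: int, n_center: int = 3) -> list[tuple[int, ...]]:
    """Coded Box-Behnken design built from all factor pairs plus centre points."""
    if not 3 <= n_factors <= 5:
        raise ValueError("the pairwise construction is standard only for 3-5 factors")
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0] * n_factors
            row[i], row[j] = a, b  # two factors at +/-1, the rest at the centre level
            runs.append(tuple(row))
    runs.extend([tuple([0] * n_factors)] * n_center)
    return runs


for run in box_behnken(3):
    print(run)
```

Unlike a central composite design, no run lies at a corner of the cube, so no experiment combines all factors at their extreme levels.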

Guillaume and Peyrin[56] optimized the migration time, the height equivalent to a theoretical plate, and the separation of a mixture of imidazole compounds by CE. Their method applied a simplex to a polynomial derived from preliminary experiments planned with factorial designs.
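
The strategy of running a simplex on a fitted polynomial rather than on new experiments can be sketched as follows. The polynomial coefficients are invented for illustration, and the optimizer is a simplified Nelder-Mead variable-size simplex with reflection, expansion, inside contraction and shrink. The simplex variant used in the cited work may differ.

```python
import numpy as np


def nelder_mead(f, x0, step=0.5, tol=1e-8, max_iter=500):
    """Minimise f with a simplified Nelder-Mead (variable-size) simplex."""
    n = len(x0)
    simplex = [np.asarray(x0, dtype=float)]
    for i in range(n):  # initial simplex: x0 plus one step along each axis
        v = simplex[0].copy()
        v[i] += step
        simplex.append(v)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = np.argsort(values)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if abs(values[-1] - values[0]) < tol:
            break
        centroid = np.mean(simplex[:-1], axis=0)
        xr = centroid + (centroid - simplex[-1])  # reflect the worst vertex
        fr = f(xr)
        if fr < values[0]:
            xe = centroid + 2 * (centroid - simplex[-1])  # try expanding further
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = centroid + 0.5 * (simplex[-1] - centroid)  # inside contraction
            fc = f(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:  # shrink all vertices towards the best one
                simplex = [simplex[0] + 0.5 * (v - simplex[0]) for v in simplex]
                values = [f(v) for v in simplex]
    best = int(np.argmin(values))
    return simplex[best], values[best]


def fitted_response(x):
    """Hypothetical second-order polynomial (coded units) assumed fitted to factorial runs."""
    x1, x2 = x
    return 50 + 4 * x1 + 3 * x2 - 2 * x1**2 - 1.5 * x2**2 + 0.5 * x1 * x2


# Maximise the response by minimising its negative.
x_opt, neg_y = nelder_mead(lambda x: -fitted_response(x), x0=[0.0, 0.0])
print("optimum (coded):", np.round(x_opt, 3), "predicted response:", round(-neg_y, 3))
```

Because every evaluation is a polynomial prediction, the simplex can take hundreds of steps at no experimental cost. The result is only as good as the polynomial within the explored domain.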

Corradini et al.[57] applied a full factorial design to optimize the CZE conditions (temperature, voltage, and percentage of methanol added to the background electrolyte) for the characterization of fructooligosaccharides and inulin of different degrees of polymerization, in order to evaluate their prebiotic properties.
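
For the three factors named above, a two-level full factorial design has 2³ = 8 runs. The sketch below lists them in standard (Yates) order in real units. The low and high levels in `FACTORS` are hypothetical, and the number of levels actually used by Corradini et al. is not stated here.

```python
from itertools import product

# Hypothetical low/high levels; the levels used in the cited study are not given in this review.
FACTORS = {
    "temperature_C": (20.0, 30.0),
    "voltage_kV": (15.0, 25.0),
    "methanol_pct": (0.0, 10.0),
}


def full_factorial(factors: dict[str, tuple[float, float]]) -> list[dict[str, float]]:
    """Two-level full factorial (2**k runs) in standard (Yates) order, in real units."""
    names = list(factors)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        coded = coded[::-1]  # reverse so the first factor alternates fastest
        runs.append({n: factors[n][0] if c < 0 else factors[n][1] for n, c in zip(names, coded)})
    return runs


for i, run in enumerate(full_factorial(FACTORS), start=1):
    print(i, run)
```

In practice the eight runs would be carried out in randomized order, so that drifts in the capillary or electrolyte are not confounded with factor effects.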

The successful application of fractional factorial and Plackett-Burman designs has been reported[58] for assessing the robustness of an HPLC assay method issued by the United States Pharmacopoeia for ginsenosides in Asian and American ginseng. The effects on the responses were evaluated both graphically (with a half-normal probability plot) and numerically (by a t-test). Perrin et al.[59] also used a Plackett-Burman design to test the robustness of the enantiomeric separation of a basic (propranolol), a neutral (praziquantel), and an acidic (warfarin) compound by CE using highly sulfated cyclodextrins.
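
A 12-run Plackett-Burman design is obtained by cyclically shifting a standard generator row and appending a row with all factors at their low level. Factor effects are then the difference between the mean responses at the high and low levels. In the sketch below, the factor-to-column assignment and the response values are hypothetical. The t-value calculation uses one common convention, in which the standard error of an effect is estimated from the effects of unassigned (dummy) columns. This convention is an assumption here and not necessarily the procedure of the cited studies.

```python
import numpy as np

# Standard generator row for the 12-run Plackett-Burman design.
PB12_GENERATOR = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]


def plackett_burman_12() -> np.ndarray:
    """12-run, 11-column two-level design: 11 cyclic shifts plus a row of -1."""
    rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)


def main_effects(design: np.ndarray, response: np.ndarray) -> np.ndarray:
    """Effect of each column = mean response at +1 minus mean response at -1."""
    return np.array([response[col == 1].mean() - response[col == -1].mean() for col in design.T])


# Hypothetical robustness study: 7 method factors on columns 0-6, columns 7-10 left as dummies.
design = plackett_burman_12()
rng = np.random.default_rng(42)
response = 98 + 1.2 * design[:, 0] - 0.8 * design[:, 2] + rng.normal(0, 0.2, 12)
effects = main_effects(design, response)
t_values = effects / np.sqrt(np.mean(effects[7:] ** 2))  # SE from dummy-column effects
print(np.round(t_values[:7], 2))
```

Sorting the absolute effects and plotting them against half-normal quantiles gives the graphical counterpart of these t-values.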

A fractional factorial design, together with a star design, was used to study the effects of four factors (the alkyl chain length, the concentration of the ion-interaction reagent, the concentration of acetonitrile, and the pH of the mobile phase) and their interactions, in order to develop and optimize an ion-interaction chromatography method (IIR-RP-HPLC) for the simultaneous separation of 21 polar aromatic sulfonates.[60] The failure to obtain good predictions with a linear regression algorithm was overcome by applying the Box-Cox transformation, after which Marengo et al.[60] located the optimal experimental conditions for the separation.
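
The Box-Cox transformation replaces a positive response y by (y^λ − 1)/λ, or by ln y when λ = 0. A common way to choose λ is to maximise the profile log-likelihood of the linear model fitted to the transformed response. The sketch below assumes that criterion and a simple grid search. How λ was chosen in the cited work is not stated here, and the screening data are hypothetical.

```python
import numpy as np


def box_cox(y: np.ndarray, lam: float) -> np.ndarray:
    """Box-Cox transform of a strictly positive response."""
    if np.isclose(lam, 0.0):
        return np.log(y)
    return (y**lam - 1.0) / lam


def best_lambda(X: np.ndarray, y: np.ndarray, grid=np.linspace(-2, 2, 81)) -> float:
    """Lambda maximising the Box-Cox profile log-likelihood of the linear model box_cox(y) ~ X."""
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires a strictly positive response")
    A = np.column_stack([np.ones(len(y)), X])  # model matrix with intercept
    n = len(y)
    sum_log_y = np.log(y).sum()
    best, best_llf = None, -np.inf
    for lam in grid:
        z = box_cox(y, lam)
        coef, *_ = np.linalg.lstsq(A, z, rcond=None)
        rss = max(float(np.sum((z - A @ coef) ** 2)), 1e-300)  # guard against log(0)
        llf = -n / 2 * np.log(rss / n) + (lam - 1) * sum_log_y  # includes the Jacobian term
        if llf > best_llf:
            best, best_llf = float(lam), llf
    return best


# Hypothetical screening data: coded factor settings and a strictly positive response.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(16, 3))
y = np.exp(0.5 + 0.8 * X[:, 0] - 0.4 * X[:, 1]) * rng.lognormal(0, 0.05, 16)
print("selected lambda:", best_lambda(X, y))
```

The (λ − 1)·Σ ln y term makes likelihoods comparable across different λ, since the transformation changes the scale of the residuals.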

A fractional factorial design was used by Persson-Stubberud and Åström[61] for the development and optimization of a capillary electrophoresis separation of ibuprofen, codeine phosphate, and their degradation products and impurities. In the analysis of ranitidine and related compounds by capillary electrophoresis, Morris et al.[62] used a strategy involving fractional factorial designs to screen for the significant factors controlling the separation, and central composite designs to determine the optimal separation conditions. Models were generated by multilinear regression (MLR), and canonical analysis was used to calculate the optimum conditions.
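
Canonical analysis of a fitted second-order model y = b0 + x'b + x'Bx locates its stationary point x_s = −½B⁻¹b and classifies it from the eigenvalues of B. All negative eigenvalues indicate a maximum, all positive a minimum, and mixed signs a saddle point. The sketch below fits the model by MLR on a two-factor rotatable central composite design. The responses are simulated and hypothetical.

```python
import numpy as np


def quadratic_terms(x: np.ndarray) -> np.ndarray:
    """Model matrix [1, x1, x2, x1^2, x2^2, x1*x2] for a two-factor quadratic model."""
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([np.ones(len(x)), x1, x2, x1**2, x2**2, x1 * x2])


def canonical_analysis(coef: np.ndarray):
    """Stationary point and eigenvalues of y = b0 + x'b + x'Bx (two factors)."""
    b = coef[1:3]
    B = np.array([[coef[3], coef[5] / 2], [coef[5] / 2, coef[4]]])
    x_s = -0.5 * np.linalg.solve(B, b)
    return x_s, np.linalg.eigvalsh(B)


alpha = np.sqrt(2)  # rotatable axial distance for two factors
design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1],
                   [-alpha, 0], [alpha, 0], [0, -alpha], [0, alpha],
                   [0, 0], [0, 0], [0, 0]], dtype=float)
# Hypothetical responses (e.g. a resolution index) measured at the design points.
rng = np.random.default_rng(7)
y = 10 - (design[:, 0] - 0.2) ** 2 - 2 * (design[:, 1] + 0.1) ** 2 + rng.normal(0, 0.05, len(design))
coef, *_ = np.linalg.lstsq(quadratic_terms(design), y, rcond=None)
x_s, eig = canonical_analysis(coef)
kind = "maximum" if np.all(eig < 0) else "minimum" if np.all(eig > 0) else "saddle point"
print("stationary point (coded):", np.round(x_s, 3), "->", kind)
```

If the stationary point falls outside the experimental domain, or turns out to be a saddle point, the optimum is usually sought on the boundary of the domain instead.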

Wynia et al.[54] developed and validated a capillary electrophoresis separation for the assay of the antidepressant mirtazapine and five structurally related substances in a tablet formulation. They used a Box-Behnken design, with PLS as the multivariate modelling technique relating the current profiles to both migration times and peak areas.

Jimidar et al.[63] selected the optimum separation conditions (pH and concentration of a complexing agent in the buffer electrolyte) for rare earth metal ions in capillary zone electrophoresis (CZE) by means of a multicriteria approach. The separation was optimized using a central composite design for the two variables, and Derringer's desirability function was applied to determine the most desirable combination of separation, sensitivity, and analysis time.
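
Derringer's approach maps each response onto a desirability d between 0 and 1 and combines the individual values into an overall desirability D, their geometric mean. A single unacceptable response (d = 0) therefore makes D = 0. The sketch below uses the one-sided linear forms with an optional shape exponent s. All limits and response values are hypothetical.

```python
import numpy as np


def d_maximize(y, low, target, s=1.0):
    """Desirability of a response to maximise: 0 at or below `low`, 1 at or above `target`."""
    return float(np.clip((y - low) / (target - low), 0.0, 1.0) ** s)


def d_minimize(y, target, high, s=1.0):
    """Desirability of a response to minimise: 1 at or below `target`, 0 at or above `high`."""
    return float(np.clip((high - y) / (high - target), 0.0, 1.0) ** s)


def overall_desirability(ds):
    """Derringer's overall desirability: geometric mean of the individual d values."""
    ds = np.asarray(ds, dtype=float)
    if np.any(ds == 0):
        return 0.0
    return float(np.exp(np.mean(np.log(ds))))


# Hypothetical limits and responses for one set of separation conditions.
d_res = d_maximize(1.8, low=1.0, target=2.0)      # worst-pair resolution
d_sens = d_maximize(0.6, low=0.2, target=1.0)     # relative sensitivity
d_time = d_minimize(12.0, target=5.0, high=20.0)  # analysis time, min
print("D =", round(overall_desirability([d_res, d_sens, d_time]), 3))
```

Evaluating D at every point of a fitted response surface and taking its maximum gives the most desirable compromise among the criteria.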

Wan et al.[64] carried out the optimization and comparison of neurotransmitter amino acid separations in normal micellar electrokinetic chromatography (N-MEKC) and reversed migration micellar electrokinetic chromatography (RM-MEKC) using a central composite design with multivariate linear regression. A face-centred central composite design was also used by Safa and Hadjmohammadi[65] to evaluate the effects of sodium dodecyl sulfate concentration, propanol content, and mobile phase pH on the micellar HPLC separation of chlorophenol isomers. In addition, the Pareto-optimality method, a multicriteria decision-making approach, allowed Safa and Hadjmohammadi[65] to select the best possible combination of separation quality and analysis time, yielding chromatograms of superior quality.
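
In the Pareto-optimality method, a set of conditions is kept only if no other set is at least as good in every criterion and strictly better in at least one. The sketch below applies this rule to two criteria: separation quality, to be maximised, and analysis time, to be minimised. The candidate conditions are hypothetical. The final choice among the non-dominated conditions remains a decision for the analyst.

```python
def pareto_front(points: list[tuple[float, float]]) -> list[int]:
    """Indices of non-dominated points when quality is maximised and time minimised.

    A point is dominated if another point is at least as good in both criteria
    and strictly better in one.
    """
    front = []
    for i, (q_i, t_i) in enumerate(points):
        dominated = any(
            q_j >= q_i and t_j <= t_i and (q_j > q_i or t_j < t_i)
            for j, (q_j, t_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (separation quality, analysis time in min) for five mobile-phase compositions.
conditions = [(0.90, 20.0), (0.80, 10.0), (0.70, 15.0), (0.95, 30.0), (0.60, 8.0)]
print(pareto_front(conditions))  # -> [0, 1, 3, 4]
```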

Brunnkvist et al.[66] followed a three-step procedure in the search for optimal chromatographic separation conditions for four basic peptides on two different capillary electrophoresis (CE) stationary phases (Hypersil phenyl and Hypersil C18), using peak resolution and peak efficiency as response functions: a) initial studies to determine the experimental variables and their respective domains for the two stationary phases; b) application of a Plackett-Burman design, for which PLS models were preferred to MLR models owing to their higher predictive ability; and c) optimization according to a central composite design combined with both MLR and PLS models for each stationary phase.
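
Predictive ability, the criterion by which PLS was preferred to MLR above, is commonly judged by cross-validation. The sketch below implements PLS1 regression with the NIPALS algorithm and compares it with MLR through the leave-one-out root-mean-square error of prediction (RMSEP). Using leave-one-out cross-validation and two PLS components are assumptions, and the correlated data are simulated and hypothetical. Which model wins depends on the data, so no outcome is implied.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression (NIPALS). Returns (coefficients, intercept) in original units."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)  # weight vector
        t = E @ w                  # scores
        tt = t @ t
        p = E.T @ t / tt           # X loadings
        q = (f @ t) / tt           # y loading
        E = E - np.outer(t, p)     # deflate X
        f = f - q * t              # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    coef = W @ np.linalg.solve(P.T @ W, Q)
    return coef, y_mean - x_mean @ coef


def mlr_fit(X, y):
    """Ordinary least-squares MLR with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    sol, *_ = np.linalg.lstsq(A, y, rcond=None)
    return sol[1:], sol[0]


def loo_rmsep(X, y, fit):
    """Leave-one-out root-mean-square error of prediction for a fit(X, y) function."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, b0 = fit(X[keep], y[keep])
        errors.append(y[i] - (X[i] @ coef + b0))
    return float(np.sqrt(np.mean(np.square(errors))))


# Hypothetical screening data: 12 runs, 7 correlated variables, one response.
rng = np.random.default_rng(5)
latent = rng.normal(size=(12, 2))
X = latent @ rng.normal(size=(2, 7)) + 0.1 * rng.normal(size=(12, 7))
y = latent[:, 0] - 0.5 * latent[:, 1] + 0.1 * rng.normal(size=12)
print("MLR RMSEP:", round(loo_rmsep(X, y, mlr_fit), 3))
print("PLS (2 components) RMSEP:", round(loo_rmsep(X, y, lambda A, b: pls1_fit(A, b, 2)), 3))
```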

Wsól and Fell[67] found the chiral stationary phase Chiralcel OD-R, with a mobile phase containing acetonitrile (modifier) and sodium perchlorate (buffering component), to be the most suitable system for the chromatographic enantioresolution of rac-11-dihydrooracin. A Box-Wilson central composite design was employed to find the optimal temperature, modifier concentration, and buffer concentration using newly proposed chromatographic response functions based on the resolution and on the retention time of the last eluted component.
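
A central composite (Box-Wilson) design for three factors consists of the 2³ cube, six axial (star) runs at ±α, and replicated centre runs. The sketch below generates it in coded units, with α = (2^k)^(1/4) by default, which makes the design rotatable. Setting α = 1 gives the face-centred variant mentioned earlier for Safa and Hadjmohammadi. The number of centre runs and the conversion to real temperature, modifier and buffer levels (`CENTRE`, `STEP`) are hypothetical.

```python
from itertools import product
from typing import Optional


def central_composite(n_factors: int, n_center: int = 6, alpha: Optional[float] = None) -> list[list[float]]:
    """Coded central composite design: 2**k cube, 2k star and centre runs.

    alpha defaults to (2**k) ** 0.25 (rotatable); alpha = 1 gives the face-centred variant.
    """
    if alpha is None:
        alpha = (2**n_factors) ** 0.25
    cube = [[float(v) for v in run] for run in product((-1, 1), repeat=n_factors)]
    star = []
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            run = [0.0] * n_factors
            run[i] = sign * alpha  # one factor at +/-alpha, the others at the centre
            star.append(run)
    centre = [[0.0] * n_factors for _ in range(n_center)]
    return cube + star + centre


# Hypothetical centre values and step sizes: temperature (C), modifier (%), buffer (mM).
CENTRE = (25.0, 30.0, 50.0)
STEP = (5.0, 5.0, 20.0)
for run in central_composite(3):
    print([round(c + s * x, 2) for c, s, x in zip(CENTRE, STEP, run)])
```

Each factor is varied at five levels (−α, −1, 0, +1, +α), enough to estimate the curvature terms of a second-order model.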

In terms of sample preparation, extraction, and other conditions prior to separation, Romvári et al.[68,69] developed a rugged sample preparation approach based on Taguchi's method. It gave high accuracy and reliability in the HPLC determination of the two main metabolites of albendazole (albendazole sulfoxide and albendazole sulfone) in cow's milk. Experiments designed according to Taguchi's method were also used by Ehmann et al.[70] to investigate alkaline and acid prerinsing techniques for capillary preconditioning in the analysis of anions by capillary zone electrophoresis with indirect UV detection.
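
Taguchi's method combines an orthogonal array with signal-to-noise (S/N) ratios computed from replicate responses. Settings with a high S/N are the least sensitive to noise, which is why the method suits rugged sample preparation. The sketch below builds an eight-run, seven-column two-level array from a 2³ factorial and its interaction columns (an L8-type array; Taguchi's own tabulation uses a different column order and level labels). It also implements two standard S/N ratios. The recovery data are hypothetical.

```python
from itertools import product

import numpy as np


def l8_array() -> np.ndarray:
    """8-run, 7-column two-level orthogonal array (coded -1/+1) from a 2**3 factorial."""
    a, b, c = np.array(list(product((-1, 1), repeat=3))).T
    return np.column_stack([a, b, a * b, c, a * c, b * c, a * b * c])


def sn_larger_is_better(y) -> float:
    """S/N ratio (dB) for a response that should be as large as possible."""
    y = np.asarray(y, dtype=float)
    return float(-10 * np.log10(np.mean(1.0 / y**2)))


def sn_nominal_is_best(y) -> float:
    """S/N ratio (dB) for a response that should stay close to its target with little spread."""
    y = np.asarray(y, dtype=float)
    return float(10 * np.log10(y.mean() ** 2 / y.var(ddof=1)))


# Hypothetical recoveries (%) in triplicate for each of the eight runs.
design = l8_array()
rng = np.random.default_rng(11)
recoveries = 95 + 2 * design[:, [0]] + rng.normal(0, 1.5, size=(8, 3))
sn = np.array([sn_nominal_is_best(r) for r in recoveries])
for col in range(4):  # mean S/N at the low and high level of each of the first four columns
    print(col, round(sn[design[:, col] == -1].mean(), 2), round(sn[design[:, col] == 1].mean(), 2))
```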

Keszler and Héberger[71] used PCA to study the influence of extraction parameters and medium on the efficiency of solid phase microextraction (SPME) sampling in the analysis of aliphatic aldehydes by GC-MS. Tukai et al.[72] used a factorial design (to select the most significant factors) and a central composite design to optimize the conditions of microwave-assisted extraction prior to HPLC-ICP-MS analysis of arsenic species in marine macroalgae. Johansen and Rasmussen[73] used a factorial design and response surface modelling (RSM) for the screening and optimization of dialysis recoveries of antidepressant drugs from human plasma prior to HPLC.

Beijersten and Westerlund[74] optimized the derivatization conditions of dipeptides for separation by micellar electrokinetic chromatography, using a fractional factorial design for screening experiments and a central composite design for response surface modelling. A central composite design was also used by Daali et al.[75] to optimize the conditions (acid concentration, hydrolysis temperature, and hydrolysis time) of the chemical hydrolysis releasing sialic acid from a soluble caseinoglycomacropeptide; the sialic acid release was monitored by high-performance anion-exchange chromatography with pulsed amperometric detection (HPAEC-PAD).
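
A two-level fractional factorial design is built from a full factorial in some "base" factors, with each extra factor defined as the product of selected base columns. For example, the half fraction 2^(4−1) with generator D = ABC screens four factors in 8 runs instead of 16. Its defining relation is I = ABCD, so main effects are aliased only with three-factor interactions (resolution IV). The sketch below is a generic generator, and the choice of generators is left to the user. Which fractions the cited studies used is not stated here.

```python
from itertools import product

import numpy as np


def fractional_factorial(n_base: int, generators: list[tuple[int, ...]]) -> np.ndarray:
    """Two-level fractional factorial in coded units.

    The first n_base columns form a full 2**n_base factorial; each generator lists the
    base columns whose product defines one extra factor (e.g. (0, 1, 2) means D = ABC).
    """
    base = np.array(list(product((-1, 1), repeat=n_base)))
    extra = [base[:, list(g)].prod(axis=1) for g in generators]
    return np.column_stack([base, *extra]) if extra else base


# 2**(4-1) half fraction with D = ABC: 8 runs for four factors.
half_fraction = fractional_factorial(3, [(0, 1, 2)])
print(half_fraction)
```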

Guillaume and Peyrin[76] optimized the RP-HPLC separation of p-hydroxybenzoic acid and its esters with a new algorithm based on Glover's taboo search (TS), which gave better results than pure random search and simplex search. A novel chromatographic response function (CRF), previously used by Peyrin and Guillaume[55] and maximal when both efficient separation conditions and a minimum analysis time were met, yielded the optimal column temperature, water fraction in the mobile phase, and flow rate.
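
The core idea of taboo (tabu) search is to always move to the best neighbouring solution that is not on a short "tabu" list of recently visited solutions, even if that move is worse, so the search can climb out of local optima. The sketch below is a basic version on a discrete grid of factor levels, without the aspiration criteria or diversification of fuller implementations. The grid indices standing for column temperature, water fraction and flow rate, and the two-peaked response `hypothetical_crf`, are hypothetical and are not the CRF of the cited work.

```python
import math
from collections import deque
from itertools import product


def tabu_search(response, grid_shape, start, n_iter=100, tenure=7):
    """Maximise response(point) over integer grid points with a basic tabu search.

    Each move goes to the best non-tabu neighbour (one step along one axis), even if it
    is worse than the current point; the last `tenure` visited points are tabu.
    The best point seen is returned.
    """
    current = tuple(start)
    best, best_value = current, response(current)
    tabu = deque([current], maxlen=tenure)
    for _ in range(n_iter):
        neighbours = []
        for axis, step in product(range(len(grid_shape)), (-1, 1)):
            cand = list(current)
            cand[axis] += step
            cand = tuple(cand)
            if all(0 <= c < n for c, n in zip(cand, grid_shape)) and cand not in tabu:
                neighbours.append(cand)
        if not neighbours:
            break
        current = max(neighbours, key=response)
        tabu.append(current)
        value = response(current)
        if value > best_value:
            best, best_value = current, value
    return best, best_value


def hypothetical_crf(idx):
    """Hypothetical two-peaked response over (temperature, water fraction, flow rate) indices."""
    t, w, f = idx
    main = math.exp(-((t - 7) ** 2 + (w - 3) ** 2 + (f - 5) ** 2) / 8)
    local = 0.6 * math.exp(-((t - 2) ** 2 + (w - 8) ** 2 + (f - 2) ** 2) / 4)
    return main + local


print(tabu_search(hypothetical_crf, (11, 11, 11), start=(0, 10, 0)))
```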

To avoid the need for a blank chromatographic run, Boelens et al.[77] developed a two-step background correction procedure for HPLC coupled to spectroscopic detection that takes into account both the shape and the intensity differences of the background eluent spectrum during the separation. First, the baseline spectra are modelled using a limited number of principal components (PCs); subsequently, an asymmetric least squares regression method corrects the spectra recorded during elution for the background.
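
The two steps can be sketched as follows: build a background basis from eluent-only spectra by PCA, then fit each elution spectrum to that basis by weighted least squares. In the fit, points lying above the current fit (probably analyte signal) get a small weight p and points below it get 1 − p, and the weights are re-estimated iteratively. Including the mean background spectrum as an extra basis vector, the values of p and of the number of iterations, and the simulated spectra are all assumptions. This is an illustration of the idea, not the authors' implementation.

```python
import numpy as np


def background_basis(blank_spectra: np.ndarray, n_pcs: int) -> np.ndarray:
    """Mean background spectrum plus the first n_pcs PC loadings of eluent-only spectra."""
    mean = blank_spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(blank_spectra - mean, full_matrices=False)
    return np.column_stack([mean, vt[:n_pcs].T])


def asymmetric_ls(basis: np.ndarray, spectrum: np.ndarray, p: float = 0.001, n_iter: int = 20):
    """Fit spectrum ~ basis @ c with asymmetric weights; return (corrected spectrum, c).

    Points above the fit (likely analyte signal) get weight p, points below get 1 - p.
    """
    w = np.ones_like(spectrum)
    for _ in range(n_iter):
        sw = np.sqrt(w)
        c, *_ = np.linalg.lstsq(basis * sw[:, None], spectrum * sw, rcond=None)
        fit = basis @ c
        w = np.where(spectrum > fit, p, 1 - p)
    return spectrum - fit, c


# Hypothetical data: 30 blank spectra whose intensity and slope drift independently.
wl = np.linspace(0, 1, 200)
rng = np.random.default_rng(2)
blanks = np.array([(1 + 0.2 * a) * np.exp(-wl) + 0.05 * b * wl for a, b in rng.uniform(-1, 1, size=(30, 2))])
basis = background_basis(blanks, n_pcs=2)
sample = 1.1 * np.exp(-wl) + 0.03 * wl + 0.8 * np.exp(-0.5 * ((wl - 0.5) / 0.02) ** 2)
corrected, coeffs = asymmetric_ls(basis, sample)
print("max of corrected spectrum:", round(float(corrected.max()), 3))
```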

The Doehlert matrix design deserves special mention as a chemometric tool, since its application to analytical chemistry has recently been reviewed by Ferreira et al.[78] The review[78] discusses the advantages of the Doehlert design over other response surface designs, such as central composite and Box-Behnken designs, and the application of Doehlert matrices in chromatography. It also refers to the first demonstration that this design is more appropriate and more economical than the central composite design for optimizing an HPLC separation process.[79] Araujo[80] obtained a considerable reduction of analysis time in lengthy HPLC separations when the Doehlert design was selected for the systematic and simultaneous optimization of the gradient solvent system and the instrumental/experimental variables associated with the HPLC-DAD chromatograms of chloropigments (chlorophyll a, hydroxy chlorophyll a, and methoxylactone chlorophyll a). Further applications, to the HPLC separation of ortho-, meta-, and para-cresol in antiseptic products and of polyaromatic hydrocarbons in synthetic mixtures, also allowed a significant reduction in analysis time.
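
The economy of the Doehlert design can be seen from its geometry and run count. For two factors it is a centre point plus a regular hexagon (seven runs), with the first factor at five levels and the second at three. In general it needs k² + k + 1 distinct runs for k factors, compared with 2^k + 2k + 1 for a central composite design with a full cube and one centre point. The sketch below only generates the two-factor design and counts runs. It assumes a single centre point in both designs and ignores replication.

```python
import math


def doehlert_2d() -> list[tuple[float, float]]:
    """Two-factor Doehlert design (coded): centre point plus a regular hexagon of radius 1."""
    points = [(0.0, 0.0)]
    for k in range(6):
        angle = math.radians(60 * k)
        points.append((math.cos(angle), math.sin(angle)))
    return points


def runs_doehlert(k: int) -> int:
    """Distinct Doehlert runs for k factors with one centre point: k**2 + k + 1."""
    return k * k + k + 1


def runs_ccd(k: int, n_center: int = 1) -> int:
    """Runs in a central composite design with a full 2**k cube: 2**k + 2k + centres."""
    return 2**k + 2 * k + n_center


for x1, x2 in doehlert_2d():
    print(f"{x1:+.3f} {x2:+.3f}")
for k in range(2, 6):
    print(k, "factors: Doehlert", runs_doehlert(k), "runs vs CCD", runs_ccd(k))
```

The gap widens as factors are added (31 versus 43 runs for five factors). The uneven number of levels also lets the factor expected to matter most be studied at the larger number of levels.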
