ADHD identification based on a linear projection and clustering

P.A. Castro-Cabrera (a,∗), D.H. Peluffo-Ordóñez (a), F. Restrepo de Mejía (b), C.G. Castellanos-Domínguez (a)

(a) Universidad Nacional de Colombia, Signal Processing and Recognition Group, Manizales, Colombia

(b) Universidad Autónoma de Manizales, Grupo de Neuro-aprendizaje, Manizales, Colombia

Abstract

Event-related potentials (ERPs) are electrical signals generated by the brain in
response to an external sensory stimulus. Signals of this kind are widely used to
diagnose neurological disorders such as attention-deficit hyperactivity disorder (ADHD).

In this paper, a novel methodology for ADHD discrimination is proposed. It consists of
obtaining a new data representation by re-characterizing the initial feature space
through the distances between the data and the centroids obtained with the k-means
algorithm. The methodology also includes pre-clustering and linear projection stages.
In addition, this paper explores the use of morphological and spectral features as
descriptive patterns of the ERP signal in order to discriminate between normal subjects
and ADHD patients. Experimental results show that the morphological features, in
contrast with the remaining features considered in this study, contribute the most to
classification performance, reaching 86% for the original feature set.

Keywords: ADHD, ERP, clustering, linear projection

1. Introduction

Attention-deficit hyperactivity disorder (ADHD) is a prevalent disorder diagnosed on the
basis of persistent and developmentally inappropriate levels of overactivity,
inattention and impulsivity. It is one of the most common psychiatric disorders in
childhood [1]. Currently its diagnosis is based on the clinical criteria of DSM-IV or
ICD-10, supported by the behavior reported in questionnaires administered to parents
and teachers; however, there are no biological markers or conclusive tests that
diagnose this behavioral disorder with high reliability [2].

∗pacastroc@unal.edu.co

VII Seminario Internacional de Procesamiento y Análisis de Imágenes Médicas SIPAIM 2011

Event-related potentials (ERPs) are brain electrical signals generated in response to
an external sensory stimulus. They have been useful in investigations of perceptual and
cognitive-processing deficits, especially in children with ADHD, given that these
potentials are physiologically correlated with neurocognitive functions. The features
most commonly assessed on ERPs for the interpretation of cognitive processes are the
areas and the peaks of the ERP components, defined by the mean and peak-to-peak
voltages, respectively, which are computed in certain windows in the time domain. These
parameters are determined by visual inspection of the averaged ERP waveforms [3].

ERPs comprise a number of characteristic peaks and troughs, which basic research has
shown to correspond to certain underlying processes. The P300 component is perhaps the
most studied ERP component in investigations of selective attention and information
processing, due partly to its relatively large amplitude and easy elicitation in
experimental contexts [4]. Although the quantification of ERP components by areas and
peaks is the standard procedure in fundamental ERP research, the conventional approach
has two drawbacks:

Firstly, ERPs are time-varying signals reflecting the sum of underlying neural events
during stimulus processing, operating on different time scales ranging from
milliseconds to seconds. Various procedures, such as ERP subtraction or statistical
methods, have been employed to separate functionally meaningful events that partly or
completely overlap in time. However, the reliable identification of these components in
the ERP waveforms remains an open problem.

Secondly, analysis in the frequency domain has revealed that EEG/ERP components in
different bands (delta, theta, alpha, beta, gamma) are functionally related to
information processing and behavior. However, the Fourier transform (FT) of an ERP
lacks information about the time localization of transient neural events. Therefore,
efficient algorithms for analyzing a signal in the time-frequency plane are very
important for extracting and relating distinct functional components.

These limitations, as well as those related to time-invariant methods, can be addressed
by using the wavelet formalism. The wavelet transform (WT) is a time-frequency
representation that has an optimal resolution in both the time and frequency domains
and has been successfully applied to the study of EEG-ERP signals [?][5]. Although ERP
feature extraction from the time-frequency domain based on the discrete WT (DWT) has
become increasingly popular, this approach can prove unhelpful for pathology detection
and, in particular, for ADHD identification.

In this paper a novel methodology is proposed that consists of a re-characterization of
the initial feature space through the distances between the data and the centroids of
the clusters obtained with the k-means algorithm. To this end, the original data are
first selected by means of a pre-clustering stage and then linearly projected. In
addition, this paper explores the use of morphological and spectral features as
descriptive patterns of the ERP signal in order to discriminate between normal subjects
and ADHD patients. Experimental results show that the morphological features, in
contrast with the remaining features considered in this study, contribute the most to
classification performance, reaching 86% for the original feature set.

2. Theoretical framework

In spectral clustering, and in graph-partitioning clustering in particular, the
affinity matrix represents the degree of relation between observations, or nodes. In
other words, an affinity measure denotes the degree of association or similarity
between two nodes and is therefore a non-negative value.

Let $A = \{a_{ij}\}$ be the affinity matrix composed of all the relations among nodes,
satisfying the conditions $a_{ij} \geq 0$ and $a_{ij} = a_{ji}$. Then matrix $A$ is
symmetric; for the inner-product affinity $A = XX^\top$ considered in this section it
is also positive semi-definite.

Let $X = [x_1, \ldots, x_n]^\top \in \mathbb{R}^{n \times p}$ be the data matrix, where
$x_i$ is a $p$-dimensional vector that holds the features considered for the $i$-th
subject. To guarantee scale coherence in the data representation, matrix $X$ is
normalized using $x_i \leftarrow (x_i - \mu(x_i))/\sigma(x_i)$, where $\mu(\cdot)$ and
$\sigma(\cdot)$ are the mean and standard deviation operators, respectively.
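The normalization above can be sketched in a few lines of NumPy. This is only an illustration: it assumes that the mean and standard deviation are taken per feature (per column of the data matrix), which the text does not state explicitly, and the name `standardize` and the synthetic data are hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score (assumed reading of the normalization):
    subtract each feature's mean and divide by its standard deviation."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    sigma[sigma == 0.0] = 1.0  # constant features map to zero instead of dividing by zero
    return (X - mu) / sigma

# Synthetic stand-in for an n x p feature matrix (not the paper's data).
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(120, 8))
X_std = standardize(X_raw)
```

After this step every column has zero mean and unit sample standard deviation, so features measured in different units contribute on a comparable scale.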

A trivial way to establish the affinity measures is $A = XX^\top$. For some clustering
methods this kind of affinity is useful because it contains the inner products between
all pairs of data points (observations).


2.1. Data truncated-projection

In general, when using spectral techniques the clustering procedure is carried out in a
low-dimensional space, named the eigen-space [6]. Let $U$ denote the eigen-space, i.e.,
the matrix of eigenvectors of $A$, and let $\tilde{U} \in \mathbb{R}^{n \times m}$,
with $m < n$, denote the corresponding subspace formed by the first $m$ columns of $U$.

In this work, $\tilde{U}$ is proposed as a rotation matrix; however, given that $n$ is
significantly larger than $p$, the matrix $V$ of eigenvectors of $X^\top X$ is
preferred. This is possible because the first $p$ eigenvalues of $XX^\top$ coincide
with the eigenvalues of $X^\top X$ when $\|u_i\| = \|v_i\| = 1$, $i = 1, \ldots, p$. In
addition, it is easy to prove that $u_i$ and $v_i$ are linearly related:
$u_i = X v_i$, up to the scale factor that normalizes $u_i$.
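The relation between the two eigen-decompositions can be made explicit with a short derivation. It is a standard linear-algebra argument added here for convenience; $\lambda_i$ denotes the $i$-th eigenvalue of $X^\top X$, and the normalized form in the last line assumes $\lambda_i > 0$.

```latex
\begin{align*}
X^\top X\, v_i &= \lambda_i v_i
  && \text{(definition of } v_i\text{)} \\
X X^\top (X v_i) &= \lambda_i (X v_i)
  && \text{(left-multiply by } X\text{)} \\
\|X v_i\|^2 &= v_i^\top X^\top X\, v_i = \lambda_i
  && \text{(for } \|v_i\| = 1\text{)} \\
u_i &= \frac{X v_i}{\sqrt{\lambda_i}}
  && \text{(unit-norm eigenvector of } XX^\top\text{)}
\end{align*}
```

Hence $X v_i$ is an eigenvector of $XX^\top$ with the same eigenvalue $\lambda_i$, and only the $p \times p$ problem needs to be solved when $n$ is much larger than $p$.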

Given this, the linear projection of the data is:

$$Y = XV \qquad (1)$$

Then, the truncated linear projection is obtained as follows:

$$\tilde{Y} = X\tilde{V} \qquad (2)$$

where matrix $\tilde{V} \in \mathbb{R}^{p \times q}$ ($q < p$) is composed of the first
$q$ columns of $V$.
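A minimal NumPy sketch of the truncated projection in Eq. (2) follows. It assumes the eigenvectors are sorted by decreasing eigenvalue, as is usual for this kind of projection; the function name `truncated_projection` and the synthetic data are illustrative only.

```python
import numpy as np

def truncated_projection(X, q):
    """Project X (n x p) onto the q leading eigenvectors of X^T X."""
    # X^T X is symmetric, so eigh applies; it returns eigenvalues in ascending order.
    eigvals, V = np.linalg.eigh(X.T @ X)
    order = np.argsort(eigvals)[::-1]      # reorder to decreasing eigenvalue
    V_q = V[:, order][:, :q]               # first q columns of the sorted V
    return X @ V_q, V_q

# Synthetic, centered stand-in data (not the paper's data).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
X = X - X.mean(axis=0)
Y_q, V_q = truncated_projection(X, q=3)
```

The columns of `Y_q` are mutually orthogonal, which is what makes the projected space a convenient input for a centroid-based clustering.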

2.2. Clustering-based representation

In most spectral clustering approaches, once the new representation space is obtained,
a conventional clustering algorithm is applied to group homogeneous observations [7].
In this work, a clustering-based representation is proposed instead. To this end, a
centroid-based clustering is used to obtain a new data representation
$Z = \{z_{ij}\}$, where each observation is represented by its distances to the
centroids of the groups, i.e.,

$$z_{ij} = d(\hat{y}_i, q_j), \quad i = 1, \ldots, n; \; j = 1, \ldots, k \qquad (3)$$

where $\hat{y}_i$ is the $i$-th projected observation, $k$ is the number of groups,
$q_j$ denotes the $j$-th centroid and $d(\cdot,\cdot)$ is a distance operator.

Centroids are obtained with the k-means algorithm [8], and the Euclidean norm is used
as the distance measure.
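The distance-based representation of Eq. (3) can be sketched as follows. The k-means routine is a plain Lloyd iteration written for illustration, not the implementation used in the experiments; the names `kmeans` and `distance_representation`, the number of groups and the synthetic data are assumptions.

```python
import numpy as np

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd iteration; returns the k centroids (k x q)."""
    rng = np.random.default_rng(seed)
    centroids = Y[rng.choice(len(Y), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d = np.linalg.norm(Y[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)          # nearest centroid per observation
        new = centroids.copy()
        for j in range(k):
            members = Y[labels == j]
            if len(members) > 0:           # keep the old centroid if a cluster empties
                new[j] = members.mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def distance_representation(Y, centroids):
    """Z[i, j] = Euclidean distance between observation i and centroid j."""
    return np.linalg.norm(Y[:, None, :] - centroids[None, :, :], axis=2)

# Synthetic projected data: two blobs in a 2-D space (illustration only).
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
Q = kmeans(Y, k=3)
Z = distance_representation(Y, Q)
```

Each row of `Z` replaces the original coordinates of an observation by its `k` distances to the centroids, so the dimension of the new representation equals the number of groups.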


2.3. Heuristic search

To improve classification performance and determine the relevant features, a feature
selection stage is carried out. In this particular case, a wrapper-type heuristic
search called sequential forward floating selection (SFFS) [9] is used. In this
technique, at each stage a new variable is included using a sequential forward
procedure; then the least significant variables are excluded one at a time for as long
as the percentage of accurate classifications increases. Once the search can no longer
exclude variables, another forward step is taken to include another variable and, if
possible, the exclusion procedure is applied again. The process is iterated until no
more forward steps can be taken because a high percentage of classification accuracy
has been achieved.
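The search described above can be sketched in generic form. This is a textbook-style SFFS written for illustration, not the implementation of [9] or [12]: the criterion `score` is any function that maps a list of feature indices to a number (in the paper it would be a classification accuracy), the stopping rule is simplified to a maximum subset size, and the toy relevance values are invented.

```python
def sffs(n_features, score, max_size):
    """Sequential forward floating selection (generic sketch).
    Returns (best_score, best_subset) over all subset sizes visited."""
    selected = []
    best_by_size = {}                      # subset size -> (score, subset)
    while len(selected) < max_size:
        # Forward step: include the feature that maximizes the criterion.
        remaining = [f for f in range(n_features) if f not in selected]
        add = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [add]
        s = score(selected)
        size = len(selected)
        if size not in best_by_size or s > best_by_size[size][0]:
            best_by_size[size] = (s, list(selected))
        # Floating step: exclude the least significant feature while doing so
        # beats the best subset of the smaller size found so far.
        while len(selected) > 2:
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != drop]
            s_red = score(reduced)
            if s_red > best_by_size[len(reduced)][0]:
                selected = reduced
                best_by_size[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return max(best_by_size.values(), key=lambda t: t[0])

# Toy criterion (hypothetical): per-feature relevance minus a size penalty.
relevance = [0.1, 0.9, 0.5, 0.05]

def toy_score(subset):
    return sum(relevance[f] for f in subset) - 0.2 * len(subset)

best_score, best_subset = sffs(n_features=4, score=toy_score, max_size=4)
```

With this toy criterion the search keeps features 1 and 2 and discards the two weakly relevant ones, since adding them costs more in the size penalty than they contribute.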

3. Materials and methods

3.1. Database

The experiments were carried out with 120 children from educational institutions of the
metropolitan area of Manizales (60 in the healthy control group and 60 in the ADHD
group). The subjects, aged between 4 and 15 years, were medically diagnosed based on
the clinical criteria of DSM-IV and the MINI-KID criteria by a multidisciplinary team
of specialists consisting of a general physician, a psychologist, a neuropsychologist
and experts in child psychiatric disorders. Both groups were tested under the same
lighting and noise conditions, and were defined by the following inclusion criteria:
physical examination without abnormalities, normal visual and hearing ability,
intelligence quotient greater than 80 and, if necessary, pharmacological treatment
previously suspended. Subjects were verified to be free of evidence of any other
neurological disorder.

Recordings were acquired by means of electrodes located on the head midline (Fz, Cz,
Pz) according to the international 10-20 system, with a sampling frequency of 640
samples per second. Signal acquisition covered 1 s before and after stimulus
presentation. The evaluation protocol was the oddball paradigm in auditory and visual
modalities. The auditory procedure involves the emission of an 80 dB tone lasting
50 ms, with a frequency of 1,000 Hz for the frequent stimulus and 3,000 Hz for the
target stimulus, presented randomly every 1.5 s. In the visual modality of the test,
the subject is asked to watch a monitor placed 1 m away that shows an image with a
consistent pattern (a checkerboard of 16 squares), which is the frequent stimulus. The
rare stimulus is the presentation of a target in the center of the screen with the same
common pattern in the background; the subject must press a button each time the rare
stimulus appears. The experiment consists of 200 stimuli, of which 80% are non-target
and the remaining 20% are target stimuli.

3.2. Experimental setup

The methodology applied in the experiments is shown graphically in the block diagram of
Figure 1.

[Block diagram with the stages: ERP signals (visual and auditory modalities);
Preprocessing; Characterization (morphological, spectral and wavelet features); Data
projection (PCA); Clustering; Characterization by distances; Heuristic search (SFFS);
Classification]

Figure 1: Proposed methodology for ERP signal analysis

The proposed methodology consists of the procedures described below. The database
contains 6 recordings per patient, corresponding to acquisitions from the Fz, Cz and Pz
electrodes in the auditory and visual modalities. In this work, only the results for
the Pz auditory recording are reported, since this is the scalp location where the
generators of the ERP components act most clearly.

Data matrix $X$ is defined as suggested in [10] and consists of three groups of
features of different nature. The first group comprises 17 morphological features,
which are parameters measured over the whole signal and related to its shape. This set
is formed by the following characteristics: latency, amplitude, latency/amplitude
ratio, absolute amplitude, absolute latency/amplitude ratio, positive area, negative
area, total area, absolute total area, total absolute area, average absolute signal
slope, peak-to-peak value, peak-to-peak value in a time window, peak-to-peak slope,
zero crossings, zero-crossing density and slope sign alterations.

The second set of features is defined by three frequency characteristics: mode
frequency, median frequency and mean frequency, which are calculated as described in
[10]. The third set of characteristics is obtained with the discrete wavelet transform
and corresponds to the wavelet coefficients from the selected levels of decomposition.

After characterization, matrix $X$ is preprocessed using the following procedure:
centering and standardization of the data, outlier detection and verification of
univariate Gaussianity. In addition, a data pre-clustering is performed with the
methodology used in [11], thus ensuring consistency in the data and facilitating the
analysis.

Next, the data are projected using the technique explained in Section 2.1. The
criterion used to determine $q$ is an accumulated variance greater than 90%.
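The accumulated-variance criterion can be illustrated with a small helper. It assumes that the variance captured by each direction is measured by its eigenvalue; the function name `choose_q` and the example eigenvalues are hypothetical.

```python
import numpy as np

def choose_q(eigvals, threshold=0.90):
    """Smallest q whose leading eigenvalues accumulate more than `threshold`
    of the total variance."""
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # decreasing order
    ratio = np.cumsum(lam) / lam.sum()                      # accumulated variance
    return int(np.argmax(ratio > threshold) + 1)            # first index above threshold

# Invented eigenvalues: accumulated ratios are 0.60, 0.85, 0.95, 0.98, 1.00.
q = choose_q([6.0, 2.5, 1.0, 0.3, 0.2])
```

In this example two components retain 85% of the variance and three retain 95%, so the criterion selects `q = 3`.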

Subsequently, the data representation is redefined through the distances between the
data and the centroids of the groups formed by the clustering technique. To this end,
the traditional k-means algorithm was implemented, with the Euclidean distance as the
dissimilarity measure.

Once the projected data are re-characterized by computing the distances between the
data and the centroids, a heuristic search algorithm is applied. In this case,
sequential forward floating selection (SFFS) is considered. This is done in order to
perform a supervised reduction that leads to the smallest number of features that
still allows an adequate classification of the data. The implemented SFFS algorithm
uses a Bayesian classifier as its classification assessment function, with each
probability density function modeled as Gaussian. In addition, the method was improved
by a hypothesis test (t-test) and a stage that evaluates information loss [12].

The following algorithm describes mathematically and sequentially the

stages of the proposed methodology previously explained.

Algorithm 1 Re-characterization of ERP signals through dissimilarity measures

Input: $X \in \mathbb{R}^{n \times p}$.

1. Apply a pre-clustering stage to the data matrix: $\hat{X} = \operatorname{preclustering}\{X\}$, where matrix $\hat{X}$ is $h \times p$ dimensional and $h < n$.

2. Estimate the covariance matrix $\Sigma_X$.

3. Compute the eigenvalues $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ and eigenvectors $V = [v_1 | \ldots | v_p]$ of $\Sigma_X$, sorted in decreasing order, $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_p$.

4. Determine the value of $q$ ($q < p$) through an accumulated variance greater than 90%.

5. Obtain the truncated linear projection: $\hat{Y} = \hat{X}\hat{V} \in \mathbb{R}^{h \times q}$.

6. Cluster the data and obtain the final centroids: $Q = [q_1^\top | \cdots | q_k^\top] = \operatorname{kmeans}(\hat{Y})$.

7. Re-characterize the projected data: $B = \{b_{ij}\} = \{d(\hat{y}_i, q_j)\} \in \mathbb{R}^{h \times k}$, $i = 1, \ldots, h$; $j = 1, \ldots, k$, where $d(\cdot,\cdot)$ is the Euclidean distance.

8. $\hat{B} = \operatorname{SFFS}\{B\} \in \mathbb{R}^{h \times m}$, where $m$ is the number of relevant variables, $m < k$.

9. Test: classifiers k-NN, LDC and SVM (70% for training and 30% for test).

Output: $\hat{B}$ = {effective feature set}


4. Results and discussions

The recordings were acquired in the auditory and visual modalities through electrodes
placed at positions Fz, Cz and Pz, as explained in Section 3.1. In this work, only the
results for the Pz position in the auditory modality are reported, since this is the
region where the ERP signal generators yield event-related potentials with better
defined components and greater amplitude.

Data matrix $X$ is made up of 16 morphological features, 3 spectral features and 32
wavelet coefficients. To calculate the wavelet features, the records were resampled to
1024 Hz and the discrete wavelet transform was applied with a biorthogonal spline
wavelet with 3 vanishing moments. For this work, a 7-level decomposition was applied in
order to approximately match the frequency bands of the levels to the brain rhythms:
delta (0.2 to 3.5 Hz), theta (3.5 to 7.5 Hz), alpha (7.5 to 13 Hz) and beta (13 to
28 Hz). From the 7 decomposition levels obtained, the approximation coefficients of
level 7 and the detail coefficients of levels 7, 6 and 5 were selected as wavelet
features. The selection of these coefficients was justified by an informativeness
criterion based on accumulated Shannon entropy [13], with a threshold greater than 60%.
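The correspondence between decomposition levels and brain rhythms follows from the dyadic band splitting of the DWT. The sketch below computes the nominal (ideal-filter) band of each level, assuming that the detail at level j covers fs/2^(j+1) to fs/2^j; real wavelet filters overlap at the band edges, and the helper name `dwt_bands` is hypothetical.

```python
def dwt_bands(fs, levels):
    """Nominal frequency band (Hz) of each DWT sub-band for sampling rate fs."""
    bands = {}
    for j in range(1, levels + 1):
        # Detail coefficients at level j: upper half of the band left at level j-1.
        bands[f"D{j}"] = (fs / 2 ** (j + 1), fs / 2 ** j)
    # Approximation coefficients at the last level: everything below the last detail.
    bands[f"A{levels}"] = (0.0, fs / 2 ** (levels + 1))
    return bands

bands = dwt_bands(fs=1024, levels=7)
# bands["A7"] -> (0.0, 4.0)    roughly delta
# bands["D7"] -> (4.0, 8.0)    roughly theta
# bands["D6"] -> (8.0, 16.0)   roughly alpha
# bands["D5"] -> (16.0, 32.0)  roughly beta
```

This arithmetic suggests why resampling to 1024 Hz is convenient: the four selected sub-bands fall close to the delta, theta, alpha and beta limits quoted in the text.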

To carry out the classification tasks, three different classifiers were used: a k-NN, a
linear discriminant classifier (LDC) and a support vector machine (SVM), in order to
compare their performance and select the one that offers the highest classification
performance. In the validation step, a partition of 70% for the training group and 30%
for the test group was used. The tests produced the following results:

Figure 2 displays the performance of a k-NN classifier over repeated runs to show the
stability of the proposed methodology. It can be observed that all values of the
classification performance are above 80% and maintain an acceptable standard
deviation.
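The repeated validation behind this stability analysis can be sketched as a repeated 70/30 hold-out with a k-NN classifier. This is an assumption-laden illustration: the number of neighbors, the number of repetitions, the non-stratified random split and the synthetic two-class data are all chosen here for the example and are not taken from the experiments.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Majority vote among the k nearest training points (labels 0/1, k odd)."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]                       # shape (n_test, k)
    return (votes.mean(axis=1) > 0.5).astype(int)

def repeated_holdout(X, y, n_rep=10, train_frac=0.7, k=3, seed=0):
    """Mean and standard deviation of test accuracy over random 70/30 splits."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * len(X)))
    accs = []
    for _ in range(n_rep):
        idx = rng.permutation(len(X))
        tr, te = idx[:n_train], idx[n_train:]
        y_hat = knn_predict(X[tr], y[tr], X[te], k)
        accs.append(np.mean(y_hat == y[te]))
    accs = np.array(accs)
    return accs.mean(), accs.std(ddof=1)

# Synthetic, well-separated two-class data (60 + 60 points), illustration only.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(6, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
mean_acc, std_acc = repeated_holdout(X, y)
```

Reporting the mean together with the standard deviation over repetitions, as in the tables of this section, shows whether a result depends on one favorable split.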

Figure 3 shows the performance achieved by the feature subsets obtained with the SFFS
selection algorithm, namely: 1. performance for the first selected feature; 2.
performance for the subset formed by the first and second selected features; and so on.

Table 1 shows the accuracy, specificity and sensitivity of each group of features. The
table shows that, within the original set of features $X$, the morphological
characteristics are the major contributors to the performance of the classifier. The
same holds for the sensitivity and specificity percentages achieved by this subset of
features.
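For reference, the three figures of merit can be computed from the true and predicted labels as sketched below. The sketch assumes that the ADHD class is coded as the positive label 1 and the control class as 0; this coding, the function name and the example labels are illustrative.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, specificity and sensitivity for labels in {0, 1}
    (1 = positive class, assumed here to be ADHD)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives
    accuracy = (tp + tn) / len(pairs)
    specificity = tn / (tn + fp)   # fraction of controls correctly identified
    sensitivity = tp / (tp + fn)   # fraction of patients correctly identified
    return accuracy, specificity, sensitivity

# Invented example: 4 patients (one missed) and 4 controls (all correct).
acc, spec, sens = binary_metrics([1, 1, 1, 1, 0, 0, 0, 0],
                                 [1, 1, 1, 0, 0, 0, 0, 0])
```

Here one missed patient lowers the sensitivity to 0.75 while the specificity stays at 1.0, which illustrates why the tables report both rates alongside the overall accuracy.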


[Plot: classification performance (0 to 100) versus iterations (1 to 10)]

Figure 2: Stability of methodology with respect to iterations

[Plot: classification performance versus subsets of features]

Figure 3: Classification performance for each feature subset

Feature         Accuracy (%)    Specificity (%)   Sensitivity (%)
Morphological   85.35 ± 3.9     85.00             85.83
Spectral        63.92 ± 8.6     73.12             51.66
Wavelet         67.85 ± 8.5     73.12             60.83

Table 1: Classification performance for each group of features


Table 2 reports the classification accuracy of the three classifiers mentioned above.
It is noted that a simple non-parametric classifier such as k-NN achieves the best
performance. Moreover, the highest rates of specificity and sensitivity were also
achieved with k-NN.

Classifier   Accuracy (%)    Specificity (%)   Sensitivity (%)
k-NN         86.07 ± 3.5     85.00             87.50
LDC          73.92 ± 7.1     82.50             62.50
SVM          78.57 ± 4.7     81.25             75.00

Table 2: Classification performance for each classifier

Table 3 shows the classification rates achieved when the pre-clustering of $X$ is not
applied in the preprocessing stage. In comparison with Table 2, there is a considerable
drop in the classification percentages, which can be attributed to the presence of
outliers or heterogeneous data.

Classifier   Accuracy (%)    Specificity (%)   Sensitivity (%)
k-NN         56.04 ± 5.6     62.91             49.16
LDC          48.54 ± 6.5     42.08             55.00
SVM          48.33 ± 4.2     24.58             72.08

Table 3: Classification performance for each classifier without pre-clustering

The low classification accuracy obtained in this test demonstrates the need for a
pre-clustering stage in preprocessing. These results also suggest a low reliability of
the labels given by the medical specialists.

5. Conclusion

Because of the nature of ERP signals and the low reliability of the labeling given by
specialists, the identification of ADHD is a difficult task for both medicine and
pattern recognition. The design of classification systems for discriminating between
ADHD and normal signals requires a new data representation, since the signal samples
alone may not be enough to obtain good class separability.

In this work, a data re-characterization methodology is proposed to overcome this
problem. The methodology is mainly composed of two stages: truncated linear projection
and clustering-based characterization. The first yields a compact data representation;
the second yields a distance-based representation, derived from the obtained clusters,
that improves classification performance. These two stages are complementary and
coherent with each other.


In addition, a pre-clustering stage was introduced, which allows the classifiers to
perform better because presumed outlier observations are discarded.

Acknowledgment

The authors would like to thank COLCIENCIAS and Universidad Nacional de Colombia for
the financial support of the projects “Identificación Automática del Trastorno por
Déficit de Atención y/o Hiperactividad sobre registros de Potenciales Evocados
Cognitivos” and “Sistema de Diagnóstico Asistido para la Identificación de TDAH sobre
Registros de Potenciales Evocados Cognitivos”, respectively.

References

[1] M. A. Idiazábal, A. Palencia-Taboada, J. Sangorrín, J. Espadaler-Gamissans,
Potenciales evocados cognitivos en el trastorno por déficit de atención con
hiperactividad, Rev Neurol 34 (2002) 301–305.

[2] R. A. Barkley, Attention deficit hyperactivity disorder: a handbook for diagnosis
and treatment, Guilford Press, New York, 3rd Edition (2005).

[3] V. Bostanov, Data sets Ib and IIb: Feature extraction from event-related brain
potentials with the continuous wavelet transform and the t-value scalogram, IEEE
Transactions on Biomedical Engineering 51 (6) (2004) 1057–1061.

[4] S. H. Patel, P. N. Azzam, Characterization of N200 and P300: Selected studies of
the event-related potentials, International Journal of Medical Sciences 2 (4) (2005)
147–154.

[5] I. Kalatzis, N. Piliouras, E. Ventouras, C. Papageorgiou, A. Rabavilas,
D. Cavouras, Design and implementation of an SVM-based computer classification system
for discriminating depressive patients from healthy controls using the P600 component
of ERP signals, Computer Methods and Programs in Biomedicine 75 (2004) 11–22.

[6] S. X. Yu, J. Shi, Multiclass spectral clustering, in: ICCV '03: Proceedings of the
Ninth IEEE International Conference on Computer Vision, IEEE Computer Society,
Washington, DC, USA, 2003, p. 313.


[7] L. Zelnik-Manor, P. Perona, Self-tuning spectral clustering, in: Advances in
Neural Information Processing Systems 17, MIT Press, 2004, pp. 1601–1608.

[8] P. Hansen, N. Mladenović, J-means: A new local search heuristic for minimum
sum-of-squares clustering, Pattern Recognition 34 (2) (2001) 405–413.

[9] E. Delgado-Trejos, A. Perera-Lluna, M. Vallverdú-Ferrer, P. Caminal-Magrans,
G. Castellanos-Domínguez, Dimensionality reduction oriented toward the feature
visualization for ischemia detection, IEEE Transactions on Information Technology in
Biomedicine 13 (4) (2009).

[10] V. Abootalebi, M. H. Moradi, M. A. Khalilzadeh, A new approach for EEG feature
extraction in P300-based lie detection, Computer Methods and Programs in Biomedicine 94
(2009) 48–57.

[11] S. Murillo-Rendón, G. Castellanos-Domínguez, Construcción, limpieza y depuración
previa al análisis estadístico de bases de datos, in: XV Simposio de Tratamiento de
Señales, Imágenes y Visión Artificial - STSIVA 2010, 2010.

[12] D. Ververidis, C. Kotropoulos, Fast and accurate sequential floating forward
feature selection with the Bayes classifier applied to speech emotion recognition,
Signal Processing 88 (12) (2008) 2956–2970.

[13] R. Coifman, M. Wickerhauser, Entropy-based algorithms for best basis selection,
IEEE Transactions on Information Theory 38 (2) (1992) 713–718.
doi:10.1109/18.119732.
