
All content in this area was uploaded by Titouan Parcollet on Oct 31, 2018


QUATERNION CONVOLUTIONAL NEURAL NETWORKS FOR HETEROGENEOUS IMAGE PROCESSING

Titouan Parcollet¹,², Mohamed Morchid¹, Georges Linarès¹

¹ Université d’Avignon, LIA, France

² Orkis, Aix-en-Provence, France

ABSTRACT

Convolutional neural networks (CNN) have recently achieved state-of-the-art results in various applications. In the case of image recognition, an ideal model has to learn, independently of the training data, both the local dependencies between the three components (R,G,B) of a pixel and the global relations describing edges or shapes, making it efficient with small or heterogeneous datasets. Quaternion-valued convolutional neural networks (QCNN) addressed this problem by introducing multidimensional algebra to CNN. This paper explores the fundamental reason for the success of QCNN over CNN by investigating the impact of the Hamilton product on a color image reconstruction task performed from a gray-scale-only training. By learning internal and external relations independently, and with fewer parameters than a real-valued convolutional encoder-decoder (CAE), quaternion convolutional encoder-decoders (QCAE) almost perfectly reconstructed unseen color images, while the CAE produced worse, gray-scale versions.

Index Terms: Quaternion convolutional encoder-decoder, convolutional neural networks, heterogeneous image processing

1. INTRODUCTION

Neural network models are at the core of modern image recognition methods. Among these models, convolutional neural networks (CNN) [1] have been developed to capture both basic and complex patterns in images, and have achieved top results in numerous challenges [2]. Nonetheless, in the specific case of image recognition, a good model has to efficiently encode local relations within the input features, such as those between the Red, Green, and Blue (R,G,B) channels of a single pixel, as well as structural relations, such as those describing edges or shapes composed of groups of pixels. In particular, traditional real-valued CNNs consider a pixel as three different and separate values (R, G, B), while a more natural representation is to process a pixel as a single multidimensional entity. As a consequence, both internal and global hidden relations are treated at the same level during the training of CNNs.

Building on many successful applications [3, 4, 5], quaternion neural networks (QNN) [6, 7, 8] have been proposed to encapsulate multidimensional input features. Quaternions are hyper-complex numbers that contain a real part and three separate imaginary components, which makes them a natural fit for three- and four-dimensional feature vectors, such as in image processing. Indeed, the three components (R,G,B) of a given pixel can be embedded in a quaternion, so that pixels are created and processed as single entities. To solve the above-described problem of local and global dependencies, deep quaternion convolutional neural networks (QCNN) [9, 10, 11] have been proposed. In these previous works, better image classification results than with real-valued CNN are obtained with smaller neural networks in terms of number of parameters. The authors claim that such better performances are due to the specific quaternion algebra, alongside the natural multidimensional representation of a pixel. Nonetheless, and despite promising results, no clear explanation of QCNN performances in image recognition has been demonstrated yet. Moreover, these studies employ color images for both training and validation.

Therefore, this paper proposes: 1) to explore the impact of the Hamilton product (Section 2.1), which is at the heart of the better learning and representation abilities of QNN; 2) to show that quaternion-valued neural networks are able to perfectly learn color feature dependencies (R,G,B). Quaternion and real-valued neural networks are therefore compared on a gray-scale to color image task that highlights the capability of a model to learn both the internal relations (i.e. the relations that exist inside a pixel) and the external relations of an image. To this end, a quaternion convolutional encoder-decoder (QCAE) (Section 3)¹ and a real-valued convolutional encoder-decoder (CAE) [12] are trained to reconstruct a single gray-scale image from the KODAK PhotoCD dataset (Section 4.1). During the validation process, an unseen color image is presented to both models, and the reconstructed pictures are compared visually and with the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics (Section 4.3). To validate the learning of internal dependencies, these models must reconstruct the color image without any prior information about the color space from the training phase. The experiments show that the QCAE succeeds in producing an almost perfect copy of the testing image, while the CAE fails, reconstructing a slightly worse, black and white version. Such behavior makes quaternion-valued models a better fit for image recognition in heterogeneous conditions. Indeed, quaternion-valued models are less harmed by smaller and heterogeneous data, due to their ability to dissociate internal and global dependencies through the Hamilton product and the convolutional process, respectively. Finally, it is worth noticing that these performances are observed with a four-fold reduction of the number of neural parameters for the QCAE compared to the CAE.

¹ Code is available at https://github.com/Orkis-Research/Pytorch-Quaternion-Neural-Networks

2. QUATERNION ALGEBRA

The quaternion algebra $\mathbb{H}$ defines operations between quaternion numbers. A quaternion $Q$ is an extension of a complex number defined in a four-dimensional space as:

$$Q = r1 + xi + yj + zk, \tag{1}$$

where $r$, $x$, $y$, and $z$ are real numbers, and $1$, $i$, $j$, and $k$ are the quaternion unit basis. In a quaternion, $r$ is the real part, while $xi + yj + zk$, with $i^2 = j^2 = k^2 = ijk = -1$, is the imaginary part, or vector part. Such a definition can be used to describe spatial rotations.
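To illustrate the last remark, the sketch below rotates a 3-D vector with a unit quaternion through the standard product $q v \bar{q}$. It is a minimal, self-contained Python sketch using plain `(r, x, y, z)` tuples rather than any library type; the helper names `hamilton`, `conjugate`, and `rotate` are hypothetical, and the rotation axis is assumed to have unit length.

```python
import math


def hamilton(q1, q2):
    """Hamilton product q1 ⊗ q2 for quaternions stored as (r, x, y, z) tuples."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def conjugate(q):
    """Quaternion conjugate: negate the imaginary (vector) part."""
    r, x, y, z = q
    return (r, -x, -y, -z)


def rotate(vector, axis, angle):
    """Rotate a 3-D vector by `angle` radians about a unit-length `axis` via q v q*."""
    half = angle / 2.0
    s = math.sin(half)
    q = (math.cos(half), axis[0] * s, axis[1] * s, axis[2] * s)  # unit quaternion
    v = (0.0, *vector)  # the vector as a pure quaternion (zero real part)
    _, x, y, z = hamilton(hamilton(q, v), conjugate(q))
    return (x, y, z)


# A quarter turn about the z axis maps the x axis onto the y axis (up to rounding).
rotated = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print(rotated)
```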

2.1. Hamilton product

The Hamilton product ($\otimes$) is used in QNN to replace the standard real-valued dot product, and to perform transformations between two quaternions $Q_1$ and $Q_2$ as follows:

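$$
\begin{aligned}
Q_1 \otimes Q_2 ={} & (r_1 r_2 - x_1 x_2 - y_1 y_2 - z_1 z_2) \\
 & + (r_1 x_2 + x_1 r_2 + y_1 z_2 - z_1 y_2)\, i \\
 & + (r_1 y_2 - x_1 z_2 + y_1 r_2 + z_1 x_2)\, j \\
 & + (r_1 z_2 + x_1 y_2 - y_1 x_2 + z_1 r_2)\, k.
\end{aligned} \tag{2}
$$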
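Eq. (2) can be checked numerically. The following minimal Python sketch stores quaternions as hypothetical `(r, x, y, z)` tuples (not the data structure of the released code), implements the product term by term, and confirms the basis rules of Section 2 as well as the non-commutativity of $\otimes$.

```python
def hamilton(q1, q2):
    """Hamilton product q1 ⊗ q2, following Eq. (2) term by term."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,  # real part
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,  # i component
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,  # j component
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,  # k component
    )


ONE, I, J, K = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)

# The basis rules i^2 = j^2 = k^2 = ijk = -1 follow from Eq. (2).
assert hamilton(I, I) == hamilton(J, J) == hamilton(K, K) == MINUS_ONE
assert hamilton(hamilton(I, J), K) == MINUS_ONE

# The product is not commutative: ij = k but ji = -k.
print(hamilton(I, J), hamilton(J, I))  # (0, 0, 0, 1) (0, 0, 0, -1)
```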

The Hamilton product allows a quaternion neural network to capture internal latent relations within the features of a quaternion (see Figure 1). In a quaternion-valued neural network, the quaternion-weight components are shared across multiple quaternion-input parts during the Hamilton product, creating relations within the elements. Indeed, Figure 1 shows that, in a real-valued neural network, the multiple weights required to code latent relations within a feature are considered at the same level as those learning global relations between different features, while the quaternion weight $w$ codes these internal relations within a unique quaternion $Q_{out}$ during the Hamilton product (right).
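The weight sharing described above becomes explicit when left multiplication by a quaternion weight $w$ is written as a 4 × 4 real matrix whose rows are read off Eq. (2). The sketch below is illustrative only (the function names and numeric values are hypothetical): it checks that the matrix reproduces $w \otimes q$ and that its 16 entries are filled by just 4 real parameters.

```python
import math


def hamilton(q1, q2):
    """Hamilton product q1 ⊗ q2 (Eq. 2) for (r, x, y, z) tuples."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def left_matrix(w):
    """4x4 real matrix M(w) such that M(w) @ (r, x, y, z) == w ⊗ q; rows read off Eq. (2)."""
    r, x, y, z = w
    return [
        [r, -x, -y, -z],
        [x, r, -z, y],
        [y, z, r, -x],
        [z, -y, x, r],
    ]


def matvec(m, v):
    """Plain matrix-vector product on nested lists."""
    return tuple(sum(a * b for a, b in zip(row, v)) for row in m)


w = (0.5, -1.0, 2.0, 0.25)  # one quaternion weight: 4 real parameters (hypothetical values)
q = (0.0, 0.3, 0.6, 0.9)    # e.g. a pixel encoded as 0 + Ri + Gj + Bk

m = left_matrix(w)
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(matvec(m, q), hamilton(w, q)))

# The 16 matrix entries reuse the same 4 values (up to sign); a real-valued
# 4-input/4-output layer would instead learn 16 independent weights.
distinct = {abs(v) for row in m for v in row}
print(len(distinct), "shared parameters fill", sum(len(row) for row in m), "matrix entries")
# 4 shared parameters fill 16 matrix entries
```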

3. QUATERNION CONVOLUTIONAL ENCODER-DECODER

The QCAE is an extension of the well-known real-valued convolutional networks (CNN) [2] and convolutional encoder-decoders [13] to quaternion numbers. Encoder-decoder models are simple unsupervised structures that aim to reconstruct the input feature at the output [12]. In a CAE or QCAE, encoding dense layers are simply replaced with convolutional ones, while decoding dense layers are changed to either transposed or upsampled convolutional layers [14]. To this end, let us recall the basics of the quaternion-valued convolution process [10, 9]. This operation is performed with the real-number matrix representation of quaternions. Therefore, a traditional convolutional layer, with a kernel that contains $K \times K$ feature maps, is split into 4 parts: the first part is equal to $r$, the second one to $xi$, the third one to $yj$, and the last one to $zk$ of a quaternion $Q = r1 + xi + yj + zk$.
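The real-number matrix representation can be sketched for a whole layer. Assuming a component-major layout (all $r$ parts first, then all $x$, $y$ and $z$ parts) and four hypothetical real weight matrices `wr`, `wx`, `wy`, `wz`, the block matrix below reproduces, for every output, the sum of Hamilton products of Eq. (2). For a convolution, each scalar entry would be replaced by a $K \times K$ kernel; this dense version is a simplification for readability, not the released implementation.

```python
import random


def hamilton(q1, q2):
    """Hamilton product q1 ⊗ q2 (Eq. 2) for (r, x, y, z) tuples."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def neg(m):
    """Element-wise negation of a nested-list matrix."""
    return [[-v for v in row] for row in m]


def block_weight(wr, wx, wy, wz):
    """Real (4*n_out) x (4*n_in) matrix assembled from four n_out x n_in component matrices."""
    layout = [
        [wr, neg(wx), neg(wy), neg(wz)],  # rows producing the real parts
        [wx, wr, neg(wz), wy],            # rows producing the i parts
        [wy, wz, wr, neg(wx)],            # rows producing the j parts
        [wz, neg(wy), wx, wr],            # rows producing the k parts
    ]
    return [[v for block in block_row for v in block[i]]
            for block_row in layout for i in range(len(wr))]


n_in, n_out = 3, 2  # numbers of quaternion inputs and outputs (hypothetical)
rng = random.Random(0)


def random_matrix():
    return [[rng.randint(-3, 3) for _ in range(n_in)] for _ in range(n_out)]


wr, wx, wy, wz = random_matrix(), random_matrix(), random_matrix(), random_matrix()
inputs = [tuple(rng.randint(-3, 3) for _ in range(4)) for _ in range(n_in)]

# Component-major layout: all real parts, then all i, j and k parts.
flat_in = [q[c] for c in range(4) for q in inputs]
flat_out = [sum(a * b for a, b in zip(row, flat_in)) for row in block_weight(wr, wx, wy, wz)]
outputs = [tuple(flat_out[c * n_out + o] for c in range(4)) for o in range(n_out)]


def reference(o):
    """Output quaternion o as a sum of Hamilton products w_oj ⊗ q_j."""
    acc = (0, 0, 0, 0)
    for j, q in enumerate(inputs):
        w = (wr[o][j], wx[o][j], wy[o][j], wz[o][j])
        acc = tuple(a + b for a, b in zip(acc, hamilton(w, q)))
    return acc


assert outputs == [reference(o) for o in range(n_out)]
print("quaternion parameters:", 4 * n_out * n_in)               # 24
print("real parameters, same sizes:", (4 * n_out) * (4 * n_in))  # 96
```

The quaternion layer therefore needs a quarter of the weights of a real-valued layer with the same real input and output sizes.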

Backpropagation is ensured by differentiable cost and activation functions that have already been investigated for quaternions in [15] and [16]. As a result, the so-called split approach [8, 6, 9, 17] is used as a quaternion equivalent of real-valued activation functions. Then, let $\gamma^l_{ab}$ and $S^l_{ab}$ be the quaternion output and the pre-activation quaternion output at layer $l$ and at the indexes $(a, b)$ of the new feature map, and $w$ the quaternion-valued weight filter map of size $K \times K$. A formal definition of the convolution process is:

$$\gamma^l_{ab} = \alpha\left(S^l_{ab}\right), \tag{3}$$

with

$$S^l_{ab} = \sum_{c=0}^{K-1} \sum_{d=0}^{K-1} w^l_{cd} \otimes \gamma^{l-1}_{(a+c)(b+d)}, \tag{4}$$

where $\alpha$ is a quaternion split activation function [8, 6, 9, 17]. The output layer of a quaternion neural network is commonly either quaternion-valued, such as for quaternion approximation [7], or real-valued to obtain a posterior distribution based on a softmax function following the split approach. Indeed, target classes are often expressed as real numbers.
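Eqs. (3) and (4) can be sketched directly for a single input and a single output feature map. This minimal Python version assumes stride 1, no padding, and no bias, and uses hardtanh (the activation reported in Section 4.2) as the split function $\alpha$, i.e. applied independently to each quaternion component; `quaternion_conv2d` and `split_hardtanh` are hypothetical names.

```python
def hamilton(q1, q2):
    """Hamilton product q1 ⊗ q2 (Eq. 2) for (r, x, y, z) tuples."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = q2
    return (
        r1 * r2 - x1 * x2 - y1 * y2 - z1 * z2,
        r1 * x2 + x1 * r2 + y1 * z2 - z1 * y2,
        r1 * y2 - x1 * z2 + y1 * r2 + z1 * x2,
        r1 * z2 + x1 * y2 - y1 * x2 + z1 * r2,
    )


def split_hardtanh(q, low=-1.0, high=1.0):
    """Split activation: hardtanh applied independently to each quaternion component."""
    return tuple(min(max(c, low), high) for c in q)


def quaternion_conv2d(inputs, kernel):
    """Single-map quaternion convolution (Eqs. 3-4): stride 1, no padding, no bias.

    inputs: H x W nested list of quaternions; kernel: K x K nested list of quaternions.
    """
    k = len(kernel)
    height, width = len(inputs), len(inputs[0])
    out = []
    for a in range(height - k + 1):
        row = []
        for b in range(width - k + 1):
            s = (0.0, 0.0, 0.0, 0.0)  # pre-activation S_ab
            for c in range(k):
                for d in range(k):
                    p = hamilton(kernel[c][d], inputs[a + c][b + d])
                    s = tuple(u + v for u, v in zip(s, p))
            row.append(split_hardtanh(s))  # gamma_ab = alpha(S_ab)
        out.append(row)
    return out


# Hypothetical 4x4 gray image, each pixel encoded as 0 + GS i + GS j + GS k.
image = [[(0.0, 0.1 * (i + j), 0.1 * (i + j), 0.1 * (i + j)) for j in range(4)] for i in range(4)]
zero, one = (0.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)
kernel = [[one, zero], [zero, zero]]  # picks the top-left pixel of each 2x2 window
result = quaternion_conv2d(image, kernel)
print(len(result), len(result[0]))  # 3 3
```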

4. EXPERIMENTS AND RESULTS

This section details the experiments (Section 4.1), the model architectures (Section 4.2), and the results (Section 4.3) obtained with both QCAE and CAE on a gray-scale to color task with the KODAK PhotoCD dataset.

4.1. From gray-scale to the color space

We first propose to highlight the ability of a model to learn the internal relations that compose pixels (i.e. the color space), and to ensure the robustness of the model in heterogeneous training/validation conditions. To this end, models are trained to compress and reproduce a single gray-scale image in an encoder-decoder fashion, and are then fed with two different color images at validation time. Models are expected to reproduce exactly the same colors as the original test samples. Experiments are based on the KODAK PhotoCD dataset². A random image (see Figure 2) is converted to gray-scale following the basic luma formula [18] and used as the training sample, while the other original color images are used as a validation subset. Therefore, training is performed on a single gray-scale image of 512 × 768 pixels, with the gray value $GS(p_{x,y})$ of a given pixel $p_{x,y}$ repeated three times to compose a quaternion $Q(p_{x,y}) = 0 + GS(p_{x,y})i + GS(p_{x,y})j + GS(p_{x,y})k$. For a fair comparison, the gray value is also concatenated three times for each pixel in the real-valued CNN. Finally, at validation time, the quaternion $Q(p_{x,y}) = 0 + R(p_{x,y})i + G(p_{x,y})j + B(p_{x,y})k$ is composed from the color images and processed, while the R, G, B components are concatenated for the CNN.

Fig. 1. Illustration of the impact of the Hamilton product in a quaternion-valued neural network layer, compared to a traditional real-valued neural network layer.
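The per-pixel input encoding of Section 4.1 can be sketched as follows. The paper only refers to "the basic luma formula" [18]; the weights below (0.299, 0.587, 0.114, as used by JFIF) are therefore an assumption, and all function names are hypothetical.

```python
def luma(r, g, b):
    """Gray value with the BT.601 weights used by JFIF (assumed 'basic luma formula')."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_to_quaternion(r, g, b):
    """QCAE training input: Q(p) = 0 + GS(p) i + GS(p) j + GS(p) k."""
    gs = luma(r, g, b)
    return (0.0, gs, gs, gs)


def gray_to_real_channels(r, g, b):
    """CAE training input: the same gray value repeated on three real channels."""
    gs = luma(r, g, b)
    return (gs, gs, gs)


def color_to_quaternion(r, g, b):
    """QCAE validation input: Q(p) = 0 + R(p) i + G(p) j + B(p) k."""
    return (0.0, r, g, b)


pixel = (200, 120, 40)  # hypothetical RGB value of one pixel
print(gray_to_quaternion(*pixel))
print(color_to_quaternion(*pixel))  # (0.0, 200, 120, 40)
```

During training both models only ever see three identical components per pixel, which is what makes the color reconstruction at validation time a test of the learned internal relations.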

Reconstructed pictures are evaluated visually and with the peak signal-to-noise ratio (PSNR) [19] as well as the structural similarity (SSIM) [20] metrics.
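For reference, PSNR can be computed as $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$. The sketch below also includes a single-window SSIM with the constants of [20]; this is a simplification, since [20] averages SSIM over local Gaussian windows, so its values will not match standard implementations exactly. An 8-bit range (MAX = 255) and the sample pixel values are assumptions.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized flat lists of pixel values."""
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def global_ssim(x, y, max_value=255.0):
    """SSIM computed once over the whole image (simplified: [20] averages local windows)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_value) ** 2, (0.03 * max_value) ** 2  # K1 = 0.01, K2 = 0.03
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


ref = [52, 55, 61, 66, 70, 61, 64, 73]    # hypothetical pixel values
noisy = [54, 53, 60, 68, 71, 60, 66, 72]
print(round(psnr(ref, noisy), 2), round(global_ssim(ref, noisy), 4))
```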

4.2. Models architectures

QCAE and CAE have the same topology. It is worth noticing that the number of output feature maps is four times larger in the QCAE due to the quaternion convolution, meaning that 8 quaternion-valued feature maps correspond to 32 real-valued ones. Therefore, each model has two convolutional encoding layers and transposed convolutional decoding layers that deal with the same dimensions, but with different internal sizes. Indeed, quaternion feature maps are of size 8 and 16, equivalent to sizes of 32 and 64 for the CAE. Such dimensions ensure an encoding dimension slightly smaller than the original picture size. Kernel size and strides are set to 3 and 2, respectively, across all the layers. Training is performed for 3,000 epochs with the Adam optimizer [21], vanilla hyper-parameters, and a learning rate of $5 \times 10^{-4}$. The hardtanh [22] activation function is used in both convolutional and transposed convolutional layers. Finally, quaternion parameters are initialized following the proposal of [23].

² http://r0k.us/graphics/kodak
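The statement that the encoding is slightly smaller than the picture can be checked with the standard convolution arithmetic [14]. The padding is not reported, so the sketch below assumes no padding and counts the input as 4 real values per pixel (one quaternion); both are assumptions, and `conv_out` is a hypothetical helper.

```python
def conv_out(size, kernel=3, stride=2, padding=0):
    """Output length of a convolution along one axis (standard arithmetic, see [14])."""
    return (size + 2 * padding - kernel) // stride + 1


height, width = 512, 768
h, w = height, width
for _ in range(2):  # two encoding layers, kernel 3, stride 2
    h, w = conv_out(h), conv_out(w)

encoded = h * w * 16 * 4        # 16 quaternion maps = 64 real values per position
original = height * width * 4   # one quaternion (4 real values) per pixel
print((h, w), encoded, original)  # (127, 191) 1552448 1572864
```

Under these assumptions the encoding holds about 98.7% of the input values; with a padding of 1 the two sizes would be exactly equal, so the precise ratio depends on details not given in the text.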

4.3. Results and discussions

The results are reported in Figure 2. It is first important to notice that the quaternion-valued CAE produced almost perfect color images with respect to the test images, while the CAE completely failed to capture colors, outputting a black and white version. As motivated in Section 2, the quaternion representation alongside the Hamilton product forces the QCAE to consider and preserve the internal latent relations between the components of a quaternion (i.e. a pixel). Consequently, the QCAE easily captures the color space from a gray-scale image, since it learns to produce at the output exactly the same values as at the input, while the real-valued CAE learns a gray-scale mapping by generating three identical components.

Fig. 2. Results on the gray-scale to color task with the KODAK dataset. A gray-scale training picture (Train) and two colored original test images (Original Test) are randomly selected from the KODAK dataset and reproduced by both QCAE and CAE.

Other numerical measures are obtained based on the PSNR and SSIM of the reconstructed pictures. Since the CAE fails to learn colors, we propose to compare CAE results to the gray-scale equivalent of the test pictures, while QCAE results are compared to the true color images. Consequently, we can measure how well each model reconstructs the testing images without being biased by the fact that the CAE fails to learn colors. The QCAE obtains a PSNR of 31.68 dB and 28.06 dB, compared to 29.95 dB and 27.01 dB obtained with the CAE, for the parrots and women images respectively. Nonetheless, while PSNR measures the amount of noise contained in an image, SSIM reports the structural and visual correlation of two pictures. SSIM values of 0.96 and 0.93 are reported for the QCAE, compared to 0.87 and 0.86 for the CAE, on the parrots and women images respectively. The QCAE offered a better reconstructed image quality on both the PSNR and SSIM metrics, even considering the inability of the CAE to learn the color space. Moreover, the QCAE is composed of 6.4K parameters compared to 25K for the CAE. This is easily explained by the quaternion algebra. In the case of a dense layer with 1,024 input values and 1,024 hidden units, a real-valued model has $1{,}024^2 \approx 1$M parameters, while, to maintain equal input and output nodes (1,024), the quaternion equivalent has 256 quaternion inputs and 256 quaternion-valued hidden units. Therefore, the number of parameters for the quaternion model is $256^2 \times 4 \approx 0.26$M.
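The dense-layer arithmetic above can be reproduced in a few lines. This sketch ignores biases and assumes that the numbers of real inputs and outputs are divisible by 4; the helper names are hypothetical.

```python
def real_dense_params(n_in, n_out):
    """Weights of a real-valued dense layer (biases ignored)."""
    return n_in * n_out


def quaternion_dense_params(n_in, n_out):
    """Same real sizes with quaternions: (n_in/4) x (n_out/4) weights of 4 reals each."""
    return (n_in // 4) * (n_out // 4) * 4


print(real_dense_params(1024, 1024))        # 1048576, about 1M
print(quaternion_dense_params(1024, 1024))  # 262144, about 0.26M
```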

Discussions. On the one hand, the size reduction offered by QNN turns out to produce better results with a higher generalization capacity, and may have other advantages such as a smaller memory footprint when saving models. On the other hand, the natural internal relation representation induced by the Hamilton product, alongside the convolution process, provides an important step toward better performances of models that operate in heterogeneous contexts or with very small datasets. The small number of neurons allows the QCAE to obtain a “robust” and “compact” memory that builds robust hidden representations of the image content in the latent space. Indeed, QCAE are not altered by heterogeneous color spaces (e.g. a corpus of boats with a predominantly blue spectrum), and are able to learn internal relations with very few examples through the Hamilton product.

5. CONCLUSION

This paper proposes to clarify the recent better performances observed on image recognition with quaternion-valued neural networks, through an investigation of the impact of the Hamilton product. The conducted experiments demonstrate that quaternion convolutional encoder-decoders are able to perfectly learn the color space with a training performed on a single gray-scale image, while the real-valued CAE fails, showing that the Hamilton product allows QNN to encode local dependencies such as the RGB relation of a pixel. Moreover, the QCAE produces better quality reconstructions than the CAE with respect to the PSNR and SSIM metrics, even when considering the inability of the CAE to learn colors. The quaternion representation also offers more compact and expressive models. Thereby, the experiments have validated the initial intuition that the Hamilton product, alongside the convolution process, allows the QCAE to better separate both local and global dependencies of color images. These results are an important step forward toward more robust image recognition systems in heterogeneous conditions. Future work will attempt to introduce the efficient quaternion representation to image compression and image recognition.

6. REFERENCES

[1] Yoon Kim, “Convolutional neural networks for sentence classification,” arXiv preprint arXiv:1408.5882, 2014.

[2] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.

[3] Stephen John Sangwine, “Fourier transforms of colour images using quaternion or hypercomplex, numbers,” Electronics Letters, vol. 32, no. 21, pp. 1979–1980, 1996.

[4] Soo-Chang Pei and Ching-Min Cheng, “Color image processing by using binary quaternion-moment-preserving thresholding technique,” IEEE Transactions on Image Processing, vol. 8, no. 5, pp. 614–628, 1999.

[5] Nicholas A. Aspragathos and John K. Dimitros, “A comparative study of three methods for robot kinematics,” IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 28, no. 2, pp. 135–145, 1998.

[6] Paolo Arena, Luigi Fortuna, Luigi Occhipinti, and Maria Gabriella Xibilia, “Neural networks for quaternion-valued function approximation,” in IEEE International Symposium on Circuits and Systems (ISCAS’94), IEEE, 1994, vol. 6, pp. 307–310.

[7] Paolo Arena, Luigi Fortuna, Giovanni Muscato, and Maria Gabriella Xibilia, “Multilayer perceptrons to approximate quaternion valued functions,” Neural Networks, vol. 10, no. 2, pp. 335–342, 1997.

[8] Teijiro Isokawa, Tomoaki Kusakabe, Nobuyuki Matsui, and Ferdinand Peper, “Quaternion neural network and its application,” in International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, Springer, 2003, pp. 318–324.

[9] Titouan Parcollet, Ying Zhang, Mohamed Morchid, Chiheb Trabelsi, Georges Linarès, Renato de Mori, and Yoshua Bengio, “Quaternion convolutional neural networks for end-to-end automatic speech recognition,” in Interspeech 2018, 19th Annual Conference of the International Speech Communication Association, Hyderabad, India, 2–6 September 2018, pp. 22–26.

[10] Chase J. Gaudet and Anthony S. Maida, “Deep quaternion networks,” in 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, 2018, pp. 1–8.

[11] Xuanyu Zhu, Yi Xu, Hongteng Xu, and Changjian Chen, “Quaternion convolutional neural networks,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 631–647.

[12] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th International Conference on Machine Learning, ACM, 2008, pp. 1096–1103.

[13] Lucas Theis, Wenzhe Shi, Andrew Cunningham, and Ferenc Huszár, “Lossy image compression with compressive autoencoders,” arXiv preprint arXiv:1703.00395, 2017.

[14] Vincent Dumoulin and Francesco Visin, “A guide to convolution arithmetic for deep learning,” arXiv preprint arXiv:1603.07285, 2016.

[15] D. Xu, L. Zhang, and H. Zhang, “Learning algorithms in quaternion neural networks using GHR calculus,” Neural Network World, vol. 27, no. 3, pp. 271, 2017.

[16] Tohru Nitta, “A quaternary version of the back-propagation algorithm,” in Proceedings of the IEEE International Conference on Neural Networks, IEEE, 1995, vol. 5, pp. 2753–2756.

[17] Titouan Parcollet, Mohamed Morchid, Pierre-Michel Bousquet, Richard Dufour, Georges Linarès, and Renato De Mori, “Quaternion neural networks for spoken language understanding,” in 2016 IEEE Spoken Language Technology Workshop (SLT), IEEE, 2016, pp. 362–368.

[18] Eric Hamilton, “JPEG file interchange format,” 2004.

[19] Deepak S. Turaga, Yingwei Chen, and Jorge Caviedes, “No reference PSNR estimation for compressed pictures,” Signal Processing: Image Communication, vol. 19, no. 2, pp. 173–184, 2004.

[20] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[21] Diederik Kingma and Jimmy Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[22] Ronan Collobert, “Large scale machine learning,” 2004.

[23] Titouan Parcollet, Mirco Ravanelli, Mohamed Morchid, Georges Linarès, Renato De Mori, and Yoshua Bengio, “Quaternion recurrent neural networks,” 2018.