Texture Synthesis Using Convolutional Neural
Networks With Long-Range Consistency and
Spectral Constraints
Shaun Schreiber
Division of Computer Science
Stellenbosch University
Stellenbosch, South Africa
Email: shaunschreiber@ml.sun.ac.za
Jaco Geldenhuys
Division of Computer Science
Stellenbosch University
Stellenbosch, South Africa
Email: jaco@cs.sun.ac.za
Hendrik de Villiers
Food and Biobased Research
Wageningen UR
Wageningen, The Netherlands
Email: hendrik.devilliers@wur.nl
Abstract—Procedural texture generation enables the creation of richer and more detailed virtual environments without the help of an artist. However, finding a flexible generative model of real-world textures remains an open problem. We present a novel Convolutional Neural Network based texture model consisting of two summary statistics (the Gramian and Translation Gramian matrices), as well as spectral constraints. We investigate the use of the Fourier Transform and the Windowed Fourier Transform in applying spectral constraints, and find that the Windowed Fourier Transform improves the quality of the generated textures. We demonstrate the efficacy of our system by comparing generated output with that of related state-of-the-art systems.
I. INTRODUCTION
Creating detailed texture maps for virtual environments is often a time-consuming process in which artists have to manually create all content, even when attempting to create variations of existing textures. However, advancements in the field of texture synthesis have made it possible to nearly automate this process by using parametric or non-parametric techniques to synthesize textures.
One model that has shown particular promise for this use-case is that of Gatys [1], who demonstrate the use of a convolutional neural network in conjunction with a summary statistic to represent a texture. This model produces adequate results for stochastic and irregular textures but struggles to synthesize regular textures. [2] and [3] introduced two key improvements that allowed the network of Gatys [1] to synthesize regular and near-regular textures with some degree of success; however, it still struggled in some cases. One such case is when two different textures merge, such as where sand meets water.
In this work, we further expand the capabilities of the system proposed by Gatys [1]. Firstly, we introduce the use of the Windowed Fourier Transform instead of the Fourier Transform used in [3]. The idea is to increase the spatial resolution so as to enable regions with different underlying texture models to be treated as partially independent. We further expand this method by introducing the long-range consistency approach described in [2]. In the following section, we provide a review of related literature and methods, followed by a detailing of our approach and results.
II. TEXTURE SYNTHESIS
A. Convolutional Neural Networks (CNNs)
A Convolutional Neural Network is a type of feed-forward neural network that predominantly consists of convolutional, pooling, fully connected and softmax layers. The basic structure of a neural network is depicted in Figure 1. Because only convolutional and pooling layers are employed during texture generation, we restrict our attention to these layers in the following outline of layer functionality. For a detailed explanation of convolutional neural networks, see [4].
a) Convolutional Layer: Convolutional layers are a set of convolutional operators whose filters are trainable through backpropagation. The weights of each filter are shared and are ideally trained to be sensitive to informative features in the input. An important consequence of weight sharing is that it reduces the dimensionality of the problem, helping to prevent overfitting and enabling the network to be trained more effectively. The outputs of these filters are commonly referred to as feature maps.
b) Pooling Layer: Pooling layers down-sample their respective inputs, which reduces the number of parameters in the network; this down-sampling also contributes to high-level, long-range reasoning. It is usually achieved by applying an averaging or max filter to the input. A common filter size is 2×2 with a stride of two, which reduces the input by a factor of four.
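For illustration, a minimal NumPy sketch (ours, not part of the original system) of 2×2 max pooling with a stride of two:

import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (H, W) feature map.

    The output is (H//2, W//2): a four-fold reduction in values.
    """
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 blocks, then reduce each block.
    blocks = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Example: a 4x4 map pools down to 2x2.
fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool_2x2(fmap))  # [[ 5.  7.] [13. 15.]]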
The network used in this paper for texture synthesis is the
VGG-19 [6] network that consists of 16 convolutional and
5 pooling layers, excluding the fully connected and softmax
layers. This network architecture was proposed by Simonyan
[6] for image recognition.
B. Texture Model
The texture model used in this paper was introduced by
Gatys [1]. The underlying idea is to pass a texture A through a CNN, which populates the feature maps of each convolutional layer.

Fig. 1. The basic structure of a CNN [5].

A spatial summary statistic is then calculated over the feature maps of each layer. It is the combination of these summary statistics and the extensive feature space of the VGG-19 network that forms the basis of the texture model.
The summary statistic used by Gatys [1], the Gram matrix, calculates the correlation between feature map activations. The Gram matrix for each layer is given by

G^l_{ij} = \sum_x F^l_{ix} \cdot F^l_{jx}   (1)

where G^l_{ij} is the Gram matrix value at position (i, j) of layer l, and F^l_{ix} refers to the value at index x of the i-th feature map on layer l. Note that calculating the inner product of two feature maps discards any spatial information. This proves to be an important property, as textures are by definition stationary¹. Also, two textures are seen as the same if their texture descriptions are within a defined distance δ of one another.

¹ Refer to [7] for an in-depth explanation of textures.
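To make equation (1) concrete, a minimal NumPy sketch (an illustration of ours, not the published implementation) that computes the Gram matrix of one layer's feature maps:

import numpy as np

def gram_matrix(F):
    """Gram matrix (equation (1)) of feature maps F with shape (N, H, W).

    Each of the N feature maps is flattened to length M = H*W, so that
    G[i, j] = sum_x F[i, x] * F[j, x]; all spatial layout is discarded.
    """
    n = F.shape[0]
    flat = F.reshape(n, -1)  # (N, M): one row per feature map
    return flat @ flat.T     # (N, N) matrix of feature co-activations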
C. Texture Synthesis Process
This section provides a brief introduction to the core texture synthesis process introduced by Gatys [1]. The synthesis process starts by initializing two textures: a reference texture A and a texture B populated with Gaussian noise. After calculating the initial descriptions of both textures, gradient descent is applied to B to minimize the distance between the descriptions. To use gradient descent, a distance function needs to be defined:

D^l_{cnn} = \frac{1}{4 N_l^2 M_l^2} \sum_i \sum_j ( G^l_{ij} - \hat{G}^l_{ij} )^2.   (2)

Here D^l_{cnn} denotes the mean squared distance at layer l, N_l the number of feature maps at layer l, and M_l the size of a feature map at layer l. G refers to the Gram matrix of the reference texture and \hat{G} to the Gram matrix of the texture that is being generated. The total distance is

D_{cnn}(I, \hat{I}) = \sum_l w_l D^l_{cnn}   (3)

where I and \hat{I} refer to the reference and generated images respectively. The weight w_l associated with each layer is preset and represents that layer's contribution to the distance. Note that the total distance between B and A is minimized by changing only B's description.
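A corresponding sketch of equations (2) and (3), again our own illustrative code with hypothetical names:

import numpy as np

def layer_distance(G_ref, G_gen, M):
    """Equation (2): mean squared Gram distance at one layer.

    G_ref, G_gen: (N, N) Gram matrices of the reference texture A and
    the generated texture B at that layer; M is the feature-map size H*W.
    """
    N = G_ref.shape[0]
    return ((G_ref - G_gen) ** 2).sum() / (4.0 * N ** 2 * M ** 2)

def total_distance(grams_ref, grams_gen, sizes, weights):
    """Equation (3): preset per-layer weights w_l scale each term."""
    return sum(w * layer_distance(gr, gg, m)
               for gr, gg, m, w in zip(grams_ref, grams_gen, sizes, weights))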
D. Translation-Gramians Extension
The initial texture model proposed by Gatys delivered convincing results when the underlying structure was irregular or stochastic. However, the model struggled to capture textures that contained long-range relationships. An example of this is shown in Figure 2.
This was addressed to some extent by Berger [2], who added an additional summary statistic referred to as a transformed Gramian. The transformed Gramian represents the co-occurrences between feature maps F and feature maps T(F), where T denotes any spatial transformation:

G^l_{x,\delta,i,j} = \sum_{k=0}^{N_\delta} T_{x,+\delta}(F^l_i)_k \cdot T_{x,-\delta}(F^l_j)_k   (4)

Various spatial transforms are presented by Berger. The one shown in equation (4), denoted by T_{x,\pm\delta}, can be viewed as a translation by δ along the x-axis in the positive or negative direction. By using this transform, the model becomes capable of capturing averaged correlations between local features at positions (i, j) and (i, j + δ).
To incorporate this summary statistic within the initial model, a distance function is required. The distance between transformed Gramians at a certain layer l is defined as

D^l_{cc,\delta} = \frac{1}{2} ( \| \hat{G}^l_{x,\delta} - G^l_{x,\delta} \|_F^2 + \| \hat{G}^l_{y,\delta} - G^l_{y,\delta} \|_F^2 )   (5)

and the total distance between texture representations as

D_{cc}(I, \hat{I}) = \sum_l ( w^l_1 D^l_{cnn} + w^l_2 D^l_{cc,\delta} )   (6)

where w^l_1 and w^l_2 denote the contribution coefficients for each summary statistic at layer l. Example output of this approach is shown in Figure 2d.
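A minimal sketch of the translation Gramian in equation (4) for a horizontal shift δ > 0 (our reading of [2], with hypothetical names; the vertical counterpart used in equation (5) is analogous):

import numpy as np

def translation_gram_x(F, delta):
    """Translation Gramian of equation (4) for a shift delta > 0 along x.

    F has shape (N, H, W). T_{x,+delta} drops the first delta columns and
    T_{x,-delta} drops the last delta, so each product pairs a feature
    value at position (i, j) with one at (i, j + delta).
    """
    n = F.shape[0]
    shifted = F[:, :, delta:].reshape(n, -1)   # T_{x,+delta}(F), flattened
    anchor = F[:, :, :-delta].reshape(n, -1)   # T_{x,-delta}(F), flattened
    return shifted @ anchor.T                  # (N, N)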
Fig. 2. Generated textures: (a) Original, (b) Gatys, (c) FT, (d) WFT.
E. Fourier Transform Extension
The model proposed by Gatys [1] struggled to represent regular or near-regular textures. [3] addressed this issue by applying constraints on the frequency domain of the synthesized image. This was achieved by first defining a set l_s which contains images with the same spectral amplitudes as I_0 while allowing changes in the spectral phase:

l_s = \{ I \mid u(m) = e^{i\varphi(m)} u_0(m) \ \forall m \}.   (7)

Here u_0 denotes the Fourier transform of I_0 and u(m) the Fourier transform of I_0 with its spectral phase shifted by \varphi(m) \in [0, 2\pi]. \hat{I} is then projected onto the element of l_s closest to it. The projection is defined as

\tilde{I} = \mathcal{F}^{-1} \Bigl( \overbrace{\frac{\mathcal{F}(\hat{I}) \cdot \overline{\mathcal{F}(I)}}{| \mathcal{F}(\hat{I}) \cdot \overline{\mathcal{F}(I)} |}}^{x} \cdot \overbrace{\mathcal{F}(I)}^{y} \Bigr)   (8)

where \mathcal{F} and \mathcal{F}^{-1} denote the Fourier Transform and Inverse Fourier Transform of an image respectively. Note that all operations, excluding \mathcal{F} and \mathcal{F}^{-1}, are element-wise. Further, it is important to note that component x calculates the phase by which component y needs to shift. Only y's phase is affected because x is normalized by the factor | \mathcal{F}(\hat{I}) \cdot \overline{\mathcal{F}(I)} |. The derivation of equation (8) can be found in [8]. To incorporate this constraint in the existing model, they introduced a gradient

\nabla_{spec} = \hat{I} - \tilde{I}   (9)

and loss function

D_{spec} = \frac{1}{2} \| \nabla_{spec} \|^2.   (10)

These functions are then added as additional terms to the original gradient and loss functions defined in §II-C:

D_{cspec}(I, \hat{I}) = D_{cnn} + \beta_{spec} D_{spec}   (11)

where \beta_{spec} is the contribution coefficient of the spectral constraint. Textures synthesized using this approach are shown in Figure 2. Applying the spectral constraint to the synthesis process heightens its ability to capture more regular structures. However, it struggles to capture the brick colour associations, as the beige is smudged over the image.
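An illustrative sketch of equations (8) to (10), in our own NumPy rendering for a single greyscale channel rather than the authors' released code; the small eps guard is an assumption of ours to avoid division by zero:

import numpy as np

def spectral_projection(I_hat, I, eps=1e-12):
    """Equation (8): project I_hat onto the spectrum-constraint set of I.

    Operates on one 2-D (greyscale) channel; products are element-wise.
    """
    U_hat, U = np.fft.fft2(I_hat), np.fft.fft2(I)
    cross = U_hat * np.conj(U)              # component x before normalizing
    phase = cross / (np.abs(cross) + eps)   # unit modulus: a pure phase shift
    return np.real(np.fft.ifft2(phase * U)) # y = F(I), phase-shifted by x

def spectral_gradient_and_loss(I_hat, I):
    """Equations (9) and (10)."""
    grad = I_hat - spectral_projection(I_hat, I)  # nabla_spec
    return grad, 0.5 * np.sum(grad ** 2)          # D_spec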
III. WINDOWED FOURIER TRANSFORM EXTENSION
In this section we extend the approach proposed by [3], which is briefly described in §II-E. We propose using a Windowed Fourier Transform instead of a Fourier Transform. This increases the spatial resolution by restricting the region over which spectral constraints operate, which enables regions with different underlying texture models to be treated as partially independent². However, information between windows is lost. To counteract this loss of information, a sliding window is used. This introduces two new variables to the system: window size and step size. The window size can be viewed as the size of the regions which are partially isolated from each other, and the step size as the degree of interaction between regions (smaller step sizes imply a bigger overlap between windows, which implies stronger interactions between neighbouring windows). A drawback is that choosing these variables requires some element of manual tuning during the generation process.

² This can be viewed as a high-pass filter.
An important property of the Windowed Fourier Transform is that it behaves as a plain Fourier Transform when the window and step sizes are set equal to the dimensions of the source image. This implies that there is no situation where using only the Fourier Transform is advantageous.
The reasoning behind introducing the Windowed Fourier Transform is to allow better support for images with more complex structures: textures that contain both regular and stochastic elements. This is achieved to some degree through the reduction of low frequencies, i.e. long-range relationships, in the image. Examples of this would be sand merging with water or rock merging with ground. Both of these examples are shown
in Figure 3. To incorporate this novel feature, equation (9) of the original algorithm is redefined as

\nabla_{avg} = \frac{\sum_i^N \sum_j^M ( \hat{I}_{ij} - \tilde{I}_{ij} )}{\sum_i^N \sum_j^M J_{ij}}   (12)

where \nabla_{avg} denotes the averaged difference between each of the windowed sections. N and M denote the number of windows per row and column respectively. \hat{I} and \tilde{I} denote the synthesized image and the synthesized image projected onto I with equation (8), respectively. The ij subscript refers to the sub-images masked out by window ij. J is an all-ones matrix with the same dimensions as \hat{I}. The denominator is responsible for averaging the pixel values while taking into account that windows overlap.

TABLE I
MODEL WEIGHTS

Extensions   w^l_1   w^l_2   β_spec   β_avg
D_cnn        10^9
D_cc         10^5    10^6
D_cspec      10^9            10^2
D_cavg       10^9                     10^2
D_ca         10^5    10^6             10^1
Generating textures proceeds as in the original approach, with the old gradient \nabla_{spec} replaced with \nabla_{avg}. Textures synthesized utilizing this approach are shown in Figure 2. Comparing the images generated by the Fourier and Windowed Fourier Transforms, the most striking observation is that the smudge effect visible in the FT image has been reduced when using the WFT. Furthermore, the WFT was able to capture the shift in brightness from the centre to the edges of the image. This is attributed to the WFT's windows applying more localized constraints on the different sections of the image.
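A sketch of the windowed variant of equation (12), reusing the spectral_projection helper above and assuming, for brevity, square windows of side win and a scalar step:

import numpy as np

def windowed_spectral_gradient(I_hat, I, win, step):
    """Equation (12): averaged spectral gradient over sliding windows.

    num accumulates (I_hat_ij - I_tilde_ij) per window; den counts how
    many windows cover each pixel (the all-ones matrices J_ij), so that
    overlapping contributions are averaged.
    """
    num = np.zeros_like(I_hat)
    den = np.zeros_like(I_hat)
    H, W = I_hat.shape
    for r in range(0, H - win + 1, step):
        for c in range(0, W - win + 1, step):
            sl = (slice(r, r + win), slice(c, c + win))
            proj = spectral_projection(I_hat[sl], I[sl])  # eq. (8) per window
            num[sl] += I_hat[sl] - proj
            den[sl] += 1.0
    return num / np.maximum(den, 1.0)  # guard pixels no window covered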
IV. HYBRID EXTENSION
We propose a method that combines the spectral constraints defined in §II-E with the long-range consistency approach of §II-D. The spectral constraints improve the model's ability to capture regular textures. In contrast, introducing the transformed Gramian improves the model's ability to capture irregular and stochastic textures. The distance function for this method is

D_{cs}(I, \hat{I}) = D_{cc} + \beta_{spec} D_{spec}   (13)

where D_{cc} refers to the distance function introduced in §II-D and D_{spec} to the one in §II-E. Textures synthesized using the described method are shown in Figure 4.
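For illustration, equation (13) can be assembled from per-layer distances as in the following sketch (names are hypothetical; w1, w2 and beta_spec correspond to w^l_1, w^l_2 and β_spec in Table I):

def hybrid_distance(D_cnn_layers, D_cc_layers, D_spec, w1, w2, beta_spec):
    """Equation (13): D_cs = D_cc + beta_spec * D_spec, where D_cc is
    itself the weighted per-layer sum of equation (6)."""
    D_cc = sum(w1[l] * D_cnn_layers[l] + w2[l] * D_cc_layers[l]
               for l in range(len(D_cnn_layers)))
    return D_cc + beta_spec * D_spec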
V. EXPERIMENTATION
In this section we define how each model is calibrated, followed by an investigation of how the model in §III is affected when varying its window and step sizes. The aforementioned model is then compared to the model described in §II-E. The results obtained using the hybrid model proposed in §IV are then compared to those obtained with the model in §II-D.
a) Calibration: Each model was calibrated individually to ensure that each texture model was provided with a good set of parameters. A total of 20 images consisting of regular, irregular and stochastic textures was used. The weights considered during calibration ranged from 10^1 to 10^6. The calibration process consisted of synthesizing a texture for every image-weight pair. The generated textures were then manually inspected and the best-performing weights were chosen. The weights used in each model are shown in Table I.
b) Window size: To determine how different window sizes affect the resulting synthesized texture, we selected six textures³ and varied the window sizes for each generated texture. The step sizes considered during this process were half, three quarters and the full length of the window size. The window sizes considered experimentally were the following: h×w, ½h×w, ¾h×¾w, ½h×½w and ¼h×¼w, where h and w refer to the height and width of the source image. Samples of textures synthesized using a subset of these window sizes are shown in Figure 7.
In our results we observed that the variance in the synthesized textures was reduced when smaller window sizes were used. This effect was prominent with a window size of ¼h×¼w (64×64). This could be attributed to the excessive reduction of low frequencies (which are partly responsible for mediating context) in each window, coupled with the reduced number of dependent variables in each window.
We also observed that when the window and step sizes are equal, artefacts occur on the borders of each window. This is expected, as the spectral constraints applied to each window are then independent of one another. Note that the window sizes used here were employed to demonstrate the general effects one could expect when changing the window size. This is intended to illustrate the trade-offs that a user would have to keep in mind when choosing a window size for a new class of texture. However, we did observe that, for this particular dataset, a window width equal to half of the image width with a quarter overlap produced convincing results with acceptable variation.
c) Step size: Similar to the approach employed for the different window sizes, we used six textures and varied the step size for each generation to determine how different step sizes affect the end result. The same window and step sizes were used for this process.
A notable observation was that the quality of the generated textures was reduced when a small step size, coupled with a window size that is small with respect to the image dimensions, was used. This is in part attributable to the overlapping sections being averaged, as shown in equation (12).
d) Windowed Fourier Transform comparison: As mentioned in §III, the WFT subsumes all of the functionality provided by the FT. Because of this, we only consider the situations where the FT constraints were unable to fully capture a specific texture, and compare them with the results obtained by applying the WFT constraints.
For this comparison we sampled 10 images from each of the 15 available texture classes⁴. The window and step size pairs⁵ considered for this comparison were 192×192:192, 192×96:72, 96×192:72, 96×96:72 and 96×96:48.
A sample of the results is shown in Figure 3. Upon inspection, the generated textures suggest that the WFT extension is better able to capture textures which contain some element of merging. However, the last row in Figure 3 shows a texture that neither extension could convincingly reproduce, even though it contains such an element. This failure is attributed in part to the extensions' inability to capture the diagonal pattern, together with the large window size. The WFT extension also performs noticeably better on some near-stochastic textures. Furthermore, both extensions performed well with most of the irregular and stochastic textures in our dataset, with no noticeable increase in texture quality. However, there were certain cases where the model of Gatys [1] performed better; one such case is shown in Figures 3i to 3l.

³ Two regular, two irregular and two stochastic textures.
⁴ The classes used during experimentation were: Brick, Fabric, Fire, Fractal, Glass, Ground, Grunge, Leather, Metal, Natural Sky, Stone, Water and Wood.
⁵ Each window and step size pair is separated by a colon (:).

Fig. 3. Generated textures: (a), (e), (i), (m) Original; (b), (f), (j), (n) Gatys; (c), (g), (k), (o) FT; (d), (h), (l), (p) WFT.
e) Hybrid extension comparison: To determine whether the proposed hybrid extension improves on the extension of §II-D, we selected 40 different textures and employed each extension to generate five images for every texture, after which the best generated images from each extension were compared. The comparison process consisted of manually inspecting each image. Note that the selected textures were initially used by [2] to determine the success of their extension. The window and step size pairs⁶ considered for this comparison were 192×192:192, 192×96:72, 96×192:72, 96×96:72 and 96×96:48. A sample of the results is shown in Figure 4. Additionally, Figures 5 and 6 show the variation in quality for extensions §II-D and §IV.
The majority of results produced by extension §II-D contained noise. This effect was especially prominent in low-contrast images⁷. Furthermore, the images produced were also bland compared to the original and to those produced by §IV. This effect is visible in Figures 4g, 4k and 4o. The two extensions produced results of similar quality in most cases. However, the results of the hybrid extension did indicate an improvement when synthesizing regular and semi-regular textures. One expected drawback we observed was the execution time of the hybrid extension, which was on average two seconds slower per iteration than the original four seconds per iteration.

⁶ Each window and step size pair is separated by a colon (:).
⁷ Note that this was not introduced by our implementation, as we used the authors' provided code to generate all the relevant examples.
VI. CONCLUSION
We presented an approach that incorporates spectral constraints with long-range consistency. This approach is an adaptation of the model proposed by [2], with local spectral constraints applied in the Fourier domain. This enables regions within the texture with different underlying texture models to be treated as partially independent. We showed that this could be achieved by using a Windowed Fourier Transform instead of the Fourier Transform proposed by [3]. This was followed by a comparison between the proposed adaptation and [2], in which we concluded that our approach produces more desirable results when synthesizing regular and semi-regular textures. This is visible in Figures 4d, 4l and 4p.
ACKNOWLEDGMENT
This work was funded by the Media lab at Stellenbosch
University.
REFERENCES
[1] L. A. Gatys, A. S. Ecker, and M. Bethge, "Texture Synthesis Using Convolutional Neural Networks," in Advances in Neural Information Processing Systems (NIPS), pp. 1–10, 2015. [Online]. Available: http://papers.nips.cc/paper/5633-texture-synthesis-using-convolutional-neural-networks
[2] G. Berger and R. Memisevic, "Incorporating long-range consistency in CNN-based texture generation," CoRR, vol. abs/1606.01286, 2016.
[3] G. Liu, Y. Gousseau, and G.-S. Xia, "Texture Synthesis Through Convolutional Neural Networks and Spectrum Constraints," pp. 1–6, May 2016. [Online]. Available: http://arxiv.org/abs/1605.01141
[4] M. A. Nielsen, "Neural networks and deep learning," Determination Press, 2015. [Online]. Available: http://neuralnetworksanddeeplearning.com/
[5] 2016. [Online]. Available: https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png
[6] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Intl. Conf. on Learning Representations (ICLR), pp. 1–14, 2015.
[7] E. R. Davies, Introduction to Texture Analysis, 2008.
[8] G. Tartavel, Y. Gousseau, and G. Peyré, "Variational Texture Synthesis with Sparsity and Spectrum Constraints," Journal of Mathematical Imaging and Vision, vol. 52, no. 1, pp. 124–144, 2015.
Fig. 4. Generated textures: (a), (e), (i), (m) Original; (b), (f), (j), (n) Gatys; (c), (g), (k), (o) CC; (d), (h), (l), (p) CC-WFT.
Fig. 5. Row one: hybrid extension with window and step sizes 96×192:72 and 192×96:72. Row two: CC extension.
Fig. 6. Row one: hybrid extension with window and step sizes 96×192:72 and 96×96:72. Row two: CC extension.
Fig. 7. Textures generated by the hybrid extension with the associated window and step sizes: (a) Original; (b) ½h×w; (c) ¾h×¾w; (d) ½h×½w; (e)–(h) ¼h×¼w. The variables h and w refer to the image height and width.