Article

Exploring the Neural Algorithm of Artistic Style

Authors: Yaroslav Nikulin, Roman Novak

Abstract

We explore the method of style transfer presented in the article "A Neural Algorithm of Artistic Style" by Leon A. Gatys, Alexander S. Ecker and Matthias Bethge (arXiv:1508.06576). We first demonstrate the power of the suggested style space on a few examples. We then vary different hyper-parameters and program properties that were not discussed in the original paper, among which are the recognition network used, the starting point of the gradient descent, and different ways to partition style and content layers. We also give a brief comparison of some of the existing algorithm implementations and the deep learning frameworks used. To study the style space further, we attempt to generate synthetic images by maximizing a single entry in one of the Gram matrices $\mathcal{G}_l$, and some interesting results are observed. Next, we try to mimic the sparsity and intensity distribution of Gram matrices obtained from a real painting and generate more complex textures. Finally, we propose two new style representations built on top of the network's features and discuss how one could be used to achieve local and potentially content-aware style transfer.
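The style space referred to above is built from layer-wise Gram matrices of CNN feature maps. As a point of reference, here is a minimal sketch of how such a Gram matrix $\mathcal{G}_l$ can be computed for one layer; the tensor shape and the random stand-in activations are illustrative assumptions, not the paper's actual setup.

```python
# Illustrative sketch (not the paper's code): computing the Gram matrix G_l
# for one layer from a (channels, height, width) activation tensor, as would
# be extracted from a recognition network such as VGG.
import numpy as np

def gram_matrix(features: np.ndarray) -> np.ndarray:
    """Return the matrix of inner products between flattened feature maps."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)   # one row per feature map
    return f @ f.T                   # shape (c, c)

# A random activation stands in for a real layer response.
fake_activations = np.random.rand(64, 32, 32).astype(np.float32)
G = gram_matrix(fake_activations)
print(G.shape)  # (64, 64)
```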

... Recently, convolutional neural network (CNN) [10] based style transfer methods have shown successful applications in transferring the style of a certain type of artistic painting, e.g., Vincent van Gogh's "The Starry Night", to a real-world photograph, e.g., an image taken by an iPhone. Since the seminal work of Gatys et al. [4], it has attracted a lot of attention from both academia [7,12,20,21,5,25,28] and industry [23,1,8,3]. Although work on neural style transfer has shown promising progress in transferring artistic images with rich textures and colors, e.g., oil paintings, we observe that it is less effective in transferring Chinese traditional painting. ...
... Gatys's work has received a lot of attention and triggered a whole line of research on deep learning based style transfer. [20,21] investigate several variants of Gatys' method for illumination and season transfer. Li et al. [12] utilize the patch-based Markov random field method to represent the style of the image with neural networks. ...
Preprint
Chinese traditional painting is one of the most historical artworks in the world. It is very popular in Eastern and Southeast Asia due to being aesthetically appealing. Compared with western artistic painting, it is usually more visually abstract and textureless. Recently, neural network based style transfer methods have shown promising and appealing results, which are mainly focused on western painting. It remains a challenging problem to preserve abstraction in neural style transfer. In this paper, we present a Neural Abstract Style Transfer method for Chinese traditional painting. It learns to preserve abstraction and other style properties jointly end-to-end via a novel MXDoG-guided filter (a modified version of the eXtended Difference-of-Gaussians) and three fully differentiable loss terms. To the best of our knowledge, there has been little work studying neural style transfer of Chinese traditional painting. To promote research in this direction, we collect a new dataset with diverse photo-realistic images and Chinese traditional paintings. In experiments, the proposed method shows more appealing stylized results in transferring the style of Chinese traditional painting than state-of-the-art neural style transfer methods.
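The MXDoG guide mentioned above builds on the extended difference-of-Gaussians family of edge-aware filters. As background only, the sketch below shows a plain difference-of-Gaussians filter; it is not the authors' MXDoG formulation, and the sigma values are arbitrary illustrative choices.

```python
# Background sketch only: a plain difference-of-Gaussians (DoG) filter, the
# family of edge-aware operators that the MXDoG guide modifies. This is not
# the authors' MXDoG; sigma and k are arbitrary illustrative values.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog(image: np.ndarray, sigma: float = 1.0, k: float = 1.6) -> np.ndarray:
    """Difference of two Gaussian blurs; highlights edge-like structure."""
    return gaussian_filter(image, sigma) - gaussian_filter(image, k * sigma)

gray = np.random.rand(256, 256)  # stand-in for a grayscale painting
edges = dog(gray)
```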
... For the modifications of Descriptive Neural Methods Based On Image Iteration, almost all those modifications are based on the algorithm proposed by Gatys et al. in [14]. Novak and Nikulin [34] address the issue that the style representation in [14] is invariant to the spatial configuration of the style image and propose a new style representation called "spatial style", which captures less style detail and more spatial configuration. Moreover, they also explore the Neural Style Transfer algorithm in [14] by varying different experimental settings (i.e., backends, frameworks, networks, initialization points, content layers and style layers), which were not discussed previously. ...
... Original Neural Style Transfer as well as its slight modifications: [14,16] the first Descriptive Neural Style Transfer algorithm proposed by Gatys et al.; [34,35] slight modifications obtained by varying experimental settings, etc. ...
Preprint
Full-text available
The recent work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNN) in creating artistic fantastic imagery by separating and recombining the image content and style. This process of using CNN to migrate the semantic content of one image to different styles is referred to as Neural Style Transfer. Since then, Neural Style Transfer has become a trending topic both in academic literature and industrial applications. It is receiving increasing attention from computer vision researchers and several methods are proposed to either improve or extend the original neural algorithm proposed by Gatys et al. However, there is no comprehensive survey presenting and summarizing recent Neural Style Transfer literature. This review aims to provide an overview of the current progress towards Neural Style Transfer, as well as discussing its various applications and open problems for future research.
... A CNN is a DL method developed recently that has attracted considerable attention. In general, a CNN is a multilayered network [9][10][11]; a typical CNN is shown schematically in Figure 1. A CNN consists of a series of convolution (C) and subsampling (S) layers. ...
Article
Full-text available
Video style transfer using convolutional neural networks (CNN), a method from the deep learning (DL) field, is described. The CNN model, the style transfer algorithm, and the video transfer process are presented first; then, the feasibility and validity of the proposed CNN-based video transfer method are estimated in a video style transfer experiment on The Eyes of Van Gogh. The experimental results show that the proposed approach not only yields video style transfer but also effectively eliminates flickering and other secondary problems in video style transfer.
... In this paper, Jin's method [12] is used: we use the discrete Euclidean Ricci flow to determine a flat metric that induces zero Gaussian curvature for the 3D subject. With the Ricci flow, the surfaces of 3D subjects with arbitrary topology and genus can be handled, whether they are closed or not. ...
... Recently, since the seminal work of Gatys et al. [2] successfully applied DCNNs to both texture synthesis [3] and style transfer [2,4,27], more and more new algorithms [5,7,8,28,29] based on DCNNs have been proposed and obtained visually pleasing results. Compared with traditional methods, which only consider low-level features, DCNNs can extract not only low-level features but also high-level features. ...
Article
Full-text available
Recent studies using deep neural networks have shown remarkable success in style transfer, especially for artistic and photo‐realistic images. However, these methods cannot solve more sophisticated problems. The approaches using global statistics fail to capture small, intricate textures and maintain correct texture scales of the artworks, and the others based on local patches are defective on global effect. To address these issues, this study presents a unified model [global and local style network (GLStyleNet)] to achieve exquisite style transfer with higher quality. Specifically, a simple yet effective perceptual loss is proposed to consider the information of global semantic‐level structure, local patch‐level style, and global channel‐level effect at the same time. This could help transfer not just large‐scale, obvious style cues but also subtle, exquisite ones, and dramatically improve the quality of style transfer. Besides, the authors introduce a novel deep pyramid feature fusion module to provide a more flexible style expression and a more efficient transfer process. This could help retain both high‐frequency pixel information and low‐frequency construct information. They demonstrate the effectiveness and superiority of their approach on numerous style transfer tasks, especially the Chinese ancient painting style transfer. Experimental results indicate that their unified approach improves image style transfer quality over previous state‐of‐the‐art methods.
... Markov random field (MRF) [17], and depth image analogy (DIA) [16]. On the other hand, model-based iterative methods commonly employ a reconstruction decoder based on the construction model [8][9][12] and the image [7], respectively. Mor et al. [11] proposed a method for achieving musical genre migration between different instruments, genres and styles. ...
Conference Paper
Full-text available
The film style refers to the combination and configuration of colors in a film, which is often dominated by one color, giving the picture a certain tendency. However, the creation of such effects not only requires special professional skills, but also takes a lot of manual labor. If artificial intelligence technology can be applied to the picture style of the film industry, production costs will be greatly reduced. In this paper, we propose a technique which combines style transfer and meta-learning to create a new way of thinking. Compared with traditional image style transfer, the transfer of film style based on meta-learning can save the cost of film production significantly and takes much less time to perform the transfer process. Finally, extensive experimental results are presented to validate our proposed method, which clearly outperforms traditional image style transfer.
... To address the problem that the style representation in [8] is invariant to the spatial configuration of the style image, Novak and Nikulin [23] propose a new style representation called spatial style, which captures less style detail and focuses more on spatial configuration. Then, they also propose several helpful modifications to improve the quality of image stylization, including activation shift, augmenting the style representation and a geometric weighting scheme [18]. ...
Article
Full-text available
Neural style transfer has recently become one of the most popular topics in academic research and industrial application. The existing methods can generate synthetic images by transferring different styles of some images to other given content images, but they mainly focus on learning low-level features of images with content and style losses, which greatly alters the salient information of content images at the semantic level. In this paper, an improved scheme is proposed to keep the salient regions of the transferred image the same as those of the content image. By adding a region loss calculated from a localization network, the synthetic image can almost keep the main salient regions consistent with those of the original content image, which helps saliency-based tasks such as object localization and classification. The transferred effect is also more natural and attractive, avoiding simple texture overlay of the style image. Furthermore, our scheme can also be extended to preserve other semantic information (such as shape, edge and color) of the image with corresponding estimation networks.
... Gram-based Style Transfer: Before Gatys et al. first used deep neural networks [9,10] in both texture synthesis [3] and style transfer [4] (named the neural algorithm of artistic style [2,5]), researchers tried to solve these problems by matching the statistics of the content image and the style image [11,12]. However, compared with traditional methods, which only consider low-level features during the process, deep neural networks can extract not only low-level but also high-level features; as a result, the new images generated by combining high-level information from the content image with multi-level information from the style image are usually more impressive. ...
Preprint
Recent studies using deep neural networks have shown remarkable success in style transfer, especially for artistic and photo-realistic images. However, the approaches using global feature correlations fail to capture small, intricate textures and maintain correct texture scales of the artworks, and the approaches based on local patches are defective on global effect. In this paper, we present a novel feature pyramid fusion neural network, dubbed GLStyleNet, which sufficiently takes into consideration multi-scale and multi-level pyramid features by best aggregating layers across a VGG network, and performs style transfer hierarchically with multiple losses of different scales. Our proposed method retains high-frequency pixel information and low-frequency construct information of images from two aspects: loss function constraint and feature fusion. Our approach is not only flexible to adjust the trade-off between content and style, but also controllable between global and local. Compared to state-of-the-art methods, our method can transfer not just large-scale, obvious style cues but also subtle, exquisite ones, and dramatically improves the quality of style transfer. We demonstrate the effectiveness of our approach on portrait style transfer, artistic style transfer, photo-realistic style transfer and Chinese ancient painting style transfer tasks. Experimental results indicate that our unified approach improves image style transfer quality over previous state-of-the-art methods, while also accelerating the whole process to a certain extent. Our code is available at https://github.com/EndyWon/GLStyleNet.
... Thus local style patterns of the example image can be well preserved. [7] further investigates the DNN-based style transfer by exploring different nets, different initializations and different layers. [8] proposes to control the perceptual factors such as color in order to improve the style transfer quality. ...
Article
Full-text available
Transferring the style of an example image to a content image opens the door of artistic creation for end users. However, it is especially challenging for portrait photos since human vision system is sensitive to the slight artifacts on portraits. Previous methods use facial landmarks to densely align the content face with the style face to reduce the artifacts. However, they can only handle the facial region. As for the whole image, building the dense correspondence is difficult and may easily introduce errors. In this paper, we propose a robust approach for portrait style transfer that gets rid of dense correspondence. Our approach is based on the guided image synthesis framework. We propose three novel guidance maps for the synthesis process. Contrary to former methods, these maps do not require the dense correspondence between content image and style image, which allows our method to handle the whole portrait photo instead of facial region only. In comparison with recent neural style transfer methods, our method achieves more pleasing results and preserves more texture details. Extensive experiments demonstrate our advantage over former methods on portrait style transfer.
... Neural style transfer methods usually employ a pretrained 19-layer VGG network to extract features and perform the optimization, because VGG preserves more information at the convolutional layers [21]. In the original work [6], Gatys et al. chose 'conv4_2' as the content layer, and 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1' and 'conv5_1' as the style layers. ...
Conference Paper
Full-text available
Neural Style Transfer based on Convolutional Neural Networks (CNN) aims to synthesize a new image that retains the high-level structure of a content image, rendered in the low-level texture of a style image. This is achieved by constraining the new image to have high-level CNN features similar to the content image, and lower-level CNN features similar to the style image. However in the traditional optimization objective, low-level features of the content image are absent, and the low-level features of the style image dominate the low-level detail structures of the new image. Hence in the synthesized image, many details of the content image are lost, and a lot of inconsistent and unpleasing artifacts appear. As a remedy, we propose to steer image synthesis with a novel loss function: the Laplacian loss. The Laplacian matrix ("Laplacian" in short), produced by a Laplacian operator, is widely used in computer vision to detect edges and contours. The Laplacian loss measures the difference of the Laplacians, and correspondingly the difference of the detail structures, between the content image and a new image. It is flexible and compatible with the traditional style transfer constraints. By incorporating the Laplacian loss, we obtain a new optimization objective for neural style transfer named Lapstyle. Minimizing this objective will produce a stylized image that better preserves the detail structures of the content image and eliminates the artifacts. Experiments show that Lapstyle produces more appealing stylized images with less artifacts, without compromising their "stylishness".
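As a rough illustration of the Laplacian loss described above, the following hedged sketch applies a standard 3x3 Laplacian kernel to the content image and the synthesized image and compares the responses; the specific kernel, single-channel input and mean-squared reduction are assumptions, not necessarily the authors' exact choices.

```python
# Hedged sketch of the Laplacian loss idea: compare the Laplacian (detail)
# responses of the content image and the synthesized image.
import torch
import torch.nn.functional as F

LAPLACIAN_KERNEL = torch.tensor([[0., 1., 0.],
                                 [1., -4., 1.],
                                 [0., 1., 0.]]).view(1, 1, 3, 3)

def laplacian(img: torch.Tensor) -> torch.Tensor:
    """img: (1, 1, H, W) grayscale tensor -> Laplacian response map."""
    return F.conv2d(img, LAPLACIAN_KERNEL, padding=1)

def laplacian_loss(content: torch.Tensor, synthesized: torch.Tensor) -> torch.Tensor:
    """Mean squared difference between the two Laplacian responses."""
    return torch.mean((laplacian(content) - laplacian(synthesized)) ** 2)
```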
... Here backpropagation means doing gradient descent on the loss function with respect to the input rather than the weights. However, such a process is slow and memory inefficient [13]. ...
Conference Paper
Deep artistic style transfer is popular yet costly, as it is computationally expensive to generate artistic images using deep neural networks. We first ignore the network and only try an optimization method to generate artistic pictures, but the variation is limited. Then we speed up the style transfer by deep compression on the CNN layers of VGG. We simply remove inner ReLU functions within each convolutional block, such that each block containing two to three convolutional operation layers with ReLU in between collapses to a fully connected layer followed by a ReLU and a pooling layer. We use activation vectors in the modified network to morph the generated image. Experiments show that, using the same loss function as Gatys et al. for style transfer, the compressed neural network is competitive with the original VGG but is 2 to 3 times faster. The deep compression on convolutional neural networks shows alternative ways of generating artistic pictures.
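The excerpt above notes that these iterative methods run gradient descent on the input image rather than on network weights. A minimal sketch of that pattern, assuming a frozen pretrained network `net` and a placeholder style-transfer objective `loss_fn` (both hypothetical names), might look like this:

```python
# Minimal sketch under assumptions: only the image pixels are optimized;
# `net` and `loss_fn` are hypothetical stand-ins for a frozen pretrained CNN
# and a style-transfer objective.
import torch

def stylize(net, loss_fn, init_image: torch.Tensor, steps: int = 200) -> torch.Tensor:
    image = init_image.clone().requires_grad_(True)  # pixels are the variables
    optimizer = torch.optim.LBFGS([image], max_iter=steps)

    def closure():
        optimizer.zero_grad()
        loss = loss_fn(net(image))  # forward pass through the frozen network
        loss.backward()             # gradients flow back to the pixels only
        return loss

    optimizer.step(closure)
    return image.detach()
```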
... Instead of using a global representation of the style, computed as a Gram matrix, they used patches of the neural activation from the style image. Nikulin et al. [7] tried the style transfer algorithm by Gatys et al. on nets other than VGG and proposed several variations in the way the style of the image is represented to achieve different goals like illumination or season transfer. However, we are not aware of any work that applies this kind of style transfer to videos. ...
Article
In the past, manually re-drawing an image in a certain artistic style required a professional artist and a long time. Doing this for a video sequence single-handed was beyond imagination. Nowadays computers provide new possibilities. We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. We make use of recent advances in style transfer in still images and propose new initializations and loss functions applicable to videos. This allows us to generate consistent and stable stylized video sequences, even in cases with large motion and strong occlusion. We show that the proposed method clearly outperforms simpler baselines both qualitatively and quantitatively.
... Li and Wand (2016a) proposed a different way to represent the style with the neural network to improve visual quality where the content and style image show the same semantic content. Nikulin and Novak (2016) tried the style transfer algorithm on features from other networks and proposed several variations in the way the style of the image is represented to achieve different goals like illumination or season transfer. Luan et al. presented an approach for photorealistic style transfer, where both the style image and the content are photographs (Luan et al. 2017). ...
Article
Full-text available
Manually re-drawing an image in a certain artistic style takes a professional artist a long time. Doing this for a video sequence single-handedly is beyond imagination. We present two computational approaches that transfer the style from one image (for example, a painting) to a whole video sequence. In our first approach, we adapt to videos the original image style transfer technique by Gatys et al. based on energy minimization. We introduce new ways of initialization and new loss functions to generate consistent and stable stylized video sequences even in cases with large motion and strong occlusion. Our second approach formulates video stylization as a learning problem. We propose a deep network architecture and training procedures that allow us to stylize arbitrary-length videos in a consistent and stable way, and nearly in real time. We show that the proposed methods clearly outperform simpler baselines both qualitatively and quantitatively. Finally, we propose a way to adapt these approaches also to 360 degree images and videos as they emerge with recent virtual reality hardware.
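The consistency losses mentioned above penalize changes between consecutive stylized frames outside of occluded regions. The sketch below is a generic short-term consistency term of that kind, assuming the optical-flow warping and the occlusion/validity mask are computed elsewhere; it is not the authors' exact formulation.

```python
# Generic short-term temporal consistency term (an assumption-laden sketch):
# penalize the difference between the current stylized frame and the previous
# stylized frame warped by optical flow, ignoring unreliable pixels.
import torch

def temporal_loss(current: torch.Tensor,
                  warped_previous: torch.Tensor,
                  valid_mask: torch.Tensor) -> torch.Tensor:
    """All tensors share shape (C, H, W); valid_mask is 1 where the flow is reliable."""
    diff = valid_mask * (current - warped_previous) ** 2
    return diff.sum() / valid_mask.sum().clamp(min=1.0)
```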
... Neural style transfer methods usually employ a pretrained 19-layer VGG network to extract features and perform the optimization, because VGG preserves more information at the convolutional layers [19]. In the original work [6], Gatys et al. chose 'conv4_2' as the content layer, and 'conv1_1', 'conv2_1', 'conv3_1', 'conv4_1' and 'conv5_1' as the style layers. ...
Article
Neural Style Transfer based on Convolutional Neural Networks (CNN) aims to synthesize a new image that retains the high-level structure of a content image, rendered in the low-level texture of a style image. This is achieved by constraining the new image to have high-level CNN features similar to the content image, and lower-level CNN features similar to the style image. However in the traditional optimization objective, low-level features of the content image are absent, and the low-level features of the style image dominate the low-level detail structures of the new image. Hence in the synthesized image, many details of the content image are lost, and a lot of inconsistent and unpleasing artifacts appear. As a remedy, we propose to steer image synthesis with a novel loss function: the Laplacian loss. The Laplacian matrix ("Laplacian" in short), produced by a Laplacian operator, is widely used in computer vision to detect edges and contours. The Laplacian loss measures the difference of the Laplacians, and correspondingly the difference of the detail structures, between the content image and a new image. It is flexible and compatible with the traditional style transfer constraints. By incorporating the Laplacian loss, we obtain a new optimization objective for neural style transfer named Lapstyle. Minimizing this objective will produce a stylized image that better preserves the detail structures of the content image and eliminates artifacts. Experiments show that Lapstyle produces more appealing stylized images with less artifacts, without compromising their "stylishness".
... Instead of using a global representation of the style, computed as a Gram matrix, they used patches of the neural activation from the style image. Nikulin et al. [7] tried the style transfer algorithm by Gatys et al. on nets other than VGG and proposed several variations in the way the style of the image is represented to achieve different goals like illumination or season transfer. However, we are not aware of any work that applies this kind of style transfer to videos. ...
Conference Paper
In the past, manually re-drawing an image in a certain artistic style required a professional artist and a long time. Doing this for a video sequence single-handed was beyond imagination. Nowadays computers provide new possibilities. We present an approach that transfers the style from one image (for example, a painting) to a whole video sequence. We make use of recent advances in style transfer in still images and propose new initializations and loss functions applicable to videos. This allows us to generate consistent and stable stylized video sequences, even in cases with large motion and strong occlusion. We show that the proposed method clearly outperforms simpler baselines both qualitatively and quantitatively.
... Gatys et al. [8,12] define a squared loss on the correlations between feature maps of some layers and synthesize natural textures of high perceptual quality using the pretrained CNN called VGG [3]. Gatys et al. [13] then combine the loss on the correlations as a proxy to the style of a painting and the loss on the activations to represent the content of an image, and successfully create artistic images by converting the artistic style to the content image, inspiring several followups [14,15]. Another stream of visualization aims to understand what each neuron has learned in a pretrained network and synthesize an image that maximally activates individual features [5,9] or the class prediction scores [6]. ...
Conference Paper
Full-text available
To what extent is the success of deep visualization due to the training? Could we do deep visualization using untrained, random weight networks? To address this issue, we explore new and powerful generative models for three popular deep visualization tasks using untrained, random weight convolutional neural networks. First we invert representations in feature spaces and reconstruct images from white noise inputs. The reconstruction quality is statistically higher than that of the same method applied on well trained networks with the same architecture. Next we synthesize textures using scaled correlations of representations in multiple layers, and our results are almost indistinguishable from the original natural texture and the synthesized textures based on the trained network. Third, by recasting the content of an image in the style of various artworks, we create artistic images with high perceptual quality, highly competitive with the prior work of Gatys et al. on pretrained networks. To our knowledge this is the first demonstration of image representations using untrained deep neural networks. Our work provides a new and fascinating tool to study the representation of deep network architecture and sheds new light on deep visualization.
... In [3], the same authors show that by adding a second term to the cost, which matches the content of another image, one can render that other image in the "style" (texture) of the first. Numerous follow-up works have since then analysed and extended this approach [4,5,6,7,8,9]. ...
Article
Gatys et al. (2015) showed that pair-wise products of features in a convolutional network are a very effective representation of image textures. We propose a simple modification to that representation which makes it possible to incorporate long-range structure into image generation, and to render images that satisfy various symmetry constraints. We show how this can greatly improve rendering of regular textures and of images that contain other kinds of symmetric structure. We also present applications to inpainting and season transfer.
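One way to read "incorporating long-range structure" above is to augment the usual Gram matrix with correlations between feature maps and spatially shifted copies of themselves, so the statistics retain some spatial arrangement. The sketch below illustrates that idea under assumptions; the horizontal shift `delta` and the plain cross-correlation are illustrative choices, not the authors' exact construction.

```python
# Illustrative sketch of shifted feature-map correlations for long-range
# structure; not the paper's exact formulation.
import numpy as np

def shifted_gram(features: np.ndarray, delta: int = 4) -> np.ndarray:
    """features: (channels, H, W) -> (channels, channels) shifted correlation matrix."""
    c, h, w = features.shape
    a = features[:, :, :-delta].reshape(c, -1)  # original columns
    b = features[:, :, delta:].reshape(c, -1)   # columns shifted by `delta`
    return a @ b.T
```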
Article
Full-text available
Image style transfer (IST) is a hot topic in the computer vision community, which refers to learning the distribution of a given style image in order to convert any image into the corresponding style while the content of the original image is preserved as much as possible. Early style transfer mainly utilizes texture features. Thanks to the great improvement of deep learning technology, research on IST based on convolutional neural networks (CNN) has achieved breakthroughs in accuracy and speed. Focusing on the topic of deep learning-based IST, we will introduce the latest algorithms in detail, including their basic ideas, key steps, advantages, and disadvantages. Also, we will give an analysis of the performance of representative methods. Furthermore, we discuss the problems to be solved in style transfer and summarize the challenges and development trends in the future.
Chapter
Creativity is a defining feature of human cognition and has fascinated philosophers and scientists throughout history. In the last few decades, the development of rigorous experimental techniques, advances in neuroscience, and the explosive growth of computational methods has led to great advances in the understanding of the creative process. This chapter provides an overview of some of this work. It looks at recent results from studies of the neurological processes underlying creative thinking, and at computational models that attempt to simulate creativity at a phenomenological level. These models span a range of levels, from neurodynamical models attempting to simulate mental processes to more abstract ones. The chapter also looks explicitly at models of collective creativity from small groups to large social networks. Finally, it points to some recent developments in machine learning that are relevant to computational creativity and are influencing the modes of human creativity.
Article
Full-text available
Deep learning methods are widely used in computer vision tasks with large-scale annotated datasets. However, it is a big challenge to obtain such datasets in most areas of the vision-based non-destructive testing (NDT) field. Data augmentation has proven an efficient way of dealing with the lack of large-scale annotated datasets. In this paper, we propose CycleGAN-based extra-supervised (CycleGAN-ES) generation of synthetic NDT images, where the ES is used to ensure that the bidirectional mappings are learned for corresponding labels and defects. Furthermore, we show the effectiveness of using the synthesized images to train deep convolutional neural networks (DCNN) for defect recognition. In the experiments, we extract a number of X-ray welding images, both with and without defects, from the published GDXray dataset; CycleGAN-ES is used to generate synthetic defect images based on a small number of extracted defect images and manually drawn labels which are used as a content guide. For quality verification of the synthesized defect images, we use a high-performance classifier pre-trained on a big dataset to recognize the synthetic defects and show the comparability of the performances of classifiers trained using synthetic defects and real defects, respectively. To demonstrate the effectiveness of using the synthesized defects for augmentation, we train and evaluate the performance of DCNNs for defect recognition with and without the synthesized defects.
Chapter
Chinese traditional painting is one of the most historical artworks in the world. It is very popular in Eastern and Southeast Asia due to being aesthetically appealing. Compared with western artistic painting, it is usually more visually abstract and textureless. Recently, neural network based style transfer methods have shown promising and appealing results, which are mainly focused on western painting. It remains a challenging problem to preserve abstraction in neural style transfer. In this paper, we present a Neural Abstract Style Transfer method for Chinese traditional painting. It learns to preserve abstraction and other style properties jointly end-to-end via a novel MXDoG-guided filter (a modified version of the eXtended Difference-of-Gaussians) and three fully differentiable loss terms. To the best of our knowledge, there has been little work studying neural style transfer of Chinese traditional painting. To promote research in this direction, we collect a new dataset with diverse photo-realistic images and Chinese traditional paintings (the dataset will be released at https://github.com/lbsswu/Chinese_style_transfer). In experiments, the proposed method shows more appealing stylized results in transferring the style of Chinese traditional painting than state-of-the-art neural style transfer methods. Keywords: Neural style transfer, Chinese traditional painting
Article
Full-text available
Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms---whether for portraits or landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!
Article
Full-text available
In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
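The separation and recombination of content and style described above is usually realized as a weighted sum of a content term (distance between activations) and a style term (distance between Gram matrices over several layers). A hedged sketch of such an objective follows; the weights alpha and beta, the normalization, and the tensor shapes are assumptions for illustration rather than the paper's exact settings.

```python
# Hedged sketch of a combined objective: alpha * content + beta * style,
# with the style term summed over several layers' Gram matrices.
import torch

def gram(f: torch.Tensor) -> torch.Tensor:
    """f: (channels, H, W) -> normalized Gram matrix of shape (channels, channels)."""
    c, h, w = f.shape
    f = f.reshape(c, h * w)
    return (f @ f.t()) / (c * h * w)

def total_loss(gen_content, target_content, gen_style_feats, target_style_feats,
               alpha: float = 1.0, beta: float = 1e3) -> torch.Tensor:
    content = torch.mean((gen_content - target_content) ** 2)
    style = sum(torch.mean((gram(g) - gram(s)) ** 2)
                for g, s in zip(gen_style_feats, target_style_feats))
    return alpha * content + beta * style
```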
Article
Full-text available
Recent years have produced great advances in training large, deep neural networks (DNNs), including notable successes in training convolutional neural networks (convnets) to recognize natural images. However, our understanding of how these models work, especially what computations they perform at intermediate layers, has lagged behind. Progress in the field will be further accelerated by the development of better tools for visualizing and interpreting neural nets. We introduce two such tools here. The first is a tool that visualizes the activations produced on each layer of a trained convnet as it processes an image or video (e.g. a live webcam stream). We have found that looking at live activations that change in response to user input helps build valuable intuitions about how convnets work. The second tool enables visualizing features at each layer of a DNN via regularized optimization in image space. Because previous versions of this idea produced less recognizable images, here we introduce several new regularization methods that combine to produce qualitatively clearer, more interpretable visualizations. Both tools are open source and work on a pre-trained convnet with minimal setup.
Article
Full-text available
We propose a deep convolutional neural network architecture codenamed "Inception", which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC 2014). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC 2014 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.
Article
Full-text available
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.
Conference Paper
Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be interfaced to third-party software thanks to Lua's light interface.
Conference Paper
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called dropout that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
Article
In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively.
Article
Torch7 is a versatile numeric computing framework and machine learning library that extends Lua. Its goal is to provide a flexible environment to design and train learning machines. Flexibility is obtained via Lua, an extremely lightweight scripting language. High performance is obtained via efficient OpenMP/SSE and CUDA implementations of low-level numeric routines. Torch7 can easily be interfaced to third-party software thanks to Lua's light interface.
References

W. Commons. (2015). Limited-memory BFGS. [Online]. Available: https://en.wikipedia.org/wiki/Limited-memory_BFGS
V. V. Gogh. (1889). Starry Night. [Online]. Available: https://en.wikipedia.org/wiki/The_Starry_Night
D. Kuzmenka. (2008). Hatosia. [Online]. Available: https://www.fractalus.com/dan/Galleries/Fractals/Fractal%20Gallery%208/slides/Hatosia.html
Torch. (2015). Cunn. [Online]. Available: https://github.com/torch/cunn
J. Johnson. (2015). Neural-style. [Online]. Available: https://github.com/jcjohnson/neural-style
PetFinder. (2015). The special grooming needs of a senior cat. [Online]. Available: https://www.petfinder.com/cats/cat-grooming/grooming-needs-senior-cat/
G. Balla. (1923). Pessimismo e optimismo. [Online]. Available: http://www.wikiart.org/en/giacomo-balla/pessimism-and-optimism-1923
T. C. Fedro. (1969). Cubist 9. [Online]. Available: http://www.ebsqart.com/Art-Galleries/Contemporary-Cubism/43/Cubist-9/204218/
A. Grey. (1996). Despair. [Online]. Available: http://alexgrey.com/art/paintings/soul/despair/
Pando. (2013). Square VP Jared Fliesler joins Matrix Partners as a general partner. [Online]. Available: https://pando.com/2013/03/07/square-vp-jared-fliesler-joins-matrix-partners-as-a-general-partner/
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. (2014). Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093. [Online]. Available: http://caffe.berkeleyvision.org/