MaeSTrO: Mobile Style Transfer Orchestration
using Adaptive Neural Networks
Max Reimann
Hasso Plattner Institute for Digital
Engineering, University of Potsdam
Amir Semmo
Hasso Plattner Institute for Digital
Engineering, University of Potsdam
Jürgen Döllner
Hasso Plattner Institute for Digital
Engineering, University of Potsdam
Sebastian Pasewaldt
Digital Masterpieces GmbH, Germany
Mandy Klingbeil
Digital Masterpieces GmbH, Germany
[Figure 1 panel labels: Content / Mask, Style / Mask, Global Transfer (Iterative), Local Control (Iterative), Neural Style Transfer, Local Control (Feed-forward)]
Figure 1: Comparison of two neural style transfer techniques implemented with MaeSTrO. Compared to the original global
style transfer, the provided tools for local control (color-coded insets) are able to yield more expressive results. Content image
©Matthew Fournier on Unsplash.com, used with permission.
ABSTRACT
We present MaeSTrO, a mobile app for image stylization that empowers users to direct, edit and perform a neural style transfer with creative control. The app uses iterative style transfer, multi-style generative and adaptive networks to compute and apply flexible yet comprehensive style models of arbitrary images at run-time. Compared to other mobile applications, MaeSTrO introduces an interactive user interface that empowers users to orchestrate style transfers in a two-stage process for an individual visual expression: first, an initial semantic segmentation of a style image can be complemented by on-screen painting to direct sub-styles in a spatially-aware manner. Second, semantic masks can be virtually drawn on top of a content image to adjust neural activations within local image regions, and thus direct the transfer of learned sub-styles. This way, the general feed-forward neural style transfer is evolved towards an interactive tool that is able to consider composition variables and mechanisms of general artwork production, such as color, size and location-based filtering. MaeSTrO additionally enables users to define new styles directly on a device and synthesize high-quality images based on prior segmentations via a service-based implementation of compute-intensive iterative style transfer techniques.
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for prot or commercial advantage and that copies bear this notice and the full citation
on the rst page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
SIGGRAPH ’18 Appy Hour, August 12-16, 2018, Vancouver, BC, Canada
©2018 Copyright held by the owner/author(s).
ACM ISBN 978-1-4503-5807-1/18/08.
https://doi.org/10.1145/3213779.3213783
CCS CONCEPTS
• Computing methodologies → Non-photorealistic rendering; Image processing;
KEYWORDS
neural style transfer, mobile devices, artistic rendering, interaction
ACM Reference Format:
Max Reimann, Amir Semmo, Jürgen Döllner, Sebastian Pasewaldt, and Mandy
Klingbeil. 2018. MaeSTrO: Mobile Style Transfer Orchestration using Adap-
tive Neural Networks. In Proceedings of SIGGRAPH ’18 Appy Hour. ACM,
New York, NY, USA, 2 pages. https://doi.org/10.1145/3213779.3213783
1 MOTIVATION
Image lters, particularly those used for mobile expressive render-
ing, have become a pervasive technology for casual creativity and
users that seek unique possibilities to stylize images [Dev 2013]. For
instance, mobile artists—a new user group of serious hobbyists with
high standards—are eager to adapt to powerful and exible tools
that facilitate their creative work. Image lters are traditionally
implemented by following an engineering approach; providing low-
and high-level control over the stylization process. With the advent
of neural style transfer technology [Gatys et al
.
2016], mobile image
ltering apps have increasingly evolved into “one-click solutions”
that allow to transfer a pre-dened style image to a content image
(Figure 1). Although this approach enables to easily create artistic
renditions—without having prior knowledge of photo-manipulation
software—the underlying technology faces inherent limitations re-
garding low-level control for localized image stylization [Semmo
et al. 2017a], hindering creative control over the results.
[Figure 2 panel labels: Global Stylization, Style Image Masks, Content Image Masks, Live-Painting Screen]
Figure 2: Screenshots of MaeSTrO: Global stylization can be refined by defining content image masks in the live-painting screen. A color mapping is used to ease the mapping between style and content image masks. Content image © Rick Barrett on Unsplash.com, used with permission.
In this work, we present MaeSTrO, an iOS app that implements and enhances style transfer technologies to allow for local creative control and facilitate interactive, artistic image editing. Our app targets mobile artists with basic image-editing know-how by using established on-screen painting metaphors for the local definition of sub-styles and their successive application to content images.
2 TECHNICAL APPROACH
MaeSTrO implements three dierent neural network techniques,
each providing a trade-o between usability and picture quality.
Single-style feed-forward [Johnson et al
.
2016] are currently used
in the majority of techniques for mobile style transfer (e.g., [Semmo
et al
.
2017b]), since they enable nearly interactive performance,
even on mobile devices. Once trained o-line, the feed-forward
network—representing a single style—is globally applied to the
whole input image. To cope with this limitation while maintaining a
short computation time, multi-style generative networks (MSG-Net)
are utilized and extended [Zhang and Dana 2017]. Using semantic
masks for style images, these networks can be trained on multiple
style images and enable local style-blending in feature space, yield-
ing smooth transitions between multiple styles. Although MSG-Nets
improve creative control, users are still limited to apply pre-trained
styles. To enable an on-device style denition, MaeSTrO addition-
ally implements the approach of Huang and Belongie [2017] that
performs a style transfer for arbitrary styles dened on-device
by using an encoder-decoder network containing an adaptive in-
stance normalization (adaIn). Similar to the MSG-Net approach,
we extended the adaIn-network by semantic masks to allow for
local control of style denitions and applications. Also the third
technique, the iterative style transfer approach [Gatys et al
.
2016]
implements local control through segmentation masks [Luan et al
.
2017] and enables the application of arbitrary styles. However, the
computational complexity of the approach does not enable an on-
device application. Thus, it is implemented as a web service, where
users can dene and modify styles on a mobile device, for example
using the adaIn approach, and request the web service to perform
the high-quality style transfer.
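As a rough illustration of how such a mask-guided AdaIN can be realized, the following PyTorch sketch normalizes the content features region-wise to the statistics of several sub-styles. It assumes precomputed encoder features and an AdaIN decoder (not shown); the function names and tensor layouts are illustrative and not taken from the app's code base.

import torch
import torch.nn.functional as F

def adain_stats(feat, eps=1e-5):
    # Per-channel mean and standard deviation of a feature map (N, C, H, W).
    mean = feat.mean(dim=(2, 3), keepdim=True)
    std = feat.var(dim=(2, 3), keepdim=True).add(eps).sqrt()
    return mean, std

def masked_adain(content_feat, style_feats, masks):
    # content_feat: (1, C, H, W) encoder features of the content image.
    # style_feats:  list of (1, C, Hs, Ws) encoder features, one per sub-style.
    # masks:        list of (H_img, W_img) binary masks painted by the user,
    #               assigning content regions to sub-styles (illustrative layout).
    _, _, h, w = content_feat.shape
    c_mean, c_std = adain_stats(content_feat)
    normalized = (content_feat - c_mean) / c_std
    out = torch.zeros_like(content_feat)
    for style_feat, mask in zip(style_feats, masks):
        s_mean, s_std = adain_stats(style_feat)
        # Resize the painted mask to feature resolution and restyle that region.
        m = F.interpolate(mask[None, None].float(), size=(h, w), mode="nearest")
        out = out + m * (normalized * s_std + s_mean)
    return out  # fed to the decoder to synthesize the locally stylized image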
All implemented approaches enable local control of the style application to a content image. In addition, the AdaIN and iterative approaches enable users to define sub-styles, i.e., locally constrained regions that are assigned to different styles (Figure 2). The definition and application of sub-styles is implemented using pixel-precise painting metaphors. When editing a content image, an overlay provides additional information about which sub-style is mapped to which virtual brush.
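For the iterative transfer, this local control can be expressed as a mask-constrained Gram-matrix style loss in the spirit of Luan et al. [2017]: each painted content mask is matched only against its corresponding style-image mask. The PyTorch sketch below shows such a loss term under these assumptions; it is a simplification and does not reflect the web service's actual implementation.

import torch
import torch.nn.functional as F

def gram(feat):
    # Normalized Gram matrix of a feature map (1, C, H, W).
    _, c, h, w = feat.shape
    f = feat.view(c, h * w)
    return f @ f.t() / (c * h * w)

def masked_style_loss(out_feat, style_feat, content_masks, style_masks):
    # Match Gram statistics per corresponding content/style mask pair.
    loss = 0.0
    for cm, sm in zip(content_masks, style_masks):
        cm_r = F.interpolate(cm[None, None].float(), size=out_feat.shape[2:], mode="nearest")
        sm_r = F.interpolate(sm[None, None].float(), size=style_feat.shape[2:], mode="nearest")
        loss = loss + F.mse_loss(gram(out_feat * cm_r), gram(style_feat * sm_r))
    return loss  # summed over network layers and minimized w.r.t. the output image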
The iterative approach is implemented using PyTorch, and the on-device approaches are implemented using CoreML for the iOS operating system. The style transfer run-time performance depends on the number of sub-styles applied as well as the image resolution. For example, the application of two sub-styles to a 720 × 720 image takes approx. 1.0 second for AdaIN and 1.5 seconds for MSG-Net on an iPad Pro 10.5”. To allow for interactive mask application, a live-painting mode has been implemented that directly shows the application of pre-computed sub-styles, while the final image synthesis is performed afterwards.
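The live-painting preview can be understood as a per-pixel compositing of pre-computed sub-style images with the masks the user paints, so brush strokes appear immediately without re-running a network. The NumPy sketch below illustrates this blending under that assumption; array names are illustrative.

import numpy as np

def composite_preview(content, stylized_layers, masks):
    # content:          (H, W, 3) float image in [0, 1].
    # stylized_layers:  list of (H, W, 3) pre-computed sub-style images.
    # masks:            list of (H, W) floats in [0, 1] painted by the user.
    preview = content.copy()
    for layer, mask in zip(stylized_layers, masks):
        alpha = mask[..., None]          # broadcast the mask over RGB channels
        preview = alpha * layer + (1.0 - alpha) * preview
    return np.clip(preview, 0.0, 1.0)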
ACKNOWLEDGEMENTS
This work was funded by the Federal Ministry of Education and
Research (BMBF), Germany, for the AVA project 01IS15041.
REFERENCES
Kapil Dev. 2013. Mobile Expressive Renderings: The State of the Art. IEEE Computer
Graphics and Applications 33, 3 (May/June 2013), 22–31. https://doi.org/10.1109/
MCG.2013.20
Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. 2016. Image Style Transfer
Using Convolutional Neural Networks. In Proc. CVPR. IEEE Computer Society, Los
Alamitos, 2414–2423. https://doi.org/10.1109/CVPR.2016.265
Xun Huang and Serge Belongie. 2017. Arbitrary Style Transfer in Real-time with
Adaptive Instance Normalization. arXiv.org report 1703.06868. arXiv. https://arxiv.
org/abs/1703.06868
Justin Johnson, Alexandre Alahi, and Li Fei-Fei. 2016. Perceptual Losses for Real-Time
Style Transfer and Super-Resolution. In Proc. ECCV. Springer International, Cham,
Switzerland, 694–711. https://doi.org/10.1007/978-3-319-46475-6_43
Fujun Luan, Sylvain Paris, Eli Shechtman, and Kavita Bala. 2017. Deep Photo Style
Transfer. CoRR abs/1703.07511. arXiv. http://arxiv.org/abs/1703.07511
Amir Semmo, Tobias Isenberg, and Jürgen Döllner. 2017a. Neural Style Trans-
fer: A Paradigm Shift for Image-based Artistic Rendering?. In Proc. NPAR, Hol-
ger Winnemöller and Lyn Bartram (Eds.). ACM, New York, 5:1–5:13. https://doi.org/10.1145/3092919.3092920
Amir Semmo, Matthias Trapp, Jürgen Döllner, and Mandy Klingbeil. 2017b. Pictory:
Combining Neural Style Transfer and Image Filtering. In Proc. SIGGRAPH Appy
Hour. ACM, New York, NY, USA, 5:1–5:2. https://doi.org/10.1145/3098900.3098906
Hang Zhang and Kristin Dana. 2017. Multi-style Generative Network for Real-time
Transfer. arXiv.org report 1703.06953. arXiv. https://arxiv.org/abs/1703.06953