Enabling Interactive Supercomputing at JSC: Lessons Learned

Jens Henrik Göbbert1(B), Tim Kreuzer1, Alice Grosch1,
Andreas Lintermann2,3, and Morris Riedel1

1 Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
{j.goebbert,t.kreuzer,a.grosch,m.riedel}@fz-juelich.de
2 Institute of Aerodynamics and Chair of Fluid Mechanics,
RWTH Aachen University, Aachen, Germany
a.lintermann@aia.rwth-aachen.de
3 Jülich Aachen Research Alliance (JARA) - High Performance Computing,
RWTH Aachen University, Aachen, Germany
http://www.fz-juelich.de/jsc, http://www.aia.rwth-aachen.de,
https://www.jara.org
Abstract. Research and analysis of large amounts of data from scientific
simulations, in-situ visualization, and application control are convincing
scenarios for interactive supercomputing. The open-source software
Jupyter (or JupyterLab) is a tool that has already been used successfully
in many scientific disciplines. With its open and flexible web-based
design, Jupyter is ideal for combining a wide variety of workflows and
programming methods in a single interface. The multi-user capability
of Jupyter via JupyterHub makes it particularly suitable for scientific
applications at supercomputing centers, as it bridges the workspace that
is local to the user and the corresponding workspace on the HPC systems.
In order to meet the requirements for more interactivity in supercomputing
and to open up new possibilities in HPC, a simple and direct web access for
starting and connecting to login or compute nodes with Jupyter or JupyterLab
at Jülich Supercomputing Centre (JSC) is presented. To corroborate the
flexibility of the new method, the motivation, applications, details, and
challenges of enabling interactive supercomputing, as well as goals and
prospective future work, are discussed.
Keywords: Interactive supercomputing · High Performance Computing · Jupyter
1 Introduction
Extracting new scientific knowledge from large amounts of data is of crucial
importance for science. However, scientific progress will only be achieved if
the data obtained can be processed into meaningful results. Nowadays,
researchers are increasingly confronted with an explosion in the volume of data.
© Springer Nature Switzerland AG 2018
R. Yokota et al. (Eds.): ISC 2018 Workshops, LNCS 11203, pp. 669–677, 2018.
https://doi.org/10.1007/978-3-030-02465-9_48
With the rapidly increasing computing power of supercomputers, huge amounts
of data are generated in a short time that by far exceed the capacity typically
available to scientists on their local computers. Moving such data from super-
computing centers to local hardware is impractical if not impossible. As the
size of simulations increases, post-processing becomes a bottleneck on the way
to the desired scientific findings. Data analysis and visualization functions have
therefore been shifted to the supercomputing centers.
High Performance Computing (HPC) systems are usually used by a large
number of users simultaneously. The computer resources are accessed using
asynchronous batch scheduling. However, scientific knowledge often only arises
through interactive and iterative data analysis by “human in the loop” processes.
These different application modes of supercomputers seem to be contrary to each
other. Bridging those modes via interactive HPC with the method presented in
this work closes the gap between explorative data analysis and pure HPC. It
hence leads to an efficient workflow that can easily be integrated between data
production and data analysis.
2 Background: Jupyter
Jupyter1 is a browser-based interactive computing environment that allows
combining code execution, mathematics, plots, and rich text and media into
single documents called Jupyter Notebooks. Jupyter grew out of the IPython
project2, an extended interactive Python shell with additional functions.
Initially, IPython was developed as a pure terminal application; it was later
extended by graphical user interfaces and finally by a web application
framework. The concept of a human-readable notebook document was developed
as a uniform interface.
Notebooks are intended for the creation of reproducible computational
narratives. A notebook combines analysis descriptions, executable source code,
and results, and is based on a series of open standards for interactive
computing. These open standards can be used to develop specific applications
for embedded interactive computing. Using notebook documents, entire workflows
and reproducible findings can easily be shared among researchers. At the same
time, notebooks can also be converted into other formats such as HTML or
LaTeX. To run executable code, a notebook is connected to one or more Jupyter
compute kernel instances. Such kernels exist for many different programming
languages, i.e., Jupyter is by no means limited to Python.
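The convertibility and shareability described above follow from the notebook document itself being a small, open JSON file. A hand-built sketch of this on-disk structure (field names follow the public nbformat v4 schema; the cell contents are invented examples):

```python
import json

# Minimal Jupyter notebook document (nbformat v4), built by hand to
# show the open format: a list of cells plus metadata, all plain JSON.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {
        # The kernelspec names the compute kernel a frontend should
        # attach; kernels exist for many languages, not only Python.
        "kernelspec": {"name": "python3", "display_name": "Python 3"}
    },
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis description\n", "Rich text next to code."],
        },
        {
            "cell_type": "code",
            "execution_count": None,  # serialized as null before execution
            "metadata": {},
            "outputs": [],
            "source": ["print(40 + 2)"],
        },
    ],
}

# Serialize to an .ipynb-style JSON string (kept in memory here).
ipynb_text = json.dumps(notebook, indent=1)
```

Because the format is plain JSON, tools such as nbconvert can walk the cell list and emit HTML or LaTeX, which is what makes sharing complete workflows straightforward.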
With JupyterLab3, the Web Application Framework has recently been
revised and extended significantly. The developers of JupyterLab aim at offering
a web interface with a high degree of integration between notebooks, documents,
1https://jupyter.org.
2https://ipython.org.
3https://jupyterlab.readthedocs.io.
and activities. It now represents an advanced interactive development environ-
ment for working with notebooks, code and data, and offers full support for
Jupyter notebooks. JupyterLab also provides the user with text editors,
terminals, data file viewers, and other specific components side by side with
Jupyter notebooks in a tabbed workspace.
On local machines, the user usually starts Jupyter from the command line
and uses a web browser on the same system to access the software interface.
If Jupyter is run on a multi-user system, however, an intermediate layer is
required that enables shared usage of the computer system via Jupyter. This
task can be performed by JupyterHub4, a web application that provides a
multi-user hub for spawning, managing, and proxying multiple instances of
single-user Jupyter notebook servers.
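The hub/spawner split can be made concrete with a configuration sketch. JSC's own integration spawns servers through UNICORE (Sect. 3); the batchspawner-based setup below is a commonly used alternative on a generic Slurm cluster, shown only to illustrate how a hub delegates spawning and authentication. All names and values are illustrative, not JSC's actual configuration:

```python
# Sketch of a jupyterhub_config.py for a generic Slurm cluster
# (configuration fragment; `c` is provided by JupyterHub at load time).
c = get_config()  # noqa: F821

# Spawn each user's single-user notebook server as a batch job.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
c.SlurmSpawner.req_partition = "batch"      # example partition name
c.SlurmSpawner.req_runtime = "02:00:00"     # example wall-clock limit
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --partition={partition}
#SBATCH --time={runtime}
{cmd}
"""

# Delegate authentication to an external OAuth2 identity provider,
# analogous to JSC's use of Unity IdM.
c.JupyterHub.authenticator_class = (
    "oauthenticator.generic.GenericOAuthenticator"
)
```

The hub then proxies each spawned server back to the user's browser, which is exactly the role JupyterHub plays in the JSC setup described next.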
At JSC, a JupyterHub instance is operated in a virtual machine as a gateway to
the compute clusters. Users authenticate via JupyterHub against a central
identity management with their JSC web access credentials.
3 Jupyter Integration at JSC
Starting Jupyter on the HPC systems of JSC solely via a web frontend requires
three basic steps. In the first step, the user must authenticate successfully.
Subsequently, the system must determine how and where the Jupyter Notebook
Server should be started. In the final step, the defined job must be executed
on the HPC system via the user's account, and the new Jupyter Notebook Server
must be connected to the user's web browser through the hub. These three steps
of login, configuration, and startup are outlined in Fig. 1 and are described
in detail for JSC's configuration in the following.
First, the user visits the Jupyter@JSC website5 and clicks on the login button
(Fig. 1, step 1). The user is redirected to the Unity IdM6 identity management
and authentication platform, which asks for the web LDAP user credentials and
checks them against the web LDAP (Fig. 1, step 2). Upon successful
authentication, Unity IdM asks the LDAP for additional user information. This
information is compared to the local Unity IdM database. If a suitable account
is present, the user is logged in from Unity IdM's point of view and a cookie
with an OAuth state is created in the browser. If no suitable account is
found, the user is offered to create one, provided that the terms of use,
data protection declaration, and declaration of consent are accepted. The user
is redirected back to JupyterHub with the appropriate authorization code for
the OAuth state (Fig. 1, step 3) and logged in. JupyterHub subsequently
queries the missing user information from Unity IdM (Fig. 1, step 4).
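The redirects in steps 1–4 follow the standard OAuth2 authorization-code flow. A sketch of how the hub-side authorization URL for such a redirect is assembled (endpoint, client ID, and state value are placeholders, not the actual values configured between JupyterHub and Unity IdM):

```python
from urllib.parse import urlencode

def authorization_url(auth_endpoint: str, client_id: str,
                      redirect_uri: str, state: str) -> str:
    """Build the OAuth2 authorization-code request URL the browser is
    redirected to when the user clicks the login button."""
    params = {
        "response_type": "code",       # request an authorization code
        "client_id": client_id,        # the hub's registered client ID
        "redirect_uri": redirect_uri,  # where the IdM sends the code back
        "state": state,                # anti-CSRF token, echoed back
    }
    return auth_endpoint + "?" + urlencode(params)

# Hypothetical values for illustration only.
url = authorization_url(
    "https://idm.example.org/oauth2/authorize",
    "jupyterhub-client",
    "https://hub.example.org/hub/oauth_callback",
    "random-state-token",
)
```

After the identity provider redirects back with `code` and `state`, the hub exchanges the code for an access token server-side; that token is what UNICORE/X later presents to Unity IdM in steps 5–6.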
Various compute clusters are available at JSC. These clusters are usually
divided into login and compute nodes and into different partitions with
different access restrictions. In addition, a user can have multiple user accounts to access
4 https://jupyterhub.readthedocs.io.
5 https://jupyter-jsc.fz-juelich.de.
6 https://www.unity-idm.eu.
Fig. 1. Illustration of the interaction of JupyterHub, Unity IdM, UNICORE/X, and
the compute cluster when starting a new Jupyter Notebook Server via the Jupyter
Portal at JSC. Steps 1–4 show the login process, steps 5–7 the configuration process
and steps 8–11 the starting process.
HPC computing resources. Therefore, the user chooses the appropriate user
account and computing resources after a successful login. This step is
performed in the server configuration, which follows the login.
For this purpose, the service UNICORE/X7 [1,6] is first asked, with the valid
OAuth token, which user accounts exist for the user (Fig. 1, step 5).
UNICORE/X asks Unity IdM whether this OAuth token is known and requests parts
of the user information of the Unity IdM account (Fig. 1, step 6). This
information is compared to the HPC-LDAP data and the corresponding user
accounts on the HPC systems are determined. The response is forwarded to
JupyterHub and presented to the user for selection (Fig. 1, step 7).
On behalf of the selected user account, UNICORE/X starts a Jupyter Notebook
Server via its service UNICORE/TSM running on the cluster as soon as the user
initiates the start of Jupyter (Fig. 1, steps 8, 9). Finally, the URL of the
Jupyter Notebook Server is returned to JupyterHub (Fig. 1, step 10), and from
there the website is transmitted to the user's browser (Fig. 1, step 11).
4 Use Case: Rhinodiagnost
Rhinodiagnost8 is a project in which partners from industry and research
prepare personalized medicine in rhinology in order to offer practicing
physicians new, extended possibilities for functional diagnostics [3].
Specifically, JSC cooperates with the Institute of Aerodynamics and Chair of
Fluid Mechanics (AIA), RWTH Aachen University, within the Jülich Aachen
Research Alliance - High Performance Computing (JARA-HPC), as well as with
Sutter Medizintechnik GmbH, Med Contact GmbH, and Angewandte
Informationstechnikgesellschaft mbH in this
7https://www.unicore.eu/.
8https://www.rhinodiagnost.eu/.
project. Rhinodiagnost aims to increase the surgical success rate by
validating treatment therapies prior to medical interventions. One of the
goals of this project is to use simulations running on HPC systems to support
surgeons in finding optimal procedures for the individual patient. This
requires high-resolution flow simulations based on patient-specific,
anatomically correct geometries. In more detail, highly resolved
lattice-Boltzmann simulations are performed with the simulation framework
Zonal Flow Solver (ZFS) [4,5], developed at AIA. These simulations necessitate
computing resources that can only be provided by HPC centers such as JSC
(Fig. 2).
Fig. 2. Exemplary visualization of particle dynamics in the upper human
respiratory tract. The calculations employed the simulation framework ZFS
developed by the Rhinodiagnost project partner AIA, RWTH Aachen University.
Simulations are, however, only a part of the whole processing pipeline that is
necessary to allow for interactive supercomputing in a medical context.
Therefore, they need to be embedded in existing treatment processes to become
a new way to support physicians in developing patient-specific treatment
strategies through HPC results. Obviously, it is important to adapt HPC and
the numerical methods to the requirements of the physician and not vice versa.
The amount of data generated by high-resolution simulations can hardly leave
the data center due to its immense size and must therefore be evaluated on
site. Furthermore, the physician should be allowed to modify the geometry for
a virtual operation at simulation runtime ("in-situ computational steering")
or for a subsequent simulation.
The main task of Forschungszentrum Jülich is to develop software components
that provide physicians with interactive and targeted access to HPC and
the analysis of simulation data on HPC systems. The software must offer an
extensible interface from which simulations can be managed and flexibly eval-
uated. It is essential to make novel developments of visualization and analysis
methods accessible to industry as fast as possible. Reimplementation of working
solutions in another software environment must therefore be largely avoided. In
JSC’s view, Jupyter is a key component in achieving this goal.
5 Use Case: Deep Learning
The turbulent motion of fluid flows is a complex, highly non-linear, multi-scale
phenomenon that raises some of the most difficult and fundamental problems of
classical physics. Turbulent flows are characterized by random spatial-temporal
fluctuations of their velocity and scalar fields. The general challenge of turbulence
research is to predict the statistics of such fluctuations. An accurate prediction
is of practical importance for a wide range of applications in engineering and
natural sciences. A novel approach is the use of Deep Learning (DL) for this
research.
In recent years, DL has improved considerably and has proven useful in a wide
range of sciences, from computer science to the life sciences. Despite its
stochastic nature, turbulence has certain coherent structures and statistical
symmetries that can be tracked over time by DL techniques, i.e., using DL is a
promising approach for predicting the small-scale statistics of turbulence.
On the one hand, research in this field requires the classical HPC approach
in order to calculate sufficiently meaningful and high-resolution simulations with
many time steps as a data basis for DL algorithms. On the other hand, the devel-
opment of DL networks is a highly interactive work. Interactive supercomputing
successfully combines these two requirements.
With the software psOpen [2], direct numerical simulations were performed on
the HPC systems at JSC to generate, with high accuracy, the necessary
reference data for the learning and test phases of the DL network. The DL
strategy itself is based on Wasserstein Generative Adversarial Networks
(wGANs), implemented with Keras9/TensorFlow10, which are very suitable for
small-scale turbulence due to their stability and the interpretability of the
learning curve. In the first step, however, developing universal turbulence
models based on DL means defining and testing a wide variety of networks. To
the best of the authors' knowledge, no fully automatic method for this exists.
DL is therefore a classic case of a "human-in-the-loop" process in HPC.
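The objective that distinguishes a wGAN from a standard GAN can be sketched without any DL framework. The illustration below shows only the Wasserstein loss terms the critic and generator optimize, with toy numbers; the actual Keras/TensorFlow training loop, network architectures, and Lipschitz constraint are omitted:

```python
def critic_loss(scores_real, scores_fake):
    """Wasserstein critic loss: the critic maximizes the gap between
    its mean score on real samples and on generated samples, so the
    quantity to *minimize* is the negated difference."""
    mean_real = sum(scores_real) / len(scores_real)
    mean_fake = sum(scores_fake) / len(scores_fake)
    return -(mean_real - mean_fake)

def generator_loss(scores_fake):
    """The generator tries to make the critic score fakes highly,
    i.e., it minimizes the negated mean fake score."""
    return -sum(scores_fake) / len(scores_fake)

# Toy numbers only: a critic scoring real velocity-field samples higher
# than generated ones yields a negative (i.e., good) critic loss.
loss_c = critic_loss([1.0, 0.8, 1.2], [-0.5, -0.3, -0.4])
loss_g = generator_loss([-0.5, -0.3, -0.4])
```

Because this loss tracks an estimate of the Wasserstein distance, its learning curve is interpretable and training is comparatively stable, which is the property the text attributes to wGANs.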
Jupyter has proven to offer a large additional gain for HPC experts without
discernible restrictions, above all because the software landscape in DL can
largely be controlled and interconnected via Python. The development of a
sustainable network for a specific application case is always an iterative
process in which various configurations have to be tested.
9https://keras.io.
10 https://www.tensorflow.org.
6 Lessons Learned
Solutions have been found and lessons have been learned while establishing
interactive web access via Jupyter to the HPC systems at JSC. This includes
first and foremost the secure authentication of the user via a separate
identity management system to which the user of JupyterHub is forwarded. As
for other web services of JSC, it was an obvious choice to use the already
present and reliable open-source software Unity IdM. The combination with
JupyterHub proved surprisingly simple using the well-known OAuth2 protocol.
The Unity IdM installation maintains its own user database for the Jupyter web
service at JSC and uses the central web LDAP for web services at JSC to check
user credentials. A direct comparison with the HPC-LDAP is technically
possible, but not practical for the following reasons. Separating the web
service accounts from the HPC accounts allows the different web services at
JSC to be combined more easily. Furthermore, computer systems can be supported
that are not accessible via the central HPC-LDAP and which implement
web-compatible authentication methods more easily. This solution lets the user
decide whether to activate the web service for interactive HPC and to access
the corresponding HPC account via a web frontend. Independent of that, access
to other web services at JSC is maintained even if no HPC account exists. A
1:1 account mapping of web LDAP and HPC-LDAP (as of 07/2018) would not be
possible anyway, since multiple HPC accounts for different HPC systems can
belong to a single user; therefore, no direct assignment is possible.
The new possibility to access the HPC systems via the web service Jupyter@JSC
requires new solutions not only on the technical but also on the legal side.
Consent, terms of use, and a privacy policy must be developed and presented to
the user during the registration process so that they can be accepted.
Furthermore, it is required that a user can unregister from the Jupyter@JSC
web service just as simply as register. Keeping the user data separately in
the Unity DB and checking the credentials against the web LDAP allow a simple
deletion of user-specific data in the Unity DB.
The integration of 3D-visualization methods in Jupyter notebooks is promising,
but not yet suitable for more complex requirements and large-scale simulation
data. A complete replacement for full software packages such as ParaView11 or
VisIt12 is still a long way off. JSC therefore supports the porting of
visualization functionalities to Jupyter plugins. In the meantime, better
integration of remote desktop solutions could close the gap.
A diverse HPC software landscape is provided at JSC by means of a software
module concept. Different software packages can be installed on the system
without introducing dependency conflicts. However, this also allows a large
number of possible combinations of software packages, which cannot all be
loaded simultaneously under Jupyter. Via Jupyter kernels, the user can load
specific software combinations. At present, this requires a deeper understanding
11 https://www.paraview.org.
12 https://wci.llnl.gov/simulation/computer-codes/visit.
of the Jupyter kernel mechanism. Since this is a big hurdle for non-experts,
the software available via the web frontend is currently limited.
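Exposing a particular module combination through Jupyter amounts to installing a kernel spec whose startup command loads the modules before launching the kernel, which is the mechanism a non-expert would otherwise have to learn. A hand-written sketch of such a `kernel.json` (the module names and the display name are illustrative, not JSC's actual module tree):

```python
import json

# A Jupyter kernel spec is a small JSON file (kernel.json) inside a
# kernelspec directory. Wrapping the kernel launch in a shell that
# first loads environment modules pins one software combination to a
# selectable kernel in the web frontend.
kernel_spec = {
    "display_name": "Python 3 (GCC + ParaStationMPI)",  # shown to the user
    "language": "python",
    "argv": [
        "bash", "-c",
        # Load the module combination, then exec the IPython kernel,
        # forwarding the connection file Jupyter substitutes in.
        "module load GCC ParaStationMPI SciPy-Stack && "
        "exec python -m ipykernel_launcher -f {connection_file}",
    ],
}

kernel_json = json.dumps(kernel_spec, indent=2)
```

Installing a curated set of such specs is one way to widen the software selection in the web frontend without requiring each user to understand the kernel mechanism themselves.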
Independent of the great possibilities of making HPC interactively accessible,
the implementation phase showed that new functionalities developed at JSC can
easily be deployed to the user communities via the web-based
Jupyter/JupyterLab. Solutions for individual users can quickly be distributed
to, or referred to by, a large user group. Instead of describing solutions
only on web pages, they can be executed directly and thus integrated more
quickly into existing workflows. In particular, the constant change of the
user base on HPC systems in the scientific field requires that workflows can
be passed on to new researchers completely, correctly, and at the same time
easily. This is why interactive HPC brings advantages for the entire HPC
community far beyond the direct area of application.
7 Outlook
The first steps towards interactive supercomputing at JSC have successfully
been taken using the web service Jupyter. The implementations cover only a
small part of the possibilities of interactive supercomputing, but from JSC's
point of view the most important ones are already provided. In close
cooperation with the users of the HPC systems, the implementations are being
refined. The project is mainly driven by interactive visualization and
collaborative work in HPC environments. In principle, JSC considers further
web services accessible from Jupyter important; here, the integration of
web-based remote desktops is a first possible candidate.
8 Conclusion
It was shown how interactive HPC is deployed to users of the HPC systems at
JSC by means of Jupyter via JupyterHub. This requires a good integration into
the existing services of JSC, especially the multi-layer authentication from a
web service account to the HPC-LDAP accounts, without ignoring consent, terms
of use, and the privacy policy. The integration of 3D-visualization methods in
Jupyter notebooks is promising, but not yet suitable for more complex
requirements and large-scale simulation data. How to support the almost
infinitely large number of combinations of software packages, especially for
non-expert users, still needs to be worked on; containers are a possible
solution here. In general, on the basis of two use cases it was shown that
interactive HPC is not only useful but also necessary to advance research in
new scientific disciplines and industry.
Acknowledgement. This work is supported by the Rhinodiagnost project, funded
by the Zentrales Innovationsprogramm Mittelstand (ZIM) of the Federal Ministry
for Economic Affairs and Energy (BMWi), and by the InHPC-DE project as part of
the SiVeGCS project to promote closer technical integration of the three GCS
HPC centers in Stuttgart (HLRS), Jülich (JSC), and Munich (LRZ).
References
1. Benedyczak, K., Schuller, B., Petrova-ElSayed, M., Rybicki, J., Grunzke, R.:
UNICORE 7 - middleware services for distributed and federated computing. In:
2016 International Conference on High Performance Computing and Simulation,
Innsbruck, Austria, 18–22 July 2016, pp. 613–620. IEEE (2016).
https://doi.org/10.1109/HPCSim.2016.7568392.
http://juser.fz-juelich.de/record/820611
2. Goebbert, J.H., Gauding, M., Ansorge, C., Hentschel, B., Kuhlen, T., Pitsch, H.:
Direct numerical simulation of fluid turbulence at extreme scale with psOpen.
Adv. Parallel Comput. 27, 777–785 (2016).
https://doi.org/10.3233/978-1-61499-621-7-777
3. Lintermann, A., Göbbert, J.H., Vogt, K., Koch, W., Hetzel, A.: Rhinodiagnost:
morphological and functional precision diagnostics of nasal cavities. Innov.
Supercomput. Deutschl. 15(2), 106–109 (2017).
http://juser.fz-juelich.de/record/840544
4. Lintermann, A., Meinke, M., Schröder, W.: Fluid mechanics based classification
of the respiratory efficiency of several nasal cavities. Comput. Biol. Med. 43(11),
1833–1852 (2013). https://doi.org/10.1016/j.compbiomed.2013.09.003
5. Lintermann, A., Schröder, W.: A hierarchical numerical journey through the nasal
cavity: from nose-like models to real anatomies. Flow Turbul. Combust. 101, 1–28
(2017). https://doi.org/10.1007/s10494-017-9876-0
6. Petrova-ElSayed, M., Benedyczak, K., Rutkowski, A., Schuller, B.: Federated
computing on the web: the UNICORE portal. In: Proceedings of the 2016 39th
International Convention on Information and Communication Technology,
Electronics and Microelectronics, Opatija, Croatia, 30 May–3 June 2016, pp.
190–195. IEEE (2016). ISBN 978-953-233-086-1.
https://doi.org/10.1109/MIPRO.2016.7522133.
http://juser.fz-juelich.de/record/820398