Content uploaded by Kenny Gruchalla
All content in this area was uploaded by Kenny Gruchalla on Oct 02, 2017
Coupling Visualization, Simulation, and Deep Learning for Ensemble
Steering of Complex Energy Models
Brian Bush, Member, IEEE; Nicholas Brunhart-Lupo; Bruce Bugbee; Venkat Krishnan;
Kristin Potter; Kenny Gruchalla, Senior Member, IEEE
National Renewable Energy Laboratory*
Figure 1: Immersive visualization of an ensemble of energy simulations supports a campus renewable energy design study.
ABSTRACT
We have developed a framework for the exploration, design, and
planning of energy systems that combines interactive visualization
with machine-learning-based approximations of simulations through
a general-purpose dataflow API. Our system provides a visual interface
allowing users to explore an ensemble of energy simulations
representing a subset of the complex input parameter space, and to
spawn new simulations to “fill in” input regions corresponding to
new energy system scenarios. Unfortunately, many energy simulations
are far too slow to provide interactive responses. To support
interactive feedback, we are developing reduced-form models via
machine learning techniques; these provide statistically sound estimates
of the full simulations at a fraction of the computational cost
and are used as proxies for the full-form models. Fast computation
and an agile dataflow enhance engagement with energy
simulations, allow researchers to better allocate computational
resources to capture informative relationships within the system,
and provide a low-cost method for validating and quality-checking
large-scale modeling efforts.
1 INTRODUCTION
Ensemble simulations, that is, simulation suites using multiple models
with varying input parameters and initial conditions, are a common
approach to understanding highly complex natural phenomena.
These simulation collections combine different models and settings
to cover a range of possible outcomes and provide statistical measures
indicating the similarity of individual model results. For these
types of simulations, a major challenge is determining appropriate
parameter settings: often the number of parameters is quite large,
some settings may fail to produce realistic results, and the cost to
compute all parameter perturbations may be astronomical.
* e-mail: {brian.bush, nicholas.brunhart-lupo, bruce.bugbee, venkat.krishnan, kristi.potter, kenny.gruchalla}@nrel.gov
Often, a predefined set of initial conditions and parameter settings
is used, such as NOAA’s Short-Range Ensemble Forecast
(SREF) [14], but such an approach may not be ideal in other scenarios.
More robust solutions must include methods to select regions
of the parameter space that are of scientific interest, and this
often requires a user-in-the-loop interface that guides simulations by
tuning inputs within a specific range of inquiry.
To address these challenges, the National Renewable Energy Lab-
oratory (NREL) has developed a framework for visualization-driven
design, exploration, and analysis of energy simulations. The framework
uses what we term ensemble steering: it provides an
overview of a simulation’s parameter space via a visual analysis
environment and, in response to user interaction, spawns new simulations
whose results return quickly enough to keep the session interactive. In cases where
the simulation response time is too slow for interactivity, we develop
reduced-form models that approximate the full simulation model
to enable interactive sessions; offline simulation of the full model
may also proceed later, eventually producing more accurate results.
This ensemble steering and analysis environment allows users and
stakeholders to rapidly design alternative scenarios for simulation,
quickly view approximate results of those simulations, and refine
the design or explore the simulation results in depth.
The computational and visualization capabilities reside within
a dataflow architecture for connecting producers of multidimen-
sional timeseries data with consumers of that data. The architecture
is general-purpose, supporting a wide range of multivariate time-
varying data producers, including measurements from real-time
sensors and results from high performance computing (HPC) sim-
ulations, and supporting multiple concurrent consumers including
visualizations, statistical analyses, and datastores. Consumers can
request existing data records or can make a request for a non-existent
record, spawning a new simulation to satisfy that request.
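The request semantics above can be sketched as a store that returns an existing record when one matches the requested inputs and otherwise spawns a simulation to create it. This is a minimal in-memory illustration under stated assumptions: the `EnsembleStore` class, its keying scheme, and the `toy_model` stand-in for an HPC job are all hypothetical and are not part of the dataflow API itself.

```python
from typing import Callable, Dict, Tuple

# Hypothetical types: inputs are keyed by sorted (name, value) pairs.
Inputs = Tuple[Tuple[str, float], ...]
Outputs = Dict[str, float]


class EnsembleStore:
    """In-memory sketch of a datastore that spawns simulations on a miss."""

    def __init__(self, simulate: Callable[[Dict[str, float]], Outputs]):
        self._simulate = simulate          # stand-in for a job submitted to HPC
        self._records: Dict[Inputs, Outputs] = {}
        self.spawned = 0                   # number of simulations launched

    def request(self, inputs: Dict[str, float]) -> Outputs:
        key: Inputs = tuple(sorted(inputs.items()))  # order-independent key
        if key not in self._records:
            # Non-existent record: spawn a new simulation to satisfy the request.
            self._records[key] = self._simulate(dict(key))
            self.spawned += 1
        return self._records[key]


def toy_model(p: Dict[str, float]) -> Outputs:
    # Hypothetical placeholder for a real energy simulation.
    return {"cost": 2.0 * p["capacity"] + p["price"]}


store = EnsembleStore(toy_model)
store.request({"capacity": 10.0, "price": 3.0})   # spawns a simulation
store.request({"price": 3.0, "capacity": 10.0})   # served from the store
print(store.spawned)  # 1
```

Keying on sorted (name, value) pairs makes the lookup independent of the order in which a client lists its inputs; a real deployment would also have to decide how to match floating-point inputs that differ only by round-off.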
2 ENSEMBLE STEERING FRAMEWORK
Computational steering is “the interactive control over a computa-
tional process during execution” [13], and allows a user to guide
computation toward interesting aspects and react to previous results.
Often this includes the ability to change or halt simulations while
they are running, and much of the research in simulation steering
focuses on the interface between user and simulations [3, 16, 21]. Similar
to the work presented here, systems such as World Lines [20] integrate
visualization, simulation, and computational steering into a
single framework, allowing the user to investigate alternative scenarios.
A main distinction of our system is that, rather than steering or
“nudging” [21] a simulation during execution, we use our
framework to explore the parameter space of an ensemble via simulations
that run at an interactive pace, whether full-scale simulations
or approximate models, similar to the conceptual framework proposed
by Sedlmair et al. [17]. This approach quickly gives the user
an overview of the relationship between the parameter and output
spaces, allowing computational and time resources to be focused on
specific areas of interest. Our system can act as a front end to the
full-scale simulation suite by approximating results on the fly and
moving the computationally intensive aspects of simulations outside
of the traditional analysis workflow.
Our system is composed of three components developed in concert:
a dataflow API, machine-learning-based reduced-form models, and
interactive visualization and analysis clients (Sections 2.1 to 2.3). The
workflow connections among them are designed to be general-purpose and
customizable. The dataflow API is the backbone of the framework,
linking visual analysis, reduced-form models, a datastore, and the
HPC resource on which new simulations are spawned. This design
provides clear entry points for customizing the framework to each
domain scenario.
2.1 A Dataflow API for Multidimensional Time-series
Figure 2: Interaction diagram for discovering available simulation
models, spawning new simulations, and visualizing the results.
The dataflow API [1] normalizes interactions between producers
of multidimensional record-oriented data and consumers of such
data. In the context of the API, multidimensional data records are
defined as simple tuples consisting of real numbers, integers, and
character strings. Each data value is tagged by a variable name
according to a pre-defined schema, and each record is assigned a
unique identifier. Conceptually, these records are isomorphic to
rows in a relational database, JSON objects, or key-value maps. The
objective of this API is to unify the interactions between record
producers and consumers, so that any client using this specification
can communicate with any server that implements it.
To minimize the barriers to adopting the specification, we primarily
specify the data transport layer and messaging; storage, data structures,
and other implementation details are left to the developer. As the API is closely related to
common database models, most implementations merely need to
provide a translation between backend database storage and the API.
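To make the record model concrete, the sketch below represents a schema as an ordered list of typed variable names and a record as a unique identifier plus a tuple of values, then produces the equivalent key-value (JSON-style) view. The `Schema` and `Record` names and the validation rule are illustrative assumptions, not the API's actual message definitions.

```python
import json
from dataclasses import dataclass
from typing import Tuple, Union

Value = Union[float, int, str]  # the three value kinds named in the text


@dataclass(frozen=True)
class Schema:
    # Ordered (variable name, Python type) pairs, fixed in advance.
    variables: Tuple[Tuple[str, type], ...]


@dataclass(frozen=True)
class Record:
    record_id: int              # unique identifier
    values: Tuple[Value, ...]   # one value per schema variable, in order

    def validate(self, schema: Schema) -> None:
        if len(self.values) != len(schema.variables):
            raise ValueError("record length does not match schema")
        for (name, kind), value in zip(schema.variables, self.values):
            if not isinstance(value, kind):
                raise TypeError(f"{name}: expected {kind.__name__}")

    def as_mapping(self, schema: Schema) -> dict:
        # Key-value view: the same record as a JSON object or table row.
        row = {"id": self.record_id}
        row.update({name: v for (name, _), v in zip(schema.variables, self.values)})
        return row


schema = Schema((("scenario", str), ("year", int), ("wind_gw", float)))
rec = Record(42, ("high_demand", 2030, 12.5))
rec.validate(schema)
print(json.dumps(rec.as_mapping(schema)))
# {"id": 42, "scenario": "high_demand", "year": 2030, "wind_gw": 12.5}
```

Because a record is just a tagged tuple, translating a database backend into this form is mostly a matter of reading a row and attaching the schema's variable names.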
In order to maximize usage by energy researchers who may not
have extensive software engineering experience, this minimalist API
avoids imposing metadata, structural, or implementation require-
ments on developers by relying on open-source technologies that are
readily available for common programming languages.
The dataflow API is organized in a client-server model. Clients
ask for available datasets (e.g., simulation results), receive extant
data and any new records as they are generated, and, as needed,
ask for the simulation of new data based on user input. A server
may host multiple “models” (tables, in database terms). A model
may hold static data, but the design emphasizes dynamic models,
to which records are added continually, such as sensor measurements
collected as new telemetry becomes available or newly generated
simulation results. Clients are notified as new records arrive.
Following the pipeline model, dataflow API servers and clients can
be chained together, creating a transformation path for records or
even coupled models. Figure 2 shows a high-level view of the
desired communication protocol for the simplest visualization use
case. Separate server implementations of the API exist in C++ and
Haskell; client implementations currently exist for C++, Haskell,
JavaScript, Python, and R. Collectively, the servers support persistent
backends for delimited-text files, databases (PostgreSQL, MySQL,
SQLite3, and ODBC), and real-time sensor feeds (Haystack [9]).
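The notification and chaining behavior described above can be illustrated with a small in-process model. In this sketch, a hypothetical `Model` class replays extant records to a new subscriber and then pushes each appended record to all subscribers, and a `Transform` stage subscribes to one model and republishes derived records, mimicking a chained server. A real deployment would carry these messages over WebSockets rather than Python callbacks.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Callback = Callable[[Record], None]


class Model:
    """A dynamic 'model' (table) that notifies subscribers of new records."""

    def __init__(self, name: str):
        self.name = name
        self.records: List[Record] = []
        self._subscribers: List[Callback] = []

    def subscribe(self, callback: Callback, replay: bool = True) -> None:
        # New clients first receive extant data, then future records.
        if replay:
            for rec in self.records:
                callback(rec)
        self._subscribers.append(callback)

    def append(self, rec: Record) -> None:
        self.records.append(rec)
        for callback in self._subscribers:
            callback(rec)   # push notification, no polling


class Transform(Model):
    """A chained stage: consumes one model and publishes derived records."""

    def __init__(self, name: str, upstream: Model, fn: Callable[[Record], Record]):
        super().__init__(name)
        upstream.subscribe(lambda rec: self.append(fn(rec)))


sensors = Model("telemetry")
sensors.append({"t": 0.0, "kw": 5.0})
in_mw = Transform("telemetry_mw", sensors, lambda r: {"t": r["t"], "mw": r["kw"] / 1000.0})
received: List[Record] = []
in_mw.subscribe(received.append)
sensors.append({"t": 1.0, "kw": 7.5})
print(received)  # [{'t': 0.0, 'mw': 0.005}, {'t': 1.0, 'mw': 0.0075}]
```

Since `Transform` is itself a `Model`, stages can be stacked to form the transformation paths described above.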
The dataflow API also provides bookmarking. A bookmark is either
a set of records or a query (analogous to an SQL view) that saves
the current state of the environment. Bookmarks enable a collaborative
approach to data exploration and can be distributed across
connected clients: researchers can share a bookmark to explore the
same results; a client can continually bookmark its selected content
so that the selection is mirrored to another user; or a client can
watch for new bookmarks carrying a certain tag and publish the
corresponding results on a webpage.
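The two kinds of bookmark differ much as a stored selection differs from an SQL view: an explicit set of record identifiers is fixed, while a saved query is re-evaluated and so picks up records that arrive later. The sketch below models both kinds plus tag-based watching; the `Bookmark` and `BookmarkBoard` names and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Optional

Record = Dict[str, float]


@dataclass
class Bookmark:
    name: str
    tags: FrozenSet[str] = frozenset()
    record_ids: Optional[FrozenSet[int]] = None           # explicit selection
    query: Optional[Callable[[Record], bool]] = None       # view-like predicate

    def select(self, records: Dict[int, Record]) -> List[int]:
        if self.record_ids is not None:
            return sorted(i for i in self.record_ids if i in records)
        if self.query is not None:
            # Re-evaluated on every call, so newly arrived records can match.
            return sorted(i for i, r in records.items() if self.query(r))
        return []


@dataclass
class BookmarkBoard:
    """Shared bookmarks; watchers are called for bookmarks carrying their tag."""
    watchers: Dict[str, List[Callable[[Bookmark], None]]] = field(default_factory=dict)
    bookmarks: List[Bookmark] = field(default_factory=list)

    def watch(self, tag: str, callback: Callable[[Bookmark], None]) -> None:
        self.watchers.setdefault(tag, []).append(callback)

    def publish(self, bm: Bookmark) -> None:
        self.bookmarks.append(bm)
        for tag in bm.tags:
            for callback in self.watchers.get(tag, []):
                callback(bm)


records = {1: {"wind_gw": 10.0}, 2: {"wind_gw": 25.0}, 3: {"wind_gw": 40.0}}
board = BookmarkBoard()
published: List[str] = []
board.watch("report", lambda bm: published.append(bm.name))

high_wind = Bookmark("high wind", frozenset({"report"}), query=lambda r: r["wind_gw"] > 20)
board.publish(high_wind)
print(high_wind.select(records))   # [2, 3]
records[4] = {"wind_gw": 55.0}
print(high_wind.select(records))   # [2, 3, 4]
print(published)                   # ['high wind']
```

In this sketch the watcher only records the bookmark name; the webpage-publishing client mentioned above would instead render the bookmark's selected records.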
Data are transported over WebSockets [10], which are ubiquitous,
readily available to programmers in most languages, and support
poll-free notification and large message sizes. Message bodies are
encoded with Google Protocol Buffers [8]. Encoders and decoders
for messages are generated automatically, which reduces implementation
effort and ensures that messages conform to the specified format.
Figure 3 shows an example deployment structure and options for
our Records API system (the dataflow API documented in [1]). Although
we specify only the communication between client and server, the
server itself is unrestricted in how it obtains data. Complex
data-harvesting systems can therefore be abstracted away entirely,
providing a uniform method of data access.
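Because the encoders are generated from a schema, users rarely see the Protocol Buffers wire format, but a small hand-written sketch shows what a message body looks like on the wire: each field is a key (the field number shifted left three bits, combined with a wire type) followed by either a base-128 varint or a length-prefixed byte string. The field numbers below are hypothetical and do not reflect the Records API's actual message definitions; production code should use the generated classes instead.

```python
def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least-significant group first."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def varint_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, 0) + encode_varint(value)            # wire type 0


def string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_key(field_number, 2) + encode_varint(len(data)) + data  # wire type 2


# Hypothetical message: field 1 = record id, field 2 = variable name.
message = varint_field(1, 150) + string_field(2, "testing")
print(message.hex())  # 089601120774657374696e67
```

The compactness of this encoding (the integer 150 plus its field key fits in three bytes) is one reason it suits streaming many small records.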
2.2 Deep Learning
Maximizing user engagement requires simulations that return results
on a sub-minute timescale; low latency lets stakeholders focus on
exploration and inference in a truly interactive manner. Currently,
only a few of NREL’s large energy models are fast enough to be used
in this manner; most other important simulation suites are too
computationally costly. Many of these models exhibit extensive regions
of nearly linear behavior, punctuated by nonlinear transitions, jumps,
or other critical phenomena. Mapping the locations of the quasi-linear
and nonlinear regimes allows researchers to focus computation
preferentially on the nonlinearities without sacrificing coverage of
the more linear portions of parameter space. Simplified or reduced-form
versions of models allow analysts to plan their computational
experiments with the full models carefully, making far better use of
computing resources. To this end, we are developing reduced-form
representations of computation-intensive energy models through machine
learning.
Figure 3: Structure of an example Records API deployment. Note that
the API only governs communication between remote procedure call
(RPC) endpoints (servers) and the clients, as shown in the left and
center columns of the diagram.
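One simple way to map quasi-linear and nonlinear regimes along a single input dimension is to compare the slopes of adjacent secants through sampled outputs and to propose new full-model runs where those slopes change sharply. This is a sketch of a generic heuristic of our choosing, not the method used in this work; the tolerance and the midpoint-refinement rule are assumptions.

```python
from typing import List, Sequence


def slope_breaks(xs: Sequence[float], ys: Sequence[float], tol: float) -> List[int]:
    """Indices of interior samples where the secant slope changes by more than tol."""
    breaks = []
    for i in range(1, len(xs) - 1):
        left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
        right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        if abs(right - left) > tol:
            breaks.append(i)
    return breaks


def refine(xs: Sequence[float], breaks: Sequence[int]) -> List[float]:
    """Propose new full-model inputs at midpoints bracketing each flagged sample."""
    new_points = set()
    for i in breaks:
        new_points.add((xs[i - 1] + xs[i]) / 2)
        new_points.add((xs[i] + xs[i + 1]) / 2)
    return sorted(new_points)


# Hypothetical response: linear with slope 1, then a kink to slope 5 at x = 3.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x if x <= 3 else 3 + 5 * (x - 3) for x in xs]
flagged = slope_breaks(xs, ys, tol=0.5)
print(flagged)               # [3]
print(refine(xs, flagged))   # [2.5, 3.5]
```

In the linear stretches no new runs are proposed, so the computing budget concentrates near the kink, which is the allocation strategy the paragraph describes.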
By framing the task of approximating energy simulations via
machine learning in a standard statistical framework, we have ac-
cess to a plethora of methods for learning maps between relevant
inputs and outputs. The choice and effectiveness of methods is
highly dependent on the structure of both input and output data.
For simple relationships, traditional regression methodologies such
as linear regression, mixed models, gradient boosting, and random
forests are competitive, particularly for one-dimensional output sce-
narios [11]. Longitudinal and functional analysis approaches are
applicable when the stated goal is to represent some combination of
input or output in functional form. Because the space of learning
methods is vast and the results depend on the specific scenario, we have
designed our framework to be general and reusable, allowing for the
development of targeted approximations. The quality and usefulness
of the approximation vary with the model being approximated and
with the training dataset; characterizing this dependence is one of our
active research areas.
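Because the best learning method depends on the scenario, a reusable framework can treat each approximation as an interchangeable object with `fit` and `predict` methods and choose among candidates by held-out error. The sketch below assumes that interface (our hypothetical `Surrogate` protocol) and uses two deliberately simple candidates, a constant-mean baseline and single-input least-squares regression, in place of the richer methods named above.

```python
from typing import List, Protocol, Sequence


class Surrogate(Protocol):
    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "Surrogate": ...
    def predict(self, x: float) -> float: ...


class MeanSurrogate:
    """Baseline: always predicts the training mean."""
    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)
        return self

    def predict(self, x):
        return self.mean


class LinearSurrogate:
    """Ordinary least squares with a single input: y ~ a + b x."""
    def fit(self, xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.b = sxy / sxx
        self.a = my - self.b * mx
        return self

    def predict(self, x):
        return self.a + self.b * x


def held_out_mse(model: Surrogate, xs, ys) -> float:
    return sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_surrogate(candidates: List[Surrogate], train, test) -> Surrogate:
    """Fit every candidate on train and return the one with lowest test MSE."""
    fitted = [c.fit(*train) for c in candidates]
    return min(fitted, key=lambda m: held_out_mse(m, *test))


# Hypothetical training and validation data drawn from y = 2x + 1.
train = ([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
test = ([4.0, 5.0], [9.0, 11.0])
best = select_surrogate([MeanSurrogate(), LinearSurrogate()], train, test)
print(type(best).__name__, best.predict(10.0))  # LinearSurrogate 21.0
```

Any method exposing the same two methods, such as a random forest or a neural network wrapper, could be added to the candidate list without changing the selection code.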
The primary objective of our approximate modeling is to achieve
a useful level of predictive accuracy together with fast evaluation.
We therefore pay particular attention to recent advances in neural
networks [7], owing to their flexibility in handling highly nonlinear
relationships, advances in their computational implementations, and
their ability to handle multidimensional output spaces. Note that
while deep multilayer neural networks may take a long time to train,
even on modern GPUs, training is a one-time cost. Predictive
evaluation of these networks is fast, since it mainly requires efficient
matrix multiplication and evaluation of nonlinear activation functions.
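To see why evaluation is cheap, note that a dense network's forward pass is just a sequence of matrix-vector products followed by elementwise activations, h_k = σ(W_k h_{k-1} + b_k). The pure-Python sketch below assumes ReLU hidden activations and a linear output layer, with hypothetical hand-set weights; a real surrogate would use trained weights and an optimized linear-algebra library.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weight matrix W, bias b)


def dense(W: List[List[float]], b: Sequence[float], x: Sequence[float]) -> Vector:
    """Affine map W x + b, one output per row of W."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, z) for z in v]


def forward(layers: Sequence[Layer], x: Sequence[float]) -> Vector:
    h = list(x)
    for k, (W, b) in enumerate(layers):
        h = dense(W, b, h)
        if k < len(layers) - 1:   # nonlinearity on hidden layers only
            h = relu(h)
    return h


# Hypothetical 2-2-1 network with hand-set weights.
net: List[Layer] = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[2.0, 1.0]], [1.0]),
]
print(forward(net, [3.0, 1.0]))  # [7.0]
print(forward(net, [1.0, 3.0]))  # [3.0]
```

The cost of one evaluation grows with the total number of weights, independent of how long training took, which is why a trained surrogate can answer in milliseconds where the full model needs minutes or hours.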
2.3 Interactive Visualization and Analysis
The challenges of rapidly developing insights from NREL’s complex
flagship models do not end with interactivity. Another
substantial hurdle is extracting features of interest from the
high-dimensional inputs and outputs of the models: these simulation
results generally contain lower-dimensional geometric structures with
clear and insightful interpretations. Techniques that identify
and present such structures greatly speed the interpretation and
exploration of dauntingly complex simulation results.
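Principal component analysis is one standard technique for exposing such lower-dimensional structure; we use it here only as an example, since the section does not prescribe a specific method. The sketch estimates the leading principal direction by power iteration on the sample covariance matrix and reports the fraction of total variance it explains. The all-ones starting vector is an assumption; power iteration fails if that vector happens to be orthogonal to the leading direction.

```python
import math
from typing import List, Sequence, Tuple


def leading_component(points: Sequence[Sequence[float]],
                      iters: int = 200) -> Tuple[List[float], float]:
    """Return (unit leading principal direction, fraction of variance explained)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    v = [1.0] * d  # starting guess; must not be orthogonal to the answer
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, (eigenvalue / total_variance if total_variance else 0.0)


# Hypothetical 2-D outputs that lie exactly on the line y = 2x.
pts = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
direction, explained = leading_component(pts)
print([round(c, 4) for c in direction], round(explained, 4))  # [0.4472, 0.8944] 1.0
```

An explained-variance fraction near 1 signals that the outputs effectively live on a one-dimensional structure, the kind of feature a visualization can then present directly.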
To facilitate the parameter space exploration and feature identi-
fication, we have developed the ability to connect a variety of data
analysis environments to the dataflow API. The flexibility of our
framework allows the integration of generic visualization clients
such as R and Python Jupyter notebooks for quantitative analysis,
web applications such as Shiny and D3 for broad deployment, and
in-house tools developed for 3D immersive (i.e., head-tracked stereo-
scopic) environments. This multitude of visualization clients is an
important aspect of our ensemble steering framework, allowing its
use on different types of data and simulation scenarios. The sys-
tem provides an interface to explore the multivariate ensembles
as well as design new scenarios by manipulating input parameters.
Thus, analysts can quickly develop and test hypotheses regarding
the relationships between simulation inputs and outputs.
3 DISCUSSION OF APPLICATIONS
To date, we have used our ensemble steering framework to develop
customized workflows for stakeholders exploring analytic
questions with multiple energy models. We demonstrate the use
of our framework on three examples; the development of novel
visualization techniques and reduced-form models, however, is ongoing.
3.1 Renewable Energy Planning
NREL campus planners are using our ensemble steering framework
to evaluate the energy impacts of a wide range of planning scenarios.
Combining techno-economic optimizations from REopt [18] simulations,
whole-building simulations from EnergyPlus [4], and power
flow simulations from OpenDSS [5] provides technical, economic,
and policy perspectives. Users can interactively manipulate on-site
power generation, electrical loads, and cost assumptions, thus pro-
viding a user-driven exploration of the parameter space. Figure 1
shows our immersive environment in which multiple stakeholders
can gather and evaluate planning scenarios by walking inside a vir-
tual campus, see the effects of various settings, and spawn new
simulation runs. In the figure, campus buildings are shown in dark
gray, and the lines are modulated by color and directional texture to
show power flow variables. This environment is currently used not
only by local planners to explore and estimate impacts of various
energy scenarios, but also by external entities as a way to understand
relationships within energy models, view changing variables within
the conceptual context of the simulations, and spark collaboration
for future projects. Although this work is in its early stages, bringing
our site planners and leadership together in this environment has
already revealed opportunities for energy systems integration on our
campus, and we have received requests to create similar models of
other sites.
3.2 Biomass Supply-Chains
Energy analysts and stakeholders at NREL actively use in-house
tools developed for the visualization of generic datasets of multi-
dimensional timeseries to explore results of biomass supply-chain
models such as the Biomass Scenario Learning Model (BSLM) [19],
the Biomass Scenario Model (BSM) [15], and a waste-to-energy sys-
tem simulation (WESyS). This suite of simulations uses the system
dynamics methodology to model dynamic interactions within the
supply chain: the models track the deployment of bioenergy given
current technological development and the reaction of the invest-
ment community to those technologies. Immersive scatterplots and
parallel planes [2] allow for the animated visualization of five to
twenty dimensions of such timeseries. Figure 4 shows an immersive
parallel-coordinates display of variables from the BSLM scenario.
Figure 4: Parallel planes in an immersive virtual environment with
annotations describing the visualization and user interface [2].
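In a parallel-coordinates display, each variable gets its own axis scaled to its observed range, and each simulation record becomes a polyline through one point per axis. The sketch below computes those per-axis positions; it illustrates ordinary two-dimensional parallel coordinates with min-max scaling (our assumption), not the immersive parallel-planes technique of [2].

```python
from typing import Dict, List, Sequence


def axis_positions(records: Sequence[Dict[str, float]],
                   axes: Sequence[str]) -> List[List[float]]:
    """Map each record to per-axis heights in [0, 1] (min-max scaling per variable)."""
    lo = {a: min(r[a] for r in records) for a in axes}
    hi = {a: max(r[a] for r in records) for a in axes}
    polylines = []
    for r in records:
        line = []
        for a in axes:
            span = hi[a] - lo[a]
            # A constant variable is drawn at mid-height rather than dividing by zero.
            line.append(0.5 if span == 0 else (r[a] - lo[a]) / span)
        polylines.append(line)
    return polylines


# Hypothetical ensemble members with three output variables at one timestep.
ensemble = [
    {"cost": 10.0, "yield": 200.0, "plants": 4.0},
    {"cost": 30.0, "yield": 100.0, "plants": 4.0},
    {"cost": 20.0, "yield": 150.0, "plants": 4.0},
]
for line in axis_positions(ensemble, ["cost", "yield", "plants"]):
    print(line)
# [0.0, 1.0, 0.5]
# [1.0, 0.0, 0.5]
# [0.5, 0.5, 0.5]
```

To animate a timeseries, the positions would be recomputed at each timestep; fixing each axis's range across all timesteps keeps the polylines comparable from frame to frame.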
Users of these visualizations can effectively explore ensembles of
hundreds to tens of thousands of simulation results and interactively
create new simulations at the rate of several hundred per hour. In
contrast to the immersive visualization shown in Figure 1, these
visualizations are aimed specifically at researchers closely involved
in model design and development, and they therefore render variables
directly, without any contextual representation. The immersive
visualizations streamline the simulation-analysis workflow by providing
a space for collaborators to collectively drive simulation studies.
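Exploring an ensemble in this way largely consists of brushing: selecting the records whose variables fall inside ranges the user sweeps out on one or more axes, then highlighting that subset. A minimal sketch of range brushing, combining axes with a logical AND, follows; the record layout and variable names are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Record = Dict[str, float]
Brush = Mapping[str, Tuple[float, float]]  # variable -> inclusive (low, high)


def brush(records: Sequence[Record], ranges: Brush) -> List[int]:
    """Indices of records that satisfy every brushed range (logical AND)."""
    return [
        i for i, r in enumerate(records)
        if all(lo <= r[var] <= hi for var, (lo, hi) in ranges.items())
    ]


# Hypothetical ensemble members.
ensemble = [
    {"feedstock_price": 40.0, "biofuel_output": 1.2},
    {"feedstock_price": 55.0, "biofuel_output": 2.8},
    {"feedstock_price": 70.0, "biofuel_output": 3.1},
    {"feedstock_price": 60.0, "biofuel_output": 0.9},
]
selected = brush(ensemble, {"feedstock_price": (50.0, 75.0), "biofuel_output": (2.0, 4.0)})
print(selected)  # [1, 2]
```

With no brushes active every record is selected, which matches the usual convention that an unbrushed display shows the whole ensemble.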
Typically users alternate between hypothesis generation and hy-
pothesis testing; in the hypothesis generation phase they select, filter,
and brush the existing ensemble of simulations, while in the hypothe-
sis testing phase they create new simulations whose input parameter
sets they have tuned towards validating or falsifying the previous
hypothesis. Fortunately, the round-trip time for creating new BSLM
and WESyS simulations is less than ten seconds. In contrast, BSM
simulations take three minutes to complete, which somewhat hampers
the responsive display of new ensemble results but also motivates
our development and deployment of reduced-form machine-learning models.
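The rates quoted in this subsection are consistent with simple arithmetic, assuming a client requests simulations strictly one after another (an assumption on our part; runs executed concurrently would raise these ceilings):

```latex
\frac{3600~\text{s/h}}{10~\text{s/run}} = 360~\text{runs/h (BSLM, WESyS)},
\qquad
\frac{3600~\text{s/h}}{180~\text{s/run}} = 20~\text{runs/h (BSM)}.
```

At under ten seconds per run, the sequential ceiling exceeds 360 runs per hour, in line with the several hundred per hour cited above, while three-minute BSM runs cap a sequential session at about 20 runs per hour.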
3.3 Electric Power System Capacity Expansion
The lack of immediate response from ensembles of simulations
spawned by a visualization user is even more extreme in models
like NREL’s Regional Energy Deployment System (ReEDS), where
each simulation in the ensemble takes five or more hours to com-
plete. The ReEDS model is an electricity system capacity expansion
model that develops scenarios of future investment and operation
of generation and transmission capacity to meet U.S. electricity de-
mand [6], representing the continental United States with a very
high spatial resolution [12] and performing a system-wide least-cost
optimization in two-year periods from 2010 to 2050.
Initial efforts have focused on creating reduced-form predictive
models for projected national capacity of a variety of resources. A
dense multilayer neural network is used to map from a set of fixed
category designations (demand scenario, utility-scale solar pene-
tration scenario, etc.) to the projected capacity measurement from
ReEDS. Figure 5 compares fully simulated and reduced-form
predicted results for a small subset of estimated national wind
capacities. The average percent deviation between predicted and
ReEDS wind capacity was approximately 3%. Preliminary work on
geothermal, coal, gas, and utility-scale solar capacities showed
similar results. We aim to extend these results by augmenting the
available input data to incorporate continuous and functional
metrics rather than hard-coded scenarios, thus allowing users to
explore new regions of the parameter space, either by mixing
existing scenarios or by “drawing” new curves of input values.
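Three computations in this paragraph can be made concrete. Fixed category designations must be turned into numeric network inputs, commonly by one-hot encoding (our assumption about the encoding used). The reported accuracy is an average percent deviation, which the sketch assumes is the mean of |predicted − simulated| / |simulated| × 100 over the comparison points. Finally, blending two input curves is one simple way to realize "mixing existing scenarios." All category names and numbers below are hypothetical.

```python
from typing import Dict, List, Sequence


def one_hot(value: str, categories: Sequence[str]) -> List[float]:
    """Encode one categorical scenario designation as a 0/1 indicator vector."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1.0 if c == value else 0.0 for c in categories]


def encode_scenario(scenario: Dict[str, str], vocab: Dict[str, Sequence[str]]) -> List[float]:
    """Concatenate one-hot blocks for each categorical input, in sorted-name order."""
    vec: List[float] = []
    for name in sorted(vocab):
        vec.extend(one_hot(scenario[name], vocab[name]))
    return vec


def mean_percent_deviation(predicted: Sequence[float], simulated: Sequence[float]) -> float:
    """Average of |predicted - simulated| / |simulated|, in percent."""
    pairs = list(zip(predicted, simulated))
    return 100.0 * sum(abs(p - s) / abs(s) for p, s in pairs) / len(pairs)


def mix_curves(a: Sequence[float], b: Sequence[float], weight: float) -> List[float]:
    """Blend two input trajectories; weight 0 gives a, weight 1 gives b."""
    return [(1.0 - weight) * x + weight * y for x, y in zip(a, b)]


vocab = {"demand": ["low", "mid", "high"], "solar": ["ref", "low_cost"]}
print(encode_scenario({"demand": "high", "solar": "ref"}, vocab))
# [0.0, 0.0, 1.0, 1.0, 0.0]
print(mean_percent_deviation([102.0, 97.0], [100.0, 100.0]))
print(mix_curves([1.0, 2.0], [3.0, 6.0], 0.5))  # [2.0, 4.0]
```

A one-hot input can only reproduce scenarios seen in training, whereas a blended curve is a continuous input; this is why moving from hard-coded scenarios to continuous and functional metrics would open new regions of the parameter space.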
Figure 5: Comparison of simulation and predicted results of the
ReEDS model. Each color represents a sampled ReEDS scenario
with solid lines corresponding to true output and dashed lines corre-
sponding to reduced-form predictions.
4 CONCLUSION
Large-scale ensemble simulations are state-of-the-art in many
application domains. Techniques allowing for the rapid display,
understanding, and control of these simulation suites will become
increasingly necessary as models escalate in complexity and
computational needs. This work demonstrates a general application of
an ensemble steering framework to energy system models. Our
dataflow API allows users to explore and steer energy systems simu-
lation ensembles by coupling multiple reduced-form energy models
and interactive visualization via a dedicated data workflow, all to
provide a rich environment for engagement.
The availability of fast approximate models will greatly increase
the agility of users interacting with complex simulations. We foresee
future work in designing approximate models in conjunction with
full-scale models to facilitate stakeholder interactions, resulting in a
superior user experience. Combined with customized visualization
and an appropriate data workflow, this effort will collapse the time
required to develop and analyze scenarios by providing previews of
full model results and will likely be used in planning, quick-response,
and quality assurance activities.
While the development of reduced-form models is still underway,
and a full study on the application of learning methods to our scenar-
ios is beyond the scope of this paper (and will be a paper unto itself
in the near future), our initial results and the approximate nature of
our framework suggest an effective approach to the challenges of
real-time ensemble steering of simulations. Because the approximate
models are used only as guidance for scientists selecting regions in
which to run full-scale simulations, errors in the approximations will
at worst waste computation on a subset of the full simulation space,
which is still a savings over running the full space. In addition, as advances in machine learning
materialize, they can quickly be integrated into our system, thus
continuously improving the predictive power of our reduced-form
models and our ensemble steering framework.
ACKNOWLEDGMENTS
This work was supported by the U.S. Department of Energy (DOE)
and performed using NREL computational resources sponsored by
DOE’s Office of Energy Efficiency and Renewable Energy. This
work was supported by the Laboratory Directed Research and De-
velopment program at the National Renewable Energy Laboratory.
REFERENCES
[1] N. Brunhart-Lupo, B. Bush, K. Gruchalla, and M. Rossol. Advanced Energy System Design (AESD): Technical Manual for the Records API. Technical Report TP-6A20-68924, National Renewable Energy Laboratory, October 2017.
[2] N. Brunhart-Lupo, B. W. Bush, K. Gruchalla, and S. Smith. Simulation exploration through immersive parallel planes. In 2016 Workshop on Immersive Analytics (IA), pp. 19–24. IEEE, 2016.
[3] D. Coffey, C.-L. Lin, A. G. Erdman, and D. F. Keefe. Design by dragging: An interface for creative forward and inverse design with simulation ensembles. IEEE Transactions on Visualization and Computer Graphics, 19(12):2783–2791, December 2013. doi: 10.1109/TVCG.2013.147
[4] D. B. Crawley, C. O. Pedersen, L. K. Lawrie, and F. C. Winkelmann. EnergyPlus: Energy simulation program. ASHRAE Journal, 42:49–56, 2000.
[5] R. C. Dugan and T. E. McDermott. An open source platform for collaborating on smart grid research. In 2011 IEEE Power and Energy Society General Meeting, pp. 1–7, July 2011. doi: 10.1109/PES.2011.6039829
[6] K. Eurek, W. Cole, D. Bielen, N. Blair, S. Cohen, B. Frew, J. Ho, V. Krishnan, T. Mai, B. Sigrin, et al. Regional Energy Deployment System (ReEDS) model documentation: Version 2016. Technical report, National Renewable Energy Laboratory (NREL), Golden, CO (United States), 2016.
[7] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[8] Google Developers. Protocol Buffers. https://developers.google.com/protocol-buffers/, July 2017.
[9] Project Haystack. Project Haystack. http://project-haystack.org/, July 2017.
[10] Internet Engineering Task Force. RFC 6455 – The WebSocket Protocol. https://tools.ietf.org/html/rfc6455, July 2017.
[11] G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning, vol. 112. Springer, 2013.
[12] V. Krishnan and W. Cole. Evaluating the value of high spatial resolution in national capacity expansion models using ReEDS. In 2016 IEEE Power and Energy Society General Meeting (PESGM), pp. 1–5. IEEE, 2016.
[13] J. D. Mulder, J. J. van Wijk, and R. van Liere. A survey of computational steering environments. Future Generation Computer Systems, 15(1):119–129, 1999.
[14] National Centers for Environmental Prediction, Environmental Modeling Center. Short-range ensemble forecasting project. http://wwwt.emc.ncep.noaa.gov/mmb/SREF/SREF.html.
[15] S. Peterson, C. Peck, D. Stright, E. Newes, D. Inman, L. Vimmerstedt, S. Hsu, and B. Bush. Overview of the Biomass Scenario Model. Technical report, National Renewable Energy Laboratory (NREL), Golden, CO, 2015.
[16] H. Ribičić, J. Waser, R. Gurbat, B. Sadransky, and M. E. Gröller. Sketching uncertainty into simulations. IEEE Transactions on Visualization and Computer Graphics, 18(12):2255–2264, December 2012. doi: 10.1109/TVCG.2012.261
[17] M. Sedlmair, C. Heinzl, S. Bruckner, H. Piringer, and T. Möller. Visual parameter space analysis: A conceptual framework. IEEE Transactions on Visualization and Computer Graphics, 20(12):2161–2170, December 2014. doi: 10.1109/TVCG.2014.2346321
[18] T. Simpkins, D. Cutler, K. Anderson, D. Olis, E. Elgqvist, M. Callahan, and A. Walker. REopt: A platform for energy system integration and optimization. Technical Report CP-7A40-61783, 2014.
[19] L. Vimmerstedt, B. W. Bush, and S. O. Peterson. Dynamic modeling of learning in emerging energy industries: The example of advanced biofuels in the United States. In The 33rd International Conference of the System Dynamics Society, Cambridge, Massachusetts, USA, 2015.
[20] J. Waser, R. Fuchs, H. Ribičić, B. Schindler, G. Blöschl, and M. E. Gröller. World lines. IEEE Transactions on Visualization and Computer Graphics, 16(6):1458–1467, 2010.
[21] J. Waser, H. Ribičić, R. Fuchs, C. Hirsch, B. Schindler, G. Blöschl, and M. E. Gröller. Nodes on ropes: A comprehensive data and control flow for steering ensemble simulations. IEEE Transactions on Visualization and Computer Graphics, 17(12):1872–1881, December 2011. doi: 10.1109/TVCG.2011.225