
Free and Open Source Software for Computational Chemistry Education


Abstract

Long in the making, computational chemistry for the masses [J. Chem. Educ. 1996, 73, 104] is finally here. Our brief review of free and open source software (FOSS) packages points out the existence of software offering a wide range of functionality, all the way from approximate semiempirical calculations with tight-binding density functional theory to sophisticated ab initio wave function methods such as coupled-cluster theory, covering both molecular and solid-state systems. Combined with the remarkable increase in the computing power of personal devices, which now rivals that of the fastest supercomputers in the world in the 1990s, we demonstrate that a decentralized model for teaching computational chemistry is now possible thanks to FOSS packages, enabling students to perform reasonable modeling on their own computing devices in the bring your own device (BYOD) scheme. FOSS can be made trivially simple to install and keep up to date, eliminating the need for departmental support, and also enables comprehensive teaching strategies, as the actual implementations of various algorithms can be used in teaching. We exemplify what kinds of calculations are feasible with four FOSS electronic structure programs, assuming only extremely modest computational resources, to illustrate how FOSS packages enable decentralized approaches to computational chemistry education within the BYOD scheme. FOSS also has further benefits driving its adoption: the open access to the source code of FOSS packages democratizes the science of computational chemistry, and FOSS packages can also be used without limitation beyond education, for example in academic and industrial applications.
Susi Lehtola1, a) and Antti J. Karttunen2
1)Molecular Sciences Software Institute, Blacksburg, Virginia 24061, United States
2)Department of Chemistry and Materials Science, Aalto University, Espoo, Finland
a)Electronic mail: susi.lehtola@alumni.helsinki.fi
CONTENTS
I. Introduction
II. Free and open source software
A. Definitions
B. Why is free/open-source software not the default?
1. Code distribution
2. Maintenance and user support
3. Linux distributions
4. Case study: Libxc library of density functional approximations
C. What does free and open source software offer for teaching?
1. Free redistribution: install and maintenance
2. Access to source code
3. Sophisticated workflows
D. Why would it be timely to switch to free/open-source software?
III. Overview of available FOSS program packages
A. Programs for molecular calculations with Gaussian basis sets
B. Programs for solid-state calculations
C. Programs relying on fully numerical representations
D. Programs employing semiempirical models
E. Limited-scope projects
1. Keys to modular design
2. Is modular design a limitation?
3. The importance of interoperability
4. The move to increased modularity
5. Visualization, manipulation and analysis
IV. Illustrations of feasible computations
A. xtb
B. NWChem
C. Psi4
1. Methylcyclohexane
2. Geometry of chromyl fluoride
D. Quantum Espresso
1. Optimal geometry
2. Band structure
V. Summary and conclusions
Supporting Information
Acknowledgments
References
I. INTRODUCTION
Quantum chemical research methods have been used
extensively in the chemical industry already for several
decades.1–4 In addition to the widespread use in indus-
try as well as in academia, quantum chemistry is also
utilized in chemical education to provide atomic-level
understanding of fundamental chemical concepts and
phenomena.5,6 For example, in undergraduate general and organic chemistry curricula, students get hands-on experience with concepts such as three-dimensional molecular structure, structural isomerism, conformers, and stereochemistry by means of computational exercises or computer laboratory sessions.7–9
Although some of the aforementioned aspects can in principle be studied even with simpler methodologies such as classical force fields, quantum chemical calculations with state-of-the-art software packages allow students to gain firsthand understanding of more advanced topics such as molecular orbitals, chemical bonding, energetics,10 thermodynamics,11,12 reaction mechanisms,13 and various spectroscopies.14–18
The ability to interpret and understand chemical phenomena with the help of quantum chemical calculations is a valuable skill in every chemist’s professional life: nowadays, a significant portion of even the experimental studies reported in the chemical literature is tightly integrated with quantum chemical investigations. Moreover, as quantum chemistry is the critical bridging component between experimental work and machine learning methods, the ability to run quantum chemical calculations can be expected to become increasingly relevant and necessary in working life in the near future.
Although computational chemistry for the masses, that is, a pervasive inclusion of computational modeling in the chemistry curriculum, has long been thought to be coming,19 it does not appear to have arrived yet. In their recent overview, Grushow and Reeves20 have summarized selected landmarks in computational chemistry education. At the same time, Grushow and Reeves note that computational chemistry still has a somewhat limited presence in undergraduate curricula, which can be attributed at least in part to the history of computational chemistry software.
In the 1990s, commercial software companies started selling graphical user interfaces to their quantum chemistry packages, some of which were particularly geared towards educational use. Such software was and still is typically used in a computer classroom setting, where a limited number of relatively powerful desktop computers are available for the students during the teaching sessions. The benefit of a computer classroom setting is that all software can be pre-installed for the students, and the standardized software environment makes the possibilities (and limitations) of the software setup clear to the teachers in charge of the educational content. However, the computer classroom approach has limited scalability, as the number of students is limited by the number of workstations; this often makes the approach impractical for large-scale undergraduate teaching. Furthermore, while the computer classroom setting may be useful for teaching during contact sessions, the students’ possibilities for running calculations outside the contact sessions are limited by the requirement of physical access to the computer classroom, which has proved to be challenging especially during the ongoing global coronavirus disease pandemic, which has required social distancing. Lastly, the classroom setting typically limits the teacher and students to the pre-installed software, while the costs of the required software licenses can be prohibitively high for educational institutions with limited budgets. Someone also has to maintain the software on the classroom computers and ensure it is kept up to date.
In the early 2000s, the WebMO package introduced a web-based approach to computational chemistry education, in which the quantum chemistry software only needs to be installed and maintained on a central server, and the teachers and students can then access it through a web browser interface.21,22 A number of quantum chemistry software packages have been integrated with WebMO, whose integrated molecular editor and analysis tools make it a rather low-barrier interface to quantum chemistry. As the users thus only need a web browser to access the computing software, WebMO was the first tool to enable a bring your own device (BYOD) paradigm in computational chemistry, in which the students can use their personal devices to take part in the teaching.
However, WebMO still requires someone to set up and
administer the WebMO server, even though the need to
purchase actual server hardware has been removed by
the possibility of installing the service on cloud plat-
forms such as the Amazon Web Services or the Google
Cloud. Recently, the cloud-based Chem Compute plat-
form has also begun to offer web access to computational
chemistry software and computing resources for under-
graduate teaching and research without any cost to the
teachers,23 enabling such access for institutions that do
not have the personnel or financial resources to set up
their own physical or cloud servers; however, Chem Com-
pute relies on computational resources volunteered by
third parties whose continued future availability is not
guaranteed.
As discussed above, great advances such as WebMO and Chem Compute have been made in the direction of the BYOD paradigm, to which many universities have already shifted in order to cut down on the costs associated with the now-outdated computer classroom model. In this work, we will show that free and open source software (FOSS) can be used in the context of the BYOD paradigm to achieve computational chemistry for the masses, all the while democratizing science by tearing down established power structures and barriers to research and education. (Inroads into BYOD in the context of virtual laboratories have also been recently discussed by Kobayashi et al.24)
The layout of this work is as follows. In section II, we will begin by defining what we mean by FOSS (Subsection IIA). Then, we discuss why FOSS has not been the norm in science (Subsection IIB), what FOSS enables for the teaching of computational chemistry (Subsection IIC), and why now would be a good time to switch over to FOSS in teaching (Subsection IID). We present a brief overview of available FOSS packages in section III. We include several practical demonstrations of using state-of-the-art FOSS programs for computational chemistry education in section IV, showcasing the kinds of calculations that are possible assuming only limited computer resources. The article concludes with a brief summary and discussion in section V.
II. FREE AND OPEN SOURCE SOFTWARE
A. Definitions
As our readers may not be familiar with the concept of
FOSS, some definitions are necessary before the present
discussion can take place. For the purposes of this article,
we will adopt three key criteria for FOSS:
1. The ability of anyone to freely use the software for
any purpose.
2. The ability to freely study the operation of the soft-
ware, and modify it at will.
3. The ability to freely redistribute copies of the
software—as well as modified versions thereof—to
others.
Any software that does not satisfy these criteria is referred to as proprietary or closed-source software.
What is the significance of these criteria? The first cri-
terion means simply that there can be no limitations on
potential uses of the software: for instance, in addition
to use in academic research and education, commercial
use must also be permitted by the license. Moreover, the
first criterion bars license terms that prohibit use of the
software for purposes deemed questionable by the licen-
sors, such as use in nuclear power plants or in research
on genetic engineering. FOSS can be used by anyone for
anything.
The second criterion means that the source code of the
software must not only be available, but also that cus-
tomizations to the source code must be allowed. This
is of major importance for developing new features or
computational models, for example. Being able to use
software written by other authors to accomplish certain
tasks eliminates the need to “reinvent the wheel” and
thereby results in faster scientific development.25 This phenomenon has traditionally been the main incentive for contributing to closed-source or “open teamware”26 packages, as access to their source code partly eliminates the need to start from scratch: algorithms implemented in the package by its other contributors can be leveraged to develop new computational models.
However, the control of access to the source code of such closed-source programs leads to the perpetuation of power structures and may inhibit academic collaborations between authors of different program packages,27 in contrast to the Popperian ideal of science: the selfless pursuit of truth,28 and a fair and unbiased competition of ideas and methods in the context of computational chemistry. Key persons in control of the access to the source codes of various software packages are able to hold back equitable competition and collaboration between scientists developing new methods and algorithms. The issue with gatekeepers is not a new phenomenon: as Max Planck already quipped, "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it"; this apt observation is supported by a recent study that investigated the dynamics of scientific evolution with the standard empirical tools of applied microeconomics.29 This problem is less likely to manifest in FOSS, as will be explained in the next paragraph.
The third criterion means that anyone who has a copy
of the software can redistribute it to others. One does not
need to ask case-by-case permission from the authors of
the software in order to share it with one’s collaborators
or the reviewers of a scientific paper, for instance. It also
means that anyone who has added new features to the
program can freely distribute their version. This elimi-
nates the problematic role of the gatekeepers in the “open
teamware” model, as alternative versions of the software
commonly known as forks can be distributed. It also
eliminates the possibility of the infamous practice30 of preventing one’s competitors from using one’s software, which may serve to hide deficiencies and bugs in that software. Case in point: the “war on supercooled water”31 exemplifies the problems of having prominent figures as exclusive gatekeepers. The “war” was only resolved once Princeton scientists gained access to their Berkeley competitors’ source code and found a gross error in it.32 Such problems are much less likely to exist if FOSS is used, as FOSS programs are freely redistributable and can be thoroughly inspected by anyone.
In our opinion, the three criteria laid out above con-
dense the essence of both the generally accepted 10-item
definition for “open source software” by the Open Source
Initiative33 as well as the four essential freedoms of “free
software” or “libre software” defined by the Free Software
Foundation.34 Note that there is a wide variety of FOSS
licenses that fit these criteria and that can be adopted by
software projects, and that new software projects should
choose their license with care.35 It is always easier to
switch to a more permissive license later on than to move
to a more restrictive license: any versions released under
a FOSS license will continue being FOSS in the future, as
well, even if newer versions switch to using a proprietary
license, for example.
B. Why is free/open-source software not the default?
1. Code distribution
The ideology of FOSS is in line with the demands of science:36 much like the Schrödinger or Dirac equation, computational models should ideally always be publicly available. Moreover, as the initial development and ongoing use of most scientific software has been and continues to be funded by public research funding, the results of such work, that is, the developed program source code, should be available to everyone.
It is worthwhile to comment on the reasons for the longstanding status quo. As discussed by Hinsen,37 before the advent of electronic computers, algorithms were developed with pen and paper, and the traditional paper journal article format is ideally suited to fully describe such algorithms. But when implemented on a computer, algorithms often become too complicated to describe thoroughly in a journal article, and significant portions of the implementation are always left out. As this tacit information on what happens “under the hood” of various computational chemistry packages is typically passed on only within the academic groups contributing to those codes, lack of access to the source code creates another barrier to entry for third parties, and again ends up perpetuating established power structures.
However, nowadays there are well-established ways for
distributing scientific software. Version control systems
such as Git38 facilitate robust development of software,
which can be hosted at no cost on sites such as GitHub39
and GitLab40. GitHub and GitLab also enable a com-
munity approach to code development through the use
of public code review, which is leveraged by many pro-
gram packages to improve code quality and to decrease
the learning curve for potential new contributors to the
package. Stable releases of software can be made avail-
able on Open Science data repositories such as Zenodo41
with version-specific Digital Object Identifiers (DOIs).
Precompiled versions can also be easily distributed nowadays, as we will discuss in section III.
2. Maintenance and user support
A commonly cited impediment to FOSS in science is that funding its maintenance and/or user support is challenging.26,42,43 However, there are several companies whose whole business model is founded on the use, development, and support of FOSS. For instance, Red Hat Inc. broke $1 billion in annual revenue in 2012, and its revenue has increased ever since, surpassing $3 billion in 2018.44 There is clearly money to be made in selling support for FOSS. Moreover, in contrast to proprietary software, maintenance and support for FOSS can be acquired from third parties if the original author(s) are either unavailable or unwilling to support their code; this is the key to the Red Hat style business model.
The business model also works for scientific FOSS. For instance, Kitware Inc., established in 1998,45 has built its business model around developing and supporting a variety of scientific FOSS. ParaView46 and ITK47 enable modeling, visualization and data analysis for large datasets, while the CMake build system has become a quintessential tool for building scientific software.48 As of 2022, Kitware has more than 200 employees, and its FOSS projects span many fields of science and technology, including quantum chemistry.49
Due to the relatively small market for specialized scientific software, the availability of public research funding has always played a key role in the development of computational chemistry software. As for the future development of FOSS in science, the European Commission has outlined Open Science as its policy priority and as the standard method of working under its research and innovation funding programs.50
As evidenced by forums such as the Computational
Chemistry List51 and the present authors’ professional
experience, online peer-to-peer user support—whose mo-
tivations have been studied e.g. by Constant, Sproull,
and Kiesler52 —is invaluable even in the case of propri-
etary programs. In the case of FOSS, this peer-to-peer
support has an enhanced role, and is one of the keys be-
hind the success of FOSS.53 Because anyone can modify
the software and distribute modified copies thereof, any-
one can fix the bugs they run into, and gain fame even
for small contributions.
Importantly, the possibility to contribute bug fixes to FOSS projects reduces the barrier between users and developers, and is the typical route by which a project gains new developers. The fostering of new developers can also be greatly aided by practices such as open code review, which serves the double purpose of ensuring a high-quality code base and of teaching both the new contributor and any other project followers about the structure and design philosophy of the project. This naturally also leads to a more sustainable development environment, since a constant influx of new developers is secured, and enables expert knowledge (also known as tacit knowledge) to be passed on to new members of the development team.
Other aspects of the economic principles of FOSS
have also been studied extensively:54–67 FOSS is a pub-
lic good.55,56,68 Participation in the development and
support of FOSS has been found to be more motivat-
ing than that of proprietary software,69,70 and participa-
tion in FOSS projects is motivating and carries economic
benefits.71,72 FOSS promotes peer review, free exchange
of ideas, and maintainability,73 and competition of FOSS
packages promotes innovation.74
3. Linux distributions
The Linux operating system is a prime example of
FOSS. Originating from the University of Helsinki, Fin-
land, it is nowadays ubiquitous. It is used in billions of
mobile phones, laptops, workstations, as well as servers
and compute clusters all around the world. All super-
computers on the TOP500 list75 and the majority of the
world’s internet servers have run on Linux for a long time;
Android smartphones likewise run on Linux. Because of Linux, proprietary operating systems have been irrelevant in high-performance computing for many years. Chemists had good reasons to switch to Linux long ago;76 the present authors have used Linux as their main computational research platform for over 20 years.
A valuable feature of Linux distributions is that they
are usually cross-platform: in addition to the usual x86
and x86-64 platforms, consisting of processors by e.g.
the Intel Corporation and Advanced Micro Devices Inc.
(AMD), Fedora packages are also available on s390x pro-
cessors used on IBM mainframe computers and ARM
processors such as the ones used in Raspberry Pi and
new Mac computers, for instance. This versatility allows
the use of heterogeneous hardware and ensures seamless
compatibility even if students have dissimilar computing
devices at their disposal.
Several Linux distributions, such as Ubuntu, Debian, and Fedora Linux, solved the problem of efficient software distribution decades ago. Our criteria for FOSS in Subsection IIA allow such scientific software to be packaged as part of Linux distributions, and indeed several powerful program packages are already available as distribution packages thanks to the grand entrance of FOSS in quantum chemistry in recent years. Some FOSS quantum chemistry packages like Erkale,77 Psi478 and its predecessor Psi3,79 and PySCF80 have been developed in a fully free/open-source development model since their beginning, while other packages that originated within a closed-source licensing model have also become open-sourced recently, such as OpenMolcas,81 Dalton,82 and NWChem.83
4. Case study: Libxc library of density functional
approximations
An example of a successful scientific FOSS project can be found in the Libxc library of density functional approximations.84 The modular library currently implements over 600 density functional approximations such as PBE,85 B3LYP,86 and SCAN,87 and is used by over 30 electronic structure programs ranging from programs using Gaussian basis sets (Erkale,77 Psi4,78 PySCF80) to plane-wave codes (ABINIT,88 INQ,89 Quantum Espresso90), finite element programs (HelFEM,91–94 DFT-FE95), and multiresolution adaptive grids (MADNESS96). In order to facilitate wider use by the community, Libxc recently switched to a more permissive FOSS license that allows the library to be more easily included in closed-source programs. Libxc is now used in several proprietary and commercial software packages, e.g. the Slater-type orbital ADF package97 and the Gaussian-type orbital GAMESS-US,98 Molpro,99 MRCC,100 ORCA,101 and TURBOMOLE102 programs; several other packages are also contemplating migrating to Libxc.
The advantages of the community adoption of Libxc are manifold. A new density functional approximation only needs to be implemented in Libxc to become available in all of the electronic structure programs that support Libxc, underlining the efficiency of the modular FOSS model.
Moreover, access to the same implementation of a density
functional approximation enables e.g. the study of repro-
ducibility across various numerical approaches,103 which
is important to be able to compare results obtained with
different methods or software packages. Indeed, economic
gains in terms of software development productivity and
product quality can be achieved by reuse of mature FOSS
components that are of the highest quality.104
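The modular idea can be illustrated with a minimal Python sketch. It is not the Libxc API: the registry FUNCTIONALS, the register decorator, and the two "programs" are hypothetical names chosen for illustration, and the density values and quadrature weights are made up. A functional is implemented once (here the spin-unpolarized LDA exchange energy density, e_x = -(3/4)(3/π)^(1/3) ρ^(4/3), in atomic units), and every program that looks it up by name receives the very same kernel, which is what makes cross-program comparisons of one approximation meaningful.

```python
import math

# Hypothetical shared registry of functionals; illustrative only, NOT the Libxc API.
FUNCTIONALS = {}


def register(name):
    """Decorator that adds a functional implementation to the shared registry."""
    def wrap(func):
        FUNCTIONALS[name] = func
        return func
    return wrap


@register("lda_x")
def slater_exchange(rho):
    """Spin-unpolarized LDA (Dirac/Slater) exchange energy density, atomic units."""
    c_x = 0.75 * (3.0 / math.pi) ** (1.0 / 3.0)
    return -c_x * rho ** (4.0 / 3.0)


def program_a_energy(name, rho, weights):
    """Hypothetical 'program A': integrates point by point on its quadrature grid."""
    e_xc = FUNCTIONALS[name]
    return sum(w * e_xc(r) for r, w in zip(rho, weights))


def program_b_energy(name, rho, weights, batch=2):
    """Hypothetical 'program B': integrates in batches, but calls the same kernel."""
    e_xc = FUNCTIONALS[name]
    total = 0.0
    for start in range(0, len(rho), batch):
        chunk = zip(rho[start:start + batch], weights[start:start + batch])
        total += sum(w * e_xc(r) for r, w in chunk)
    return total


rho = [0.1, 0.5, 1.0, 2.0]          # toy density values on a grid (made up)
weights = [0.25, 0.25, 0.25, 0.25]  # toy quadrature weights (made up)
e_a = program_a_energy("lda_x", rho, weights)
e_b = program_b_energy("lda_x", rho, weights)
print(e_a, e_b)
```

Registering one more approximation in such a registry would make it available to both callers at once; in practice the shared component is a compiled library with a stable interface rather than a Python dictionary.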
We believe that computational chemistry will con-
tinue to transform by adopting more and more FOSS
components, the Electronic Structure Library (ESL) be-
ing one of the notable pushes in this direction.105 Well-
designed, modular FOSS components can be maintained
even by a single academic group; the semi-empirical dis-
persion library of the Grimme group is a successful recent
example.106–108 We will discuss this topic further in Sub-
section III E.
C. What does free and open source software offer for
teaching?
1. Free redistribution: install and maintenance
In addition to its benefits for general use cases,109 FOSS has three major advantages for teaching: the availability of precompiled binaries, the availability of the source code, and the general applicability of the software beyond academia. Starting out with the first advantage, software that satisfies the criteria for FOSS discussed in Subsection IIA can be redistributed, and included in Linux distributions, for example. This greatly facilitates the installation of these programs, as prepackaged software can be installed in a matter of minutes on a wide range of hardware, ranging from students’ laptops to compute servers, simply by running a single command, or alternatively, by finding the program in the distribution’s graphical application manager and clicking on “Install”.
We wish to note here that although installing scientific software by hand by compiling from source code affords customized tunings that may result in faster operation, that is, decreased runtimes of quantum chemistry packages, in many cases the gains realizable in computational chemistry education or small-scale computing are relatively modest and pale in comparison with the ease afforded by the centralized packaging system. Compiling from source takes a lot of time as well as expertise, and can lead to poor performance if the compiler options are not adequately chosen; note that several proprietary programs have likewise adopted a binary-only distribution model with the same limitations.
However, installation is only part of the problem: the software must also be kept up to date. This does not happen automatically for software installed by hand, and a constant level of administration effort is then required to monitor new releases, and to download and install new versions of the software. In contrast, Linux distribution packages get automatically updated with the rest of the system whenever new package versions come out: Linux package managers handle updates not only to the Linux operating system kernel, but also to all other software, such as internet browsers, email clients, office productivity suites, the Fortran and LaTeX compilers, and so on. Computational chemistry packages also get updated automatically.
2. Access to source code
The second advantage of FOSS is that as the source
code is available, it can be used in teaching. For in-
stance, a course on electronic structure calculations can
exemplify the basic algorithms by showing how they are
implemented in an openly available program. Some codes
go even further: for instance, Psi4NumPy110 is a project that aims to supply simple, easily modifiable Python algorithms for educational and proof-of-concept purposes.
The PySCF quantum chemistry program80 makes it easy
to override and customize all algorithms, as they are
mostly written in Python. Similarly, DFTK111 has been
designed to facilitate algorithmic development and might
therefore also be useful for educational purposes.
Access to these kinds of projects not only facilitates
research in and development of new electronic structure
methods, but also means that teaching no longer has
to be limited to pen and paper exercises: instead, it
can also include real-life demonstrations. For example,
an advanced course on electronic structure theory could
involve asking students to write their own, customized
solver for self-consistent field theory.112
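As an illustration of what such an exercise might look like, the following is a minimal sketch of a closed-shell (restricted Hartree–Fock) SCF loop in Python with NumPy. To stay self-contained it does not call any quantum chemistry program; instead it assumes hard-coded integrals for H2 in the STO-3G basis at a bond length of 1.4 bohr, using approximate values as tabulated in Szabo and Ostlund's textbook Modern Quantum Chemistry. In a course, the integrals could instead be obtained from an open-source program, and the function names used here are our own.

```python
import numpy as np

# Assumed inputs: H2 / STO-3G at R = 1.4 bohr (approximate textbook values).
S = np.array([[1.0, 0.6593], [0.6593, 1.0]])            # overlap matrix
H = np.array([[-1.1204, -0.9584], [-0.9584, -1.1204]])  # core Hamiltonian T + V
E_NUC = 1.0 / 1.4                                      # nuclear repulsion (hartree)


def build_eri(unique, n=2):
    """Fill the full (pq|rs) array from unique values via 8-fold permutational symmetry."""
    eri = np.zeros((n, n, n, n))
    for (p, q, r, s), val in unique.items():
        for idx in [(p, q, r, s), (q, p, r, s), (p, q, s, r), (q, p, s, r),
                    (r, s, p, q), (s, r, p, q), (r, s, q, p), (s, r, q, p)]:
            eri[idx] = val
    return eri


ERI = build_eri({(0, 0, 0, 0): 0.7746, (1, 1, 1, 1): 0.7746,
                 (0, 0, 1, 1): 0.5697, (1, 0, 0, 0): 0.4441,
                 (1, 0, 1, 1): 0.4441, (1, 0, 1, 0): 0.2970})


def rhf(S, H, eri, n_occ, e_nuc, max_iter=50, tol=1e-10):
    """Return the converged RHF total energy (hartree)."""
    # Symmetric (Lowdin) orthogonalization: X = S^(-1/2)
    s, U = np.linalg.eigh(S)
    X = U @ np.diag(s ** -0.5) @ U.T
    D = np.zeros_like(S)  # D = 0 gives F = H, i.e. the core-Hamiltonian guess
    e_old = 0.0
    for it in range(max_iter):
        J = np.einsum("pqrs,rs->pq", eri, D)  # Coulomb matrix
        K = np.einsum("prqs,rs->pq", eri, D)  # exchange matrix
        F = H + 2.0 * J - K                   # closed-shell Fock matrix
        e_elec = np.sum(D * (H + F))          # electronic energy for this density
        if it > 0 and abs(e_elec - e_old) < tol:
            return e_elec + e_nuc
        e_old = e_elec
        # Solve F C = S C eps in the orthogonal basis, then back-transform.
        _eps, C_prime = np.linalg.eigh(X.T @ F @ X)
        C = X @ C_prime
        C_occ = C[:, :n_occ]
        D = C_occ @ C_occ.T                   # density from occupied orbitals
    raise RuntimeError("SCF did not converge")


E_total = rhf(S, H, ERI, n_occ=1, e_nuc=E_NUC)
print(f"RHF total energy: {E_total:.4f} hartree")
```

With these inputs the loop should settle within a few iterations at roughly -1.117 hartree. For H2 in a minimal basis the occupied orbital is fixed by symmetry, so students would need a larger basis or a less symmetric molecule to observe a nontrivial convergence history.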
3. Sophisticated workflows
The third advantage of FOSS for teaching is that since students (like anyone else) can access the full power of various computational chemistry programs, they also have the possibility to develop more general technical skills such as programming and interfacing programs with each other, for instance by generating sophisticated workflows that automate complex tasks. Automated workflows are highly useful tools for practical computations, as they can be leveraged to easily run and analyze the thousands or even millions of calculations that are needed for high-throughput screening of materials, for instance. Several large-scale projects such as Materials Project,113 Materials Cloud,114 AiiDA,115 Atomic Simulation Recipes,116 and QCEngine117 are FOSS and provide immediate access to powerful automated workflows for computational chemistry. As was summarized in the first criterion in Subsection IIA, FOSS can also be freely used without limitations in industry to develop new thermoelectric energy conversion materials118 or semiconductor devices,119 for example, underlining its freedom and flexibility.
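The generate, dispatch, collect, and analyze pattern behind such workflows can be sketched in a few lines of Python. This is only a toy under stated assumptions: run_calculation is a hypothetical stand-in that evaluates a Morse potential with approximate parameters for H2 instead of launching an electronic structure program, and a thread pool stands in for the job scheduler or cluster that a real workflow engine such as AiiDA would manage.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for an electronic structure calculation (hypothetical): a Morse
# potential for H2 with approximate parameters, in atomic units.
D_E, A, R_E = 0.1745, 1.0282, 1.4011  # well depth (hartree), width (1/bohr), r_e (bohr)


def run_calculation(r):
    """Pretend 'single-point calculation' at bond length r (bohr); returns a record."""
    energy = D_E * (1.0 - math.exp(-A * (r - R_E))) ** 2 - D_E
    return {"r": r, "energy": energy}


def scan_workflow(bond_lengths, max_workers=4):
    """Dispatch one task per geometry in parallel and collect the results in order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_calculation, bond_lengths))


def analyze(results):
    """Post-process the scan: pick the lowest-energy geometry."""
    return min(results, key=lambda rec: rec["energy"])


grid = [round(1.0 + 0.05 * i, 2) for i in range(21)]  # 1.00 ... 2.00 bohr
results = scan_workflow(grid)
best = analyze(results)
print(f"lowest energy at r = {best['r']:.2f} bohr")
```

Replacing run_calculation with a call to an actual program, and the bond-length grid with a list of candidate structures, turns the same skeleton into a small screening workflow; dedicated frameworks such as AiiDA additionally track provenance and handle failures and restarts.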
D. Why would it be timely to switch to free/open-source
software?
We have argued above that FOSS has important ramifications for the reproducibility of science and also offers several advantages for teaching. Although it is possible to switch from proprietary programs to FOSS within the traditional setup based on computer classrooms and/or central compute servers, there is yet another important aspect to consider: the BYOD approach discussed in section I. In this subsection, we examine FOSS from the point of view of the ongoing paradigm shift to the BYOD scheme.
As the price of laptop computers has dropped, many students now bring their own devices to the classroom. This paradigm shift has also affected university policies. Students' preference for their own devices has led to a significant decrease in the demand for computer classrooms, and universities may now find it cheaper to simply offer a laptop to every student. For instance, the Faculty of Science of the University of Helsinki pivoted to such an approach several years ago. As a result, the university has been able to cut down on computer classrooms, which are expensive to maintain, even though some students decline the laptop offered by the university and opt to use their private laptops instead.
Although, as was already discussed in section I, a centralized compute server approach is compatible with the BYOD paradigm, the effortless availability of FOSS programs can be used to finally bring computational chemistry to the masses and thereby truly democratize science. As FOSS packages can be made instantly available to everyone, the FOSS approach is ideally suited for personal devices in the BYOD approach. Such a distributed approach is also optimal for massive open online courses (MOOCs), as enrollment does not have to be limited by the available centralized computer resources. Instead, the students can run all of the necessary calculations on their own hardware.
Naturally, certain tradeoffs are implied in a course employing heterogeneous BYOD approaches, as one cannot assume personal devices to have the same computational power as purpose-built, dedicated compute servers. However, we argue that this is not much of an impediment due to the immense developments in the speed of processors and the improved algorithms achieved during the past several decades. A concrete example of this is the TOP500 list of supercomputers, which contains almost 30 years' worth of data on the most powerful supercomputers in the world.120,121 The estimated performance of the fastest and slowest supercomputers on the list on a year-by-year basis is shown in figure 1 in units of 10^9 floating-point operations per second (GFlops). Figure 1 also shows analogous benchmark data for commodity hardware: a cheap tablet computer with an Intel Celeron N4000 processor and a high-end business laptop with an Intel Core i7-10610U processor belonging to one of the present authors (SL). A Raspberry Pi 4 minicomputer was also assessed, and found to perform similarly to the Celeron N4000 processor.
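As a rough, hedged way for students to place their own device on figure 1, the sketch below times a dense double-precision matrix multiplication with NumPy and converts the best time into GFlops, using the fact that multiplying two n × n matrices takes about 2n³ floating-point operations. This is not the HPL (LINPACK) benchmark used for the TOP500 list, nor necessarily the procedure used in the Supporting Information; the function name and default sizes are our own choices.

```python
import time

import numpy as np


def estimate_gflops(n=1024, repeats=3):
    """Rough double-precision GFlops estimate from one dense n x n matrix product.

    The product needs about 2 n^3 floating-point operations (a multiply and an add
    for each of the n terms of each of the n^2 elements). The best of several timings
    is used to reduce noise.
    """
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    a @ b  # warm-up run, not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - start)
    return 2.0 * n**3 / best / 1e9


print(f"about {estimate_gflops():.1f} GFlops")
```

Because NumPy delegates the product to the BLAS library it was built against, the number reflects that library as much as the processor, a point that returns in Subsection III E 3.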
As figure 1 illustrates, personal devices have performance in the tens to hundreds of gigaflops, which is comparable to the performance of the fastest supercomputers of the mid-1990s, or to that of the slowest supercomputers on the TOP500 list in the mid-2000s. This remarkable development in computational power means that the content of classic books on quantum chemistry such as Szabo–Ostlund122 could be reproduced nowadays on commodity hardware; however, there is no reason to do so, since better computational methods and basis sets are now available in many FOSS packages. Many calculations could probably even be carried out on an up-to-date smartphone!
The data in figure 1 suggest that a variety of calculations are possible within a reasonable time on personal devices. Combined with FOSS program packages that can be installed and kept up to date trivially with a package manager, computational chemistry can finally be made available to the masses, as students are able to run (and modify!) FOSS packages on their own devices. The skills they gain in doing so are directly transferable to both research and industry, as the same packages can also be used for heavy-duty calculations on supercomputers, which their licenses also freely allow.
III. OVERVIEW OF AVAILABLE FOSS PROGRAM PACKAGES
This section presents an overview of available FOSS program packages for computational chemistry. As the number of FOSS projects has grown immensely in recent years, we restrict the overview to self-contained packages that are able to run quantum electronic structure calculations from atomistic input. FOSS for other types of molecular modeling has been discussed elsewhere,123,124 while various computational chemistry resources for education have recently been summarized by Rodríguez-Becerra et al.125
As the availability of software is a moving target, since new packages appear and old ones become technologically obsolete and stop being maintained, any review can by necessity only represent the situation at a given point in time. Continuously updated databases are an alternative that is (hopefully) always up to date,126 but any observations made on their basis are similarly tied to the time of observation and become outdated as enough time passes. For this reason, new reviews are typically published whenever the availability of software has changed sufficiently.
The main goal of this section is merely to illustrate
the breadth of software that is already available for use
in computational chemistry. We have assembled the col-
lection of packages by thorough literature and internet
searches. Because unmaintained packages are unlikely to
be easy to install, or to become available as prepack-
aged software, we limit the overview to software that
shows at least some development activity in recent years,
as checked from the upstream development repositories.
Even if it later turns out that we have missed some re-
cently published software package in this review, or if
some packages become replaced by newer competitors af-
ter the publication of this article, our main points should
remain unaffected: there will likely still be a similar
breadth of FOSS packages suitable for a variety of pur-
poses within computational chemistry and computational
chemistry education.
As FOSS, the programs listed here can be packaged
and distributed openly without restriction; several of
them are already available as part of Linux distributions
such as Debian, Ubuntu, and Fedora Linux. Linux dis-
tribution packages are centrally maintained by the Linux
distribution’s packagers, and require no special knowl-
edge or local department personnel to install them or
keep the software up to date, in contrast to typical pro-
prietary packages. As we show in the Supporting Infor-
mation, the packages can be installed on the command
line; alternatively, they can also be installed using the dis-
tribution’s application store. Importantly, the software
is also automatically kept up to date by the distribution
package manager, whereas the installation and upkeep
of proprietary packages tends to require significant local expertise and time.
It is not even necessary to be running Linux to use such
prepackaged programs. Windows users can run the soft-
ware under the Windows Subsystem for Linux (WSL),
which allows installing and using a Linux distribution
easily inside Windows 10. The cross-platform Python Package Index127 (PyPI) and the Conda128 package manager are other alternatives for easy access to an increasing number of quantum chemistry packages on Linux, Windows, and macOS. Computer laboratory settings can
also be imitated using pre-made, customized live CDs or
live USBs, for example.
Because of the large number of packages to review, we organize the discussion into
• programs for molecular calculations with Gaussian basis sets, Subsection III A;
• programs for solid-state calculations with various numerical approaches, Subsection III B;
• programs employing fully numerical methods, Subsection III C; and
• programs employing semiempirical methods, Subsection III D.
Due to space constraints, we only include minimalistic descriptions of the programs, and advise the reader to look up the programs' evolving capabilities in detail on the internet to assess their usefulness for a given computational chemistry course or other application. Most of the electronic structure programs support Hartree–Fock (HF) and/or density functional theory129,130 (DFT); several molecular programs also support various post-HF methods. We will also discuss projects of a more limited scope in Subsection III E.

[Figure 1: performance in GFlops (logarithmic scale, 1 to 10^9) versus year (1992 to 2021); legend entries: Budget laptop, Celeron N4000; High-end business laptop, Core i7-10610U.]
Figure 1. The best-performing (red stars) and worst-performing (blue squares) supercomputers on the TOP500 list,121 as well as the performance of a budget laptop with a Celeron N4000 processor and a high-end business laptop with a Core i7-10610U processor (see supporting information). Note the logarithmic scale on the y axis. The performance of Raspberry Pi 4 was found to be similar to that of the Celeron N4000.
A. Programs for molecular calculations with Gaussian basis sets
Gaussian basis sets dominate the field of quantum
chemistry, since all electrons can efficiently be included
in the calculation, the electronic Coulomb integrals can
be evaluated analytically in the Gaussian basis,131 and
the evaluation is efficient when recursion relations are
used.132,133 Thanks to many decades of work on the de-
velopment of Gaussian basis sets,134–136 basis sets exist
for the accurate reproduction of various molecular prop-
erties at several levels of theory. Access to analytical in-
tegrals greatly facilitates the implementation of post-HF
theories, and also guarantees accurate force and Hessian
evaluations.
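As a small, self-contained illustration of what "analytical" means here, the sketch below evaluates the overlap of two unnormalized s-type Gaussian primitives from the Gaussian product theorem, S = (π/p)^{3/2} exp(−μ|A − B|²) with p = a + b and μ = ab/p, and checks it against brute-force numerical quadrature. The function names are illustrative; production integral libraries handle arbitrary angular momentum, contractions, and far harder operators.

```python
import math

import numpy as np


def overlap_s(a, A, b, B):
    """Analytic overlap of unnormalized s Gaussians exp(-a|r-A|^2) and exp(-b|r-B|^2)."""
    p = a + b        # exponent of the product Gaussian
    mu = a * b / p   # reduced exponent
    r2 = sum((x - y) ** 2 for x, y in zip(A, B))
    return (math.pi / p) ** 1.5 * math.exp(-mu * r2)


def overlap_s_numeric(a, A, b, B, half_width=10.0, n=20001):
    """Brute-force check: the 3D integral factorizes into x, y, and z integrals."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    result = 1.0
    for Ai, Bi in zip(A, B):
        f = np.exp(-a * (x - Ai) ** 2 - b * (x - Bi) ** 2)
        result *= np.sum(f) * dx
    return result


A, B = (0.0, 0.0, 0.0), (0.5, -0.2, 1.0)
print(overlap_s(0.8, A, 1.3, B), overlap_s_numeric(0.8, A, 1.3, B))
```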
Bagel137 is a C++ program package that features, e.g., analytical CASPT2 [complete active space second-order perturbation theory] nuclear energy gradients and derivative couplings, relativistic multireference wave functions based on the Dirac equation, and implementations of novel electronic structure theories.
Chronus Quantum138 is a C++ program package that
focuses on the consistent treatment of time depen-
dence and spin in the electronic wave function, as
well as the inclusion of relativistic effects in said
treatments.
Dalton139 is a Fortran program that specializes in
molecular properties at various levels of theory,
such as frequency-dependent response properties;
one-, two-, and three-photon processes, etc. In
addition to HF and DFT, Dalton features sev-
eral post-HF methods like multiconfigurational self-
consistent field (MCSCF) theory and coupled-
cluster theory.
Ergo140 is a C++ program for linear-scaling HF and
DFT calculations for molecules.
ERKALE77 is a C++ program implementing HF and
DFT that specializes in the modeling of inelastic x-
ray spectroscopies, self-interaction corrected DFT,
as well as various orbital localization methods.
eT141 is a C++ program primarily aimed at coupled-cluster calculations of molecular systems, which specializes in multiscale and multilevel methods, as well as modern Cholesky decomposition techniques for two-electron integrals.
Fermi.jl142 is a Julia package for HF and post-HF cal-
culations.
JuliaChem143 is a Julia package for HF calculations.
LSDalton139 is a Fortran code targeted at linear-scaling HF and DFT calculations on large molecular systems, and also includes some coupled-cluster capabilities.
MolGW144 is a Fortran/C++ package that implements
HF and DFT, but specializes in many-body per-
turbation theory: the GW approximation and the
Bethe–Salpeter equation.
MPQC145 is a C++ program for massively parallel quantum chemistry, which originally focused on HF and DFT but has since gained support for post-HF many-body theories.
NWChem83 is a major quantum chemistry package
written in Fortran and has a variety of features for
both molecular and solid-state calculations.
Psi478 is a modular C++/Python package for HF, DFT
and various post-HF calculations that can be used
either as a traditional quantum chemistry package
with simple and intuitive input files, or as Python
modules for running calculations in Python.
PySCF80 is a collection of Python modules for elec-
tronic structure calculations with significant capa-
bilities also for solid-state simulations, including
e.g. coupled-cluster implementations for crystalline
systems.
PyQuante146 is a Python package for quantum chem-
istry with some C extensions that emphasizes ease
of understanding the code over performance.
OpenMolcas81 is a Fortran package that specializes in
multiconfigurational approaches to electronic struc-
ture theory, but also implements various DFT cal-
culations, for example.
Serenity147 is a C++ program for subsystem quantum
chemical methods.
SlowQuant148 is a Python program for molecular quantum chemistry that derives its name from the use of Python even for the computationally demanding parts of the program.
VeloxChem149 is a C++/Python package for molecular
properties and for modeling various spectroscopies
based on response theory.
Uquantchem150 is a Fortran 90 program written for
HF, DFT, Møller–Plesset perturbation theory, con-
figuration interaction singles and doubles, quantum
Monte Carlo, etc.
B. Programs for solid-state calculations
The major difference between solid-state and molecular
calculations is that the orbitals experience exponential
decay in molecular calculations, while solid-state calcu-
lations are performed on periodic crystals where the wave
function has to obey Bloch’s theorem.151 Because of the
periodicity, calculations in the solid state are in many
ways more difficult than those in molecules due to the
need for k-point sampling, for instance; see ref. 152 for
a recent introduction. Post-HF methods are much less
prominent in the solid state than in molecules. Instead,
calculations on solids are typically carried out with DFT
and pseudopotentials;153 pseudopotentials make the cal-
culations less costly while introducing an error which is
typically negligible compared to the error in the density
functional approximation itself.
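Bloch's theorem and k-point sampling can be demonstrated with a one-dimensional tight-binding chain, a deliberately simple model rather than DFT. For a ring of N identical sites with on-site energy α and nearest-neighbor coupling β, periodic (Born–von Kármán) boundary conditions allow exactly N wave vectors k = 2πj/N (lattice constant a = 1), and the eigenvalues of the ring Hamiltonian coincide with the band E(k) = α + 2β cos k sampled at those points. The sketch below, with hypothetical function names, checks this numerically; a longer ring corresponds to a denser k-point grid.

```python
import numpy as np


def ring_hamiltonian(n_sites, alpha=0.0, beta=-1.0):
    """Nearest-neighbor tight-binding Hamiltonian of a ring of n_sites >= 3 sites."""
    H = np.diag(np.full(n_sites, float(alpha)))
    for i in range(n_sites):
        j = (i + 1) % n_sites  # periodic boundary: the last site couples to the first
        H[i, j] = H[j, i] = beta
    return H


def band_energies(n_k, alpha=0.0, beta=-1.0):
    """Band E(k) = alpha + 2 beta cos(k) on the n_k allowed wave vectors k = 2 pi j / n_k."""
    k = 2.0 * np.pi * np.arange(n_k) / n_k
    return alpha + 2.0 * beta * np.cos(k)


n = 8
from_matrix = np.linalg.eigvalsh(ring_hamiltonian(n, alpha=0.5, beta=-1.0))
from_bloch = np.sort(band_energies(n, alpha=0.5, beta=-1.0))
print(np.allclose(from_matrix, from_bloch))
```

Averaging a property over such a finite set of k-points is the one-dimensional analogue of the Brillouin-zone sampling performed by the solid-state programs listed below.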
The conventional way to model crystalline systems is
to use plane waves. However, many other numerical
schemes have also been pursued. Note that the pro-
grams listed here that employ (pseudo)atomic basis func-
tions can naturally handle periodicity in 0, 1, 2, or 3 di-
mensions, corresponding to atoms and molecules, chains,
sheets, and crystals, respectively. Still, we have listed
them as solid state codes because they are most often
used for calculations with DFT and pseudopotentials.
ABINIT88 is a Fortran program for plane wave calculations that supports DFT as well as more advanced formalisms like many-body perturbation theory.
ACE-Molecule154 is a C++ program that employs uni-
form real-space grids of Lagrange sinc functions and
pseudopotentials, and supports density functional
calculations on both periodic and non-periodic sys-
tems and wave function theory calculations based
on Kohn–Sham orbitals.
BigDFT155 is a Fortran program that is based on the
use of pseudopotentials and a two-tier Daubechies
wavelet basis to achieve a spatially localized basis.
Conquest156 is a Fortran program for large-scale DFT
calculations employing pseudo-atomic orbital basis
sets.
CP2K157 is a Fortran package based on Gaussian basis
sets specializing in solid state physics, implement-
ing HF, DFT, Møller–Plesset perturbation theory
and the random phase approximation.
DFTK111, the density-functional toolkit, is a collection of Julia routines for experimenting with plane-wave DFT that emphasises simplicity and flexibility, with the aim of facilitating algorithmic and numerical developments and simplifying interdisciplinary collaboration in solid-state research.
ELK158, EXCITING159, and FLEUR160 are Fortran programs for linearised augmented-plane wave calculations which can reach microhartree-accurate total energies for carefully chosen basis sets.
GPAW161 is a Python/C electronic structure program for DFT calculations within the projector-augmented wave approach which supports three modes of operation: (i) finite-difference grids, (ii) numerical atomic orbitals, and (iii) plane waves.
INQ89 is a new, modular implementation of DFT and
time-dependent DFT written from scratch to work
on graphics processing units (GPUs).
JDFTx162 is a C++ plane wave DFT code aimed to be
easy to develop and easy to use, whose key feature
is support for joint DFT for the description of elec-
tronic systems in contact with molecular liquids.
M-SPARC163 is a MATLAB package for prototyping
DFT calculations employing finite-difference grids
and pseudopotentials.
Octopus164 is a Fortran program based on pseudopo-
tentials and finite difference grids that focuses on
time-dependent DFT for handling non-equilibrium
phenomena.
OpenMX165 is a C package for DFT calculations with
pseudopotentials and numerical atomic orbitals.
PARSEC166 is a Fortran program based on finite-
difference grids for density functional calculations
with pseudopotentials.
PWDFT.jl167 is a Julia package written from scratch
to facilitate development of novel computational
methods using plane waves.
RMG168 is a C++/Fortran program employing real
space grids and multigrid algorithms for density
functional calculations with pseudopotentials.
Siesta169 is a Fortran program for electronic structure
calculations and ab initio molecular dynamics of
molecules and solids that employs a basis set of nu-
merical atomic orbitals, which are strictly localized,
enabling the use of sparsity.
Qbox170 is a C++ program aimed at first principles molecular simulations using plane waves and pseudopotentials.
Quantum Espresso90 is a Fortran/C program for
plane wave calculations with pseudopotentials on
a wide range of hardware from laptops to super-
computers.
SPARC171 is a C program for parallel DFT calculations
employing finite-difference grids and pseudopoten-
tials.
C. Programs relying on fully numerical representations
The idea in modern fully numerical methods is to rep-
resent the orbitals directly in real space, and to use a
representation of non-uniform accuracy (more grid points
near the nuclei and fewer points in empty regions of the
system) so that all-electron calculations become feasible.
Although fully numerical approaches have a long history
for calculations on atoms and diatomic molecules,172 they
are otherwise a relatively recent development in elec-
tronic structure theory and have only recently become
competitive with e.g. Gaussian-basis calculations when-
ever high accuracy is needed.173
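The benefit of a non-uniform grid can be seen already for a single radial integral. The sketch below is a simple illustration assuming a logarithmic radial grid r_i = r_min e^{ih}, not the finite-element or multiresolution representations used by the programs listed below. It integrates the normalization of a hydrogen-like 1s density ρ(r) = (Z³/π) e^{−2Zr}, whose exact value is 1, on a uniform and a logarithmic grid with the same 200 points out to 20 bohr. For a large nuclear charge such as Z = 50, the uniform grid essentially misses the compact core density, while the logarithmic grid resolves it. The function names are hypothetical.

```python
import numpy as np


def uniform_grid(n, r_max):
    """Uniform radial grid on [0, r_max] with trapezoidal-rule weights."""
    r = np.linspace(0.0, r_max, n)
    w = np.full(n, r[1] - r[0])
    w[0] *= 0.5
    w[-1] *= 0.5
    return r, w


def log_grid(n, r_min, r_max):
    """Logarithmic grid r_i = r_min * exp(i h): dense near the nucleus, sparse far out.

    Uses the trapezoidal rule in x = ln r, so each weight carries the Jacobian dr = r dx.
    The tiny interval [0, r_min] is neglected.
    """
    x = np.linspace(np.log(r_min), np.log(r_max), n)
    r = np.exp(x)
    w = (x[1] - x[0]) * r
    w[0] *= 0.5
    w[-1] *= 0.5
    return r, w


def hydrogenic_1s_norm(Z, r, w):
    """Integrate 4 pi r^2 rho(r) for rho = Z^3/pi exp(-2 Z r); the exact result is 1."""
    rho = Z**3 / np.pi * np.exp(-2.0 * Z * r)
    return np.sum(w * 4.0 * np.pi * r**2 * rho)


# Same number of points (200) out to 20 bohr in both cases
r_u, w_u = uniform_grid(200, 20.0)
r_l, w_l = log_grid(200, 1e-6, 20.0)
for Z in (1, 50):
    print(Z, hydrogenic_1s_norm(Z, r_u, w_u), hydrogenic_1s_norm(Z, r_l, w_l))
```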
DFT-FE95 is a C++ program that employs spectral
finite-element basis sets for a local real-space vari-
ational formulation of DFT, and is able to han-
dle pseudopotential and all-electron calculations
within the same framework and arbitrary period-
icity.
HelFEM is a C++ program for fully numerical calcu-
lations on atoms92,94 and diatomic molecules91 at
the HF or DFT levels of theory employing high-
order numerical basis functions and yielding fully
variational energies.
MADNESS174 is a C++ program that relies on the use
of multiresolution adaptive grids, which has been
used in a variety of studies on novel real-space ap-
proaches to electron correlation, for instance.
MRChem173 is a C++ program that also relies on mul-
tiresolution adaptive grids for Hartree–Fock and
density functional calculations of molecules; its spe-
cialty is the computation of magnetic properties
such as nuclear magnetic shielding constants.
x2dhf175 is a Fortran program for non-relativistic fi-
nite difference restricted open-shell Hartree–Fock
and density functional calculations on diatomic
molecules.
D. Programs employing semiempirical models
Semiempirical models offer affordable techniques for approximate quantum mechanical calculations that fall in accuracy in between ab initio density functional calculations and force field techniques. Tight-binding DFT176–178 is probably the best-known semiempirical model, and it is available in several program packages. Other types of semiempirical methods exist as well; see Thiel179 and Bannwarth et al.180 for discussion.
DFTB+181 is a Fortran package for various calculations
based on tight-binding DFT.
Latte182 is a Fortran program for tight-binding DFT
molecular dynamics.
Sparrow183 is a C++/Python program for fast semiem-
pirical quantum chemical calculations, including
tight-binding DFT.
xtb180 is a Fortran package that implements various
semiempirical eXtended Tight-Binding methods.
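The tight-binding idea behind these methods can be illustrated with its simplest ancestor, Hückel theory for π electrons. Hückel theory is far cruder than tight-binding DFT or the GFNn-xTB methods and is shown here only as a teaching sketch: a Hamiltonian with on-site energy α and coupling β between bonded atoms is built from a list of bonds and diagonalized. The function names are hypothetical. For 1,3-butadiene, the four π orbital energies are α ± 1.618β and α ± 0.618β, so the total π energy of the four electrons is 4α + 2√5 β ≈ 4α + 4.472β.

```python
import numpy as np


def huckel_energies(n_atoms, bonds, alpha=0.0, beta=-1.0):
    """Orbital energies of a Hückel Hamiltonian.

    bonds: iterable of (i, j) pairs of bonded atoms (0-based indices).
    Energies are in the units of alpha and beta; beta < 0 for bonding.
    """
    H = np.zeros((n_atoms, n_atoms))
    np.fill_diagonal(H, alpha)
    for i, j in bonds:
        H[i, j] = H[j, i] = beta
    return np.linalg.eigvalsh(H)  # returned in ascending order


def pi_energy(energies, n_electrons):
    """Total pi energy for a closed shell: doubly occupy the lowest orbitals."""
    return 2.0 * np.sum(np.sort(energies)[: n_electrons // 2])


butadiene = huckel_energies(4, [(0, 1), (1, 2), (2, 3)])
print(butadiene, pi_energy(butadiene, 4))
```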
E. Limited-scope projects
Although the main focus of our review is on self-contained packages for quantum electronic structure calculations for computational chemistry education, this narrow scope risks not seeing the forest for the trees. The major part of FOSS (the forest in the analogy) is a huge, thriving ecosystem of small projects with limited scope, which vastly outnumber the more conspicuous large program packages (the trees). The two exist in synergy: the smaller subprojects are often used by the larger programs. Therefore, to gain a thorough overview of FOSS, it is invaluable to extend our review from the self-contained packages reviewed above to projects of a more limited scope, which often have little user visibility.
The proliferation of small projects has multiple raisons d’être. The most common one is simply a specific personal need. The good news is that because of the limited effort required to develop and maintain a code with a well-defined scope, such projects can be developed and maintained by a single research group, or often even by a single person. The bad news is that the majority of all FOSS projects in existence are probably unmaintained, simply because the authors moved on to other things. As was already mentioned at the beginning of section III, we have not considered such projects in this review.
1. Keys to modular design
There is a systematic reason for the origin of the spe-
cific personal need mentioned in the previous paragraph:
the DRY [Don’t Repeat Yourself] and KISS [Keep It Simple, Stupid!] principles, which have long been key principles in software engineering and are still used to teach programming.184
DRY is a reminder to avoid code duplication: a given
functionality should only be programmed once and that
implementation called everywhere it is needed, instead of
repeating the same functionality in several places of the
program. The latter approach would be more verbose,
making it less maintainable and more prone to bugs.
In KISS, a complex problem is broken down into
smaller subtasks. Once the subtasks—the common pieces
of the problem—have been identified, the principle is
reapplied to the subtasks themselves: can they be broken
down to a compact collection of even simpler tasks?
Once a KISS design has been established, each compo-
nent has a clear role in the design of the whole program.
Even though achieving the best design may in reality re-
quire several iterations of refactoring (restructuring) the
code, the effort in each iteration of the refactor is lim-
ited because even the code one is starting with should be
quite simple if the initial application of KISS was even
partly successful.
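A small, hypothetical Python example shows both principles at work. The distance and pair-enumeration logic is written exactly once (DRY), and the nuclear repulsion energy E_nuc = Σ_{A<B} Z_A Z_B / R_AB (atomic units) is composed from those two simple pieces (KISS), as is a second, unrelated task that reuses them without duplicating any code.

```python
import math


def distance(r_a, r_b):
    """Euclidean distance between two points, written once and reused (DRY)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r_a, r_b)))


def unique_pairs(n):
    """Yield every index pair (i, j) with i < j; another small reusable piece."""
    for i in range(n):
        for j in range(i + 1, n):
            yield i, j


def nuclear_repulsion(charges, coords):
    """E_nuc = sum over A < B of Z_A Z_B / R_AB in atomic units, built from the pieces above (KISS)."""
    return sum(charges[i] * charges[j] / distance(coords[i], coords[j])
               for i, j in unique_pairs(len(charges)))


def shortest_distance(coords):
    """A different task assembled from the same pieces, with no duplicated loops."""
    return min(distance(coords[i], coords[j]) for i, j in unique_pairs(len(coords)))


# H2 at a bond length of 1.4 bohr
print(nuclear_repulsion([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)]))
```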
2. Is modular design a limitation?
A well-made design is like a puzzle: each software com-
ponent fills in a piece of the puzzle by carrying out a
small, well-defined task. Each piece should ideally be so
small that a working implementation can be developed
in a matter of hours.
The first attempt at the design of the program layout
is often not fully successful, because the structure of a
scientific problem is not always clear before it has been
fully solved. For this reason, program structures tend to
develop over time.
If a redesign of the modular structure of a problem
leads to a more elegant or efficient implementation, it is
often adopted in a new version of the software. Such re-
designs are extremely common in software development,
and are the reason for versioning software: the major
version changes whenever the interface becomes incom-
patible with the older version.185 However, the redesign
is often achievable through simple reorganizations of the
earlier code base. The software does not have to be
rewritten, as the existing pieces can just be rearranged
to fit the new pattern.
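One widely used convention for version numbers, semantic versioning, encodes this rule as MAJOR.MINOR.PATCH. Assuming that convention (individual projects may follow others), the hypothetical check below decides whether an installed version can replace the one a program was written against: the major versions must match, and the installed version must not be older. Pre-release tags and the special status of 0.x versions are ignored for brevity.

```python
def parse_version(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of ints; missing parts default to 0."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def is_compatible(required, available):
    """True if `available` can stand in for `required` under semantic versioning.

    Same MAJOR version (interface unchanged) and at least the required MINOR and
    PATCH (features only added). Tuples compare lexicographically in Python.
    """
    req, avail = parse_version(required), parse_version(available)
    return avail[0] == req[0] and avail >= req


print(is_compatible("1.2.0", "1.4.1"), is_compatible("1.2.0", "2.0.0"))
```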
If the design of a modular library changes enough, it
can essentially become a wholly new library. In this case,
migrating to the newer version of the library may be a
significant task for other projects, and the old and the
new version of the library may coexist for an extended
time. A good example in the field of quantum chemistry
is the libint library of two-electron integrals,186 which is
used by several FOSS codes. A new major version of the
library was introduced in 2014 to take advantage of the
new features afforded by modern processors, but many
quantum chemistry programs still use the original ver-
sion published in the early 2000s, since the functionality
provided by the older version suffices for the purposes it
was designed for.
3. The importance of interoperability
An example of a modular design that has stood the test of time is the Basic Linear Algebra Subprograms (BLAS) library, which was originally introduced in the late 1970s.187 BLAS implements elementary linear algebra operations, such as adding, scaling, and multiplying vectors and matrices; these operations hold a central place in most branches of computational science, including quantum chemistry, much of which is linear algebra.
Although a simple for-loop based implementation of BLAS operations, such as the matrix-matrix multiplication $C_{ik} = \sum_j A_{ij} B_{jk}$, can be written up in minutes, the mathematical structure of the problem can be exploited to design a faster implementation. In a later step, the implementation can even be hand-optimized for the specific processor used in the machine; competing optimized BLAS implementations are an active area of research.188,189
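The sketch below contrasts a textbook triple loop for C_ik = Σ_j A_ij B_jk with NumPy's `@` operator, which hands dense matrix products to whatever BLAS library NumPy was built against. Both give the same numbers; timing them (for example with the standard `timeit` module) makes the performance gap tangible, although part of that gap comes from the Python interpreter itself rather than from BLAS optimizations alone. The function name is our own.

```python
import numpy as np


def matmul_loops(A, B):
    """Textbook triple loop for C_ik = sum_j A_ij B_jk (correct but slow in pure Python)."""
    n_rows, n_inner, n_cols = len(A), len(B), len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(n_cols):
            s = 0.0
            for j in range(n_inner):
                s += A[i][j] * B[j][k]
            C[i][k] = s
    return C


rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
B = rng.standard_normal((100, 100))
C_loops = np.array(matmul_loops(A.tolist(), B.tolist()))
C_blas = A @ B  # dispatched to the BLAS library NumPy is linked against
print(np.allclose(C_loops, C_blas))
```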
Although BLAS was published well before the FOSS movement gained steam via the internet, it serves as an excellent example of what can be achieved by the use of open source, or at least by sharing a common programming interface. BLAS is pervasive because its interface is shared: everyone uses it, and there are many competing implementations. When individual projects are interoperable, as in the case of BLAS, the development of efficient programs is greatly hastened. Simply using an optimized BLAS library instead of the reference implementation can in many cases yield speedups of several orders of magnitude.
Unfortunately, interoperability is still hampered in the field of quantum chemistry, since the lack of common standards prevents components from being truly interoperable. The evaluation of two-electron integrals is a good example: it is the rate-determining step in conventional Hartree–Fock calculations, and several implementations of two-electron integrals have been published.186,190–192 However, these implementations do not share a common interface. Instead, the interfaces tend to reflect the structure of earlier legacy codes, which follow a large number of differing conventions on the ordering, normalization, and signs of Gaussian basis functions, for instance. Despite some attempts,193,194 two-electron integral libraries (or quantum chemistry programs, for that matter!) are still not interoperable.
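The ordering problem is easy to demonstrate with the two most common notations for the same two-electron integrals. In chemists' notation, (pq|rs) groups the functions of electron 1 (p, q) and of electron 2 (r, s), whereas physicists' notation ⟨pq|rs⟩ pairs p with r (electron 1) and q with s (electron 2), so that ⟨pq|rs⟩ = (pr|qs). The hypothetical helpers below convert a dense NumPy array between the two by swapping the middle indices; real libraries may additionally differ in normalization, storage of permutational symmetry, and basis function ordering, which this sketch does not address.

```python
import numpy as np


def chemist_to_physicist(eri_chem):
    """Return phys with phys[p, q, r, s] = <pq|rs> = (pr|qs) = eri_chem[p, r, q, s]."""
    return eri_chem.transpose(0, 2, 1, 3)


def physicist_to_chemist(eri_phys):
    """Inverse conversion; swapping the middle two indices is its own inverse."""
    return eri_phys.transpose(0, 2, 1, 3)


# Random stand-in tensor, only to exercise the index bookkeeping
rng = np.random.default_rng(1)
chem = rng.standard_normal((3, 3, 3, 3))
phys = chemist_to_physicist(chem)
print(phys[0, 1, 2, 0] == chem[0, 2, 1, 0])
```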
4. The move to increased modularity
The situation may, however, be slowly changing.
Libxc84 has already standardized density functional
calculations in over 30 electronic structure programs;
XCFun195 is another implementation of density func-
tional approximations like Libxc that has also been
adopted by many codes, several of which support both
Libxc and XCFun. Other types of libraries are also following suit. There is a growing ecosystem of modular electronic structure libraries, as recently discussed by Oliveira et al.105 in the context of solid-state calculations. We complement that discussion below with a brief overview of some modular open source projects that are now used within several quantum chemistry programs. The use of common implementations will hopefully also lead to more interoperability between electronic structure programs in other respects.
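To make concrete what a functional library provides, the sketch below hand-codes one of the simplest approximations available in Libxc, spin-unpolarized LDA (Slater) exchange, e_x(ρ) = −(3/4)(3/π)^{1/3} ρ^{4/3}, and integrates it for the hydrogen 1s density ρ(r) = e^{−2r}/π. It does not call Libxc, and the function names are hypothetical. For this density the integral has the closed form E_x = −(3/4)(3/π)^{1/3} · 27/(64 π^{1/3}) ≈ −0.2127 hartree, which the numerical result should match.

```python
import numpy as np

# Dirac/Slater exchange constant for the spin-unpolarized LDA
C_X = 0.75 * (3.0 / np.pi) ** (1.0 / 3.0)


def lda_exchange_density(rho):
    """Exchange energy per unit volume, e_x = -C_X rho^(4/3) (spin-unpolarized)."""
    return -C_X * rho ** (4.0 / 3.0)


def hydrogen_lda_exchange(r_max=40.0, n=40001):
    """Integrate e_x over the hydrogen 1s density on a fine uniform radial grid."""
    r = np.linspace(0.0, r_max, n)
    rho = np.exp(-2.0 * r) / np.pi
    integrand = 4.0 * np.pi * r**2 * lda_exchange_density(rho)
    # The integrand vanishes at both ends, so this plain sum equals the trapezoidal rule
    return np.sum(integrand) * (r[1] - r[0])


exact = -C_X * 27.0 / (64.0 * np.pi ** (1.0 / 3.0))
numerical = hydrogen_lda_exchange()
print(numerical, exact)
```

For the one-electron hydrogen atom a spin-polarized treatment is the physically appropriate one; it multiplies this unpolarized value by 2^{1/3}, one of the bookkeeping details that a shared library handles consistently across programs.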
Given the multitude of small libraries that are avail-
able, the listing in this subsection is likely far from com-
plete; however, its goal is merely to illustrate that there
is more to FOSS than the self-contained packages listed
above. Specialized projects like these eliminate redun-
dant work and enable rapid implementation of new fea-
tures in quantum chemistry programs.
Polarization, embedding and quantum chemical mod-
els are a good example of modular functionality, since
the data structures needed to implement such models fit
well in the modular design. Examples of such projects
include:
CheMPS2196 is an implementation of the density ma-
trix renormalization group method.
cppe197 is an implementation of polarizable embedding.
DFT-D3198 and DFT-D4107 are implementations of
semiempirical dispersion corrections for density
functional calculations.
libefp199 is an implementation of the effective fragment
potential method.
Libxc84 contains implementations of density functional
approximations which have been generated with
computer algebra.
PCMSolver200 is an open-source library for the polar-
izable continuum model electrostatic problem.
XCFun195 contains implementations of density func-
tional approximations which employ automatic dif-
ferentiation.
There are also several projects that specifically deal with
Gaussian basis sets and that are thereby used by several
quantum chemistry codes.
The Basis Set Exchange201 is a Python library for storing and managing Gaussian basis sets and converting basis sets between various program formats; the project also has a web interface at http://www.basissetexchange.org, which will be more familiar to most readers.
erd190 computes two-electron integrals with Rys quadra-
ture.
libint186 is a library for the evaluation of molecular inte-
grals of many-body operators over Gaussian func-
tions employing Obara–Saika recursion routines.
libcint191 is an integral library for automatically imple-
menting general integrals for Gaussian-type scalar
and spinor basis functions using Rys quadrature.
simint192 is a vectorized library for electron repulsion
integrals employing Obara–Saika recursions.
libecpint202 is a software library for evaluating effective
core potential integrals.
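The Obara–Saika recursions used by libint and simint can be illustrated in their simplest form: overlap integrals of one-dimensional Cartesian Gaussians (x − A)^i e^{−a(x−A)²} and (x − B)^j e^{−b(x−B)²}. With p = a + b, P = (aA + bB)/p, X_PA = P − A, and X_PB = P − B, the integrals obey S_{i+1,j} = X_PA S_{ij} + (i S_{i−1,j} + j S_{i,j−1})/(2p) and S_{i,j+1} = X_PB S_{ij} + (i S_{i−1,j} + j S_{i,j−1})/(2p), starting from S_{00} = (π/p)^{1/2} exp(−ab(A − B)²/p). The hypothetical function below builds the full table this way and checks it against numerical quadrature; the actual libraries treat three dimensions, contractions, and far more demanding operators such as the electron repulsion.

```python
import math

import numpy as np


def os_overlap_1d(i_max, j_max, a, A, b, B):
    """Table S[i, j] of 1D Cartesian Gaussian overlaps via the Obara-Saika recursion."""
    p = a + b
    P = (a * A + b * B) / p
    x_pa, x_pb = P - A, P - B
    S = np.zeros((i_max + 1, j_max + 1))
    S[0, 0] = math.sqrt(math.pi / p) * math.exp(-a * b / p * (A - B) ** 2)
    # Raise i with j = 0
    for i in range(i_max):
        lower = i * S[i - 1, 0] if i > 0 else 0.0
        S[i + 1, 0] = x_pa * S[i, 0] + lower / (2.0 * p)
    # Raise j for every i, column by column
    for j in range(j_max):
        for i in range(i_max + 1):
            lower = i * S[i - 1, j] if i > 0 else 0.0
            if j > 0:
                lower += j * S[i, j - 1]
            S[i, j + 1] = x_pb * S[i, j] + lower / (2.0 * p)
    return S


def overlap_1d_numeric(i_max, j_max, a, A, b, B, half_width=15.0, n=30001):
    """Brute-force quadrature of the same integrals on a fine uniform grid."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    ga = np.exp(-a * (x - A) ** 2)
    gb = np.exp(-b * (x - B) ** 2)
    S = np.zeros((i_max + 1, j_max + 1))
    for i in range(i_max + 1):
        for j in range(j_max + 1):
            S[i, j] = np.sum((x - A) ** i * ga * (x - B) ** j * gb) * dx
    return S


S_os = os_overlap_1d(3, 2, 0.7, 0.3, 1.1, -0.4)
S_num = overlap_1d_numeric(3, 2, 0.7, 0.3, 1.1, -0.4)
print(np.max(np.abs(S_os - S_num)))
```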
5. Visualization, manipulation and analysis
The visualization, manipulation, and analysis tools discussed in this subsection are user-facing programs and are thereby a more visible showcase of limited-scope projects than the lower-level libraries that were discussed in Subsection III E 4. Indeed, simplified frontends are often invaluable for initializing, visualizing, and analyzing calculations. Several FOSS packages with graphical user interfaces are available for this purpose; some are even integrated with FOSS electronic structure programs, allowing calculations to be run within a graphical interface. For creating models and visualizing computational results, FOSS graphical user interfaces such as Jmol203, Avogadro204, IQmol205 and PyMOL206 can be installed and used.
Unfortunately, the interoperability challenges men-
tioned in Subsection III E affect visualization and anal-
ysis tools especially acutely, because these applications
tend to require access to the electronic wave function,
for which no universally accepted standard exists. This
problem plagues the whole field of computational chemistry, affecting both FOSS and proprietary programs. In the absence of a universal standard, the interconversion of various input and output file formats between different programs can be carried out, for example, with the Open Babel207 and cclib208 packages.
The Atomic Simulation Environment (ASE)209 contains versatile tools for building molecular and periodic models and enables easy retrieval of molecular structures from structural databases such as PubChem.210 It can also act as a frontend to several quantum chemical programs, thus offering a unified interface.
Calculations can be postprocessed with, for instance, the Multiwfn211 and ORBKIT212 packages, both of which support several file formats.
IV. ILLUSTRATIONS OF FEASIBLE COMPUTATIONS
To demonstrate the BYOD paradigm in practice within computational chemistry education, we now illustrate how easily several powerful FOSS quantum chemistry packages can be accessed in two widely used Linux distributions, Fedora and Ubuntu. The Supporting Information contains practical step-by-step examples of running quantum chemical calculations according to the BYOD-FOSS paradigm. Four program packages are used in the practical illustrations: xtb (Subsection IV A), NWChem (Subsection IV B), Psi4 (Subsection IV C), and Quantum Espresso (Subsection IV D). Installation instructions are provided for each code, and all examples can be run under Linux, macOS, or the Windows Subsystem for Linux. In all cases, the software can be installed in a matter of minutes on a personal computer, using either a Linux distribution package manager or the Conda package manager. For convenience, the Supporting Information is also available as a git repository.213
A. xtb
The primary design goal of xtb has been the fast calculation of structures and noncovalent interaction energies for molecular systems with up to roughly 1000 atoms.180,214 The GFNn-xTB methods implemented in xtb are semiempirical quantum chemical methods180 parametrized for the whole periodic table up to radon (Z = 86). A highly attractive feature of xtb is its performance: calculations on small molecules (10–20 atoms) finish in a matter of seconds even on a low-performance laptop computer. xtb is a powerful tool for the preoptimization of geometries and molecular conformations before computationally more demanding calculations, for instance; see ref. 215 for a recent application to water oxidation catalysis.
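As a minimal sketch of how such calculations can be scripted, the hypothetical helpers below build and run an xtb geometry optimization from Python. They assume that the xtb executable is installed and on the PATH; the `--opt` flag requests a geometry optimization, `--gfn 2` selects GFN2-xTB, and the optimized structure is expected in the file xtbopt.xyz in the working directory.

```python
import shutil
import subprocess
from pathlib import Path


def xtb_opt_command(xyz_file: str, gfn: int = 2) -> list[str]:
    """Command line for an xtb geometry optimization of an XYZ file."""
    # --opt: optimize the geometry; --gfn: choose the GFNn-xTB Hamiltonian
    return ["xtb", xyz_file, "--opt", "--gfn", str(gfn)]


def run_xtb_opt(xyz_file: str, workdir: str = ".") -> Path:
    """Run the optimization and return the path of the optimized geometry.

    Assumes xtb is on the PATH; xtb writes its output files, including
    xtbopt.xyz, into the working directory.
    """
    if shutil.which("xtb") is None:
        raise RuntimeError("xtb executable not found on PATH")
    # Resolve the input path so that it stays valid after changing directory
    command = xtb_opt_command(str(Path(xyz_file).resolve()))
    subprocess.run(command, cwd=workdir, check=True)
    return Path(workdir) / "xtbopt.xyz"
```

Calling `run_xtb_opt("cisplatin.xyz")` with the figure 3 coordinates saved under that hypothetical file name would then perform an optimization like the one discussed below.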
The Supporting Information includes step-by-step guidelines for installing xtb and using it to study the structures, conformations, energetics, and molecular orbitals of inorganic and organic molecules. Calculations on the pharmaceutically relevant cisplatin and transplatin molecules, shown in figure 2, are briefly summarized here to showcase the basic use of xtb. Cisplatin, cis-[Pt(NH3)2Cl2], is a chemotherapy medication used in cancer treatment, whereas its stereoisomer transplatin, trans-[Pt(NH3)2Cl2], is ineffective in cancer treatment.
The Pt(II) atom is square-planar coordinated in both cisplatin and transplatin. Which configuration, cis or trans, is lower in energy? We use the xtb program to answer this question. The first task is to obtain initial geometries for the two molecules. In general, initial geometries can be obtained from structural databases such as PubChem;210 built in a graphical user interface with programs such as Jmol, Avogadro, or IQmol; or built by hand in internal coordinates (bond lengths, angles, and dihedrals) in the Z-matrix formalism, for example. Hand-built molecular geometries for cisplatin and transplatin are given in XYZ format in figures 3 and 4, respectively. While these geometries should be sufficiently close to optimal to allow for a straightforward optimization, they are still quite rough in that the total energy is expected to change by several millihartrees during the geometry optimization, corresponding to energy changes of several kcal/mol.
The next step is to bring both molecules into a (local) minimum of the potential energy surface (PES) by optimizing the geometries with xtb. The point groups of the initial geometries are approximately C2v and C2h for cisplatin and transplatin, respectively, but symmetry is not enforced during the xtb optimizations. The only input needed by xtb in this case is the Cartesian coordinates of both molecules in XYZ format, which were given in figures 3 and 4 for cisplatin and transplatin, respectively. The geometry optimizations complete in seconds even on a low-performance computer; the Supporting Information (SI) contains all of the necessary inputs. For cisplatin, the optimized Pt–Cl and Pt–N distances are 2.24 Å and 2.15 Å, respectively. Considering the relatively low level of theory, these distances are in reasonable agreement with the Pt–Cl and Pt–N distances of 2.25 Å and 2.06 Å, respectively, obtained by Tasinato, Puzzarini, and Barone216 with a much higher-level method: coupled-cluster theory with full single and double substitutions and perturbative triple substitutions, CCSD(T).
Figure 2. Cisplatin (left) and transplatin (right). Color coding: Pt = gray, Cl = green, N = blue, and H = white.
11
cis-[Pt(NH3)2Cl2] (cisplatin); angstrom units
Pt 0.00000000 -0.00000000 -0.19134710
Cl 0.00000000 1.61220407 1.42085566
Cl 0.00000000 -1.61220407 1.42085566
N 0.00000000 1.40714181 -1.59849021
H 0.81649658 1.30951047 -2.16752575
H -0.81649658 1.30951047 -2.16752575
N 0.00000000 -1.40714181 -1.59849021
H -0.81649658 -1.30951047 -2.16752575
H 0.81649658 -1.30951047 -2.16752575
H 0.00000000 2.30951093 -1.16752621
H 0.00000000 -2.30951093 -1.16752621
Figure 3. Molecular geometry of cisplatin in XYZ format.
11
trans-[Pt(NH3)2Cl2] (transplatin); angstrom units
Pt 0.00000000 0.00000000 0.00000000
Cl 2.27999997 -0.00036653 0.00000000
Cl -2.27999997 0.00036653 0.00000000
N -0.00031991 -1.98999997 0.00000000
H 0.46944690 -2.32340883 -0.81740913
H 0.46944690 -2.32340883 0.81740913
N 0.00031991 1.98999997 0.00000000
H -0.46944690 2.32340883 -0.81740913
H -0.46944690 2.32340883 0.81740913
H 0.94318252 2.32318174 0.00000000
H -0.94318252 -2.32318174 0.00000000
Figure 4. Molecular geometry of transplatin in XYZ format.
Comparing the total energies of the two stereoisomers after geometry optimization shows that the total energy of transplatin is 20 kJ/mol lower, that is, more negative, than that of cisplatin. This means that transplatin is the energetically more favorable stereoisomer of diamminedichloroplatinum(II), [Pt(NH3)2Cl2]. For comparison, Liu and Franke217 reported an energy difference of 56 kJ/mol at a much higher level of theory: relativistic CCSD(T) employing direct perturbation theory, a 13s9p7d5f2g contracted Gaussian basis for Pt and aug-cc-pVQZ for the other elements, evaluated at molecular geometries optimized with the Becke’88–Perdew’86 functional.218,219 The result from xtb, which we obtained in a matter of seconds, is in good qualitative (or even semiquantitative) agreement with the result obtained at the high level of theory. Next, in Subsection IV B, we revisit cisplatin and transplatin with DFT calculations that afford a step up in accuracy over xtb.
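Total energies are printed in hartree (Eh), so energy differences such as the 20 kJ/mol above require a unit conversion. The minimal sketch below uses rounded CODATA conversion factors; the function names are hypothetical, and the example total energies are made up purely for illustration.

```python
# Conversion factors (rounded CODATA values)
HARTREE_TO_KJ_PER_MOL = 2625.4996
KJ_PER_KCAL = 4.184  # thermochemical calorie
HARTREE_TO_KCAL_PER_MOL = HARTREE_TO_KJ_PER_MOL / KJ_PER_KCAL  # about 627.51


def relative_energy_kj_per_mol(e_cis: float, e_trans: float) -> float:
    """E(cis) - E(trans) in kJ/mol from total energies in hartree.

    A positive value means that the trans isomer is lower in energy.
    """
    return (e_cis - e_trans) * HARTREE_TO_KJ_PER_MOL


def kj_per_mol_to_hartree(delta_e: float) -> float:
    """Convert an energy difference from kJ/mol to hartree."""
    return delta_e / HARTREE_TO_KJ_PER_MOL


# Made-up total energies 10 mEh apart: about 26.3 kJ/mol
print(f"{relative_energy_kj_per_mol(-100.000, -100.010):.1f} kJ/mol")
# The 20 kJ/mol xtb result corresponds to about 7.6 mEh (4.8 kcal/mol)
print(f"{1000 * kj_per_mol_to_hartree(20.0):.1f} mEh, {20.0 / KJ_PER_KCAL:.1f} kcal/mol")
```

The same factors show why a change of several millihartrees amounts to several kcal/mol: 1 mEh is about 0.63 kcal/mol.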
B. NWChem
NWChem has been under development for almost 30 years. Consequently, a large number of features are available in the code: HF, DFT, and post-HF calculations, ab initio molecular dynamics, and so on. NWChem has been designed to run on high-performance parallel supercomputers as well as on conventional workstations. The Supporting Information includes step-by-step guidelines for installing NWChem and using it to study the same pharmaceutically relevant cisplatin and transplatin molecules that were studied with xtb in Subsection IV A.
We choose to use non-empirical DFT in the NWChem examples. Although NWChem also includes more accurate ab initio methods such as coupled-cluster theories, we do not consider them in this work, since their proper use requires much more understanding and computational power than DFT does, and such methods are typically not included in undergraduate-level courses. We choose the non-empirical PBE0 hybrid functional85,220,221 (sometimes also known as hybrid PBE or PBEh), which provides reasonable geometries and energetics across the periodic table and shows good performance for complexes with d- and f-metals.222,223
Even though DFT is simpler than many post-HF theories, setting up adequate DFT calculations still requires some consideration. The one-electron basis set is one of the most important aspects to consider in any electronic structure calculation, including our planned PBE0 calculation with NWChem. The choice of the one-electron basis set has an immense effect on the computational cost and the accuracy of the resulting calculations. The GFNn-xTB methods discussed above in Subsection IV A did not require the specification of a basis set, as the basis set is an essential part of the definition of the GFNn-xTB methods themselves. In HF, DFT, and post-HF calculations, however, the basis set, which parametrizes the allowed degrees of freedom for the movement of the electrons, does need to be specified.
Because of the profound importance of the choice of the basis set, various types of Gaussian basis sets have a long history in quantum chemistry.134 Although many readers will be familiar with traditional basis sets like STO-3G,224 3-21G225 and 6-31G*,226 the development of computer processors and quantum chemical models in recent decades has also led to significant advances in basis set design. Hundreds of Gaussian basis sets intended for various purposes are nowadays available on the Basis Set Exchange,201 for example.
Because the basis set is an approximation, it is highly desirable to be able to control its accuracy in order to make trade-offs between the cost of the calculation and the accuracy of the results. Accordingly, modern basis sets typically come in families of varying size:135,136 the smallest sets enable quick but qualitative calculations, while the larger sets enable quantitative computations at the cost of more computer time. In contrast to traditional basis sets, modern basis set families allow for a cost-efficient approach to the complete basis set limit, at which point the error in the one-electron basis set no longer affects the calculation. Note that types of basis sets other than Gaussians may also be used for quantum chemistry; see ref. 172 for further discussion.
In this work, we only consider the Karlsruhe def2 family of Gaussian basis sets,227 which is a good all-round choice for general chemistry, as it is available for the whole periodic table up to radon (Z = 86). Since radon is an element of the 6th period, and relativistic effects are already essential for the chemistry of the 5th row,228,229 the def2 basis sets describe relativistic effects for the heavier elements through the use of effective core potentials (ECPs).230 The ECP describes the chemically inactive, deep-core electrons only implicitly; this also decreases the overall cost of the calculation.
The Karlsruhe def2 sets come in three levels of accuracy. Split-valence (SV) basis sets are the smallest reasonable basis sets for general applications. The def2-SVP basis is an SV basis set with polarization (P) functions and is similar in size to the 6-31G** basis, also known as 6-31G(d,p). Like 6-31G**, the def2-SVP set can also be used without polarization functions on hydrogen atoms; this basis, called def2-SV(P), is smaller than the 6-31G* basis and is often useful for quick qualitative or semiquantitative calculations. For more quantitative calculations, the def2 series also contains a triple-ζ valence polarization set (def2-TZVP) as well as a quadruple-ζ valence polarization set (def2-QZVP), which typically suffice for reaching the complete basis set limit in HF and DFT calculations. Calculations at post-HF levels of theory, however, require larger basis sets with additional polarization functions; the def2-TZVPP and def2-QZVPP basis sets exist for this purpose. Diffuse functions (D) are necessary for the proper description of anions, as well as for modeling properties such as electric polarizabilities; diffuse-augmented sets are likewise available at all levels of accuracy (def2-SVPD, def2-TZVPD, def2-TZVPPD, def2-QZVPD, def2-QZVPPD) for this purpose.231
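The naming scheme described above is regular enough to encode in a few lines. The hypothetical helper below is only a sketch of those rules: it adds a second P for the extra post-HF polarization functions at the TZ and QZ levels, and a D suffix for diffuse functions. Since only def2-SVP is listed at the SV level, that name is returned there even when post-HF use is requested.

```python
# Base names of the def2 family as listed in the text
_DEF2_BASE = {"SV(P)": "def2-SV(P)", "SV": "def2-SVP",
              "TZ": "def2-TZVP", "QZ": "def2-QZVP"}


def def2_basis(level: str, post_hf: bool = False, diffuse: bool = False) -> str:
    """Return the def2 basis set name for a given level of accuracy."""
    level = level.upper()
    if level not in _DEF2_BASE:
        raise ValueError(f"unknown def2 level: {level!r}")
    name = _DEF2_BASE[level]
    if level == "SV(P)":
        if post_hf or diffuse:
            raise ValueError("def2-SV(P) has no PP or D variant")
        return name
    # Extra polarization functions for post-HF methods: TZVPP and QZVPP
    if post_hf and level in ("TZ", "QZ"):
        name += "P"
    # Diffuse functions for anions and electric properties
    if diffuse:
        name += "D"
    return name


print(def2_basis("TZ"))                              # def2-TZVP
print(def2_basis("TZ", post_hf=True, diffuse=True))  # def2-TZVPPD
```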
For the present demonstration, we choose the def2-TZVP basis set, as triple-ζ basis sets are well known to yield energies that are sufficiently close to the complete basis set limit (see also the applications in Subsections IV C 1 and IV C 2). Although hybrid functionals are computationally more demanding than non-hybrid functionals, it is notable that the dispersion-corrected hybrid PBE0-D4 generalized gradient approximation (GGA) functional was recently shown to outperform the dispersion-corrected, meta-GGA-type non-hybrid r²SCAN-D4 functional in accuracy even for reaction energies of metal–organic reactions.232
Having introduced DFT calculations, basis sets, and NWChem, we proceed as in the xtb workflow: the first task is to bring both molecules into a (local) minimum of the potential energy surface (PES) by means of geometry optimization. The geometry optimization is started from the same hand-built initial geometries presented in Subsection IV A. In contrast to xtb, NWChem can employ point group symmetry (C2v and C2h for cisplatin and transplatin, respectively) during the geometry optimization in order to speed up both the electronic structure calculation and the geometry optimization, and does so by default. As a result, the calculation runs faster, but the molecule is also constrained to the point group of the initial geometry during the whole optimization. This is a pitfall for the unwary user, as the use of symmetry may sometimes lead to convergence to a saddle point instead of a local minimum.
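A standard way to check whether an optimization has reached a minimum or a saddle point is a harmonic frequency calculation at the final geometry (in NWChem, for example, a `task dft freq` step): a minimum has no imaginary frequencies, whereas a saddle point of order n has n of them. The sketch below assumes that imaginary frequencies are reported as negative numbers, as is common in program output; the function name and the −20 cm⁻¹ noise threshold are hypothetical choices.

```python
def classify_stationary_point(frequencies_cm1, threshold=-20.0):
    """Classify a stationary point from its harmonic frequencies (cm^-1).

    Imaginary modes are assumed to be reported as negative numbers; small
    negative values above `threshold` are treated as numerical noise in the
    near-zero translational and rotational modes.
    """
    n_imag = sum(1 for f in frequencies_cm1 if f < threshold)
    if n_imag == 0:
        return "minimum"
    if n_imag == 1:
        return "first-order saddle point"
    return f"saddle point of order {n_imag}"


# Made-up frequency lists for illustration
print(classify_stationary_point([0.5, -3.0, 120.0, 450.0]))  # minimum
print(classify_stationary_point([-150.0, 200.0, 310.0]))     # first-order saddle point
```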
The input required by NWChem is more complicated than that for xtb. Running NWChem requires setting up an input file that contains various computational parameters in addition to the input geometry. Fully annotated input files can be found in the SI; a shortened example is shown in figure 5.
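When the same settings are applied to several molecules, such as both stereoisomers here, the input can also be generated programmatically. The hypothetical helper below is a sketch that reuses only the keywords appearing in figure 5 and takes the geometry from an XYZ file like those in figures 3 and 4; a neutral singlet is assumed, as for the platinum complexes.

```python
def nwchem_opt_input(title: str, xyz_text: str, xc: str = "pbe0",
                     basis: str = "def2-tzvp",
                     ecp_elements: tuple[str, ...] = ("Pt",)) -> str:
    """Build a DFT geometry-optimization input with the keywords of figure 5.

    xyz_text is an XYZ file: atom count, comment line, then coordinates in
    angstrom. Charge 0 and a singlet state are assumed.
    """
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0])
    atoms = [line.strip() for line in lines[2:2 + natoms]]

    out = [f'title "{title}"', "charge 0",
           "geometry units angstroms autosym 0.1"]
    out += [f"  {atom}" for atom in atoms]
    out += ["end",
            "dft", f"  xc {xc}", "  mult 1", "  iterations 100", "end",
            "basis spherical", f"  * library {basis}", "end"]
    # ECP block only for the heavy elements that need one
    if ecp_elements:
        out += ["ecp"] + [f"  {el} library def2-ecp" for el in ecp_elements] + ["end"]
    out += ["driver", "  maxiter 100", "  xyz", "end",
            "task dft optimize"]
    return "\n".join(out) + "\n"
```

The returned text could be saved, for example, as cisplatin.nw and run with `nwchem cisplatin.nw > cisplatin.out`; the file names are illustrative.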
The geometry optimizations of cisplatin and transplatin finish in a matter of minutes on one processor core, depending on the computer used. The optimized Pt–Cl and Pt–N distances for cisplatin are 2.28 Å and 2.08 Å, respectively. These values are in excellent agreement with the values of Tasinato, Puzzarini, and Barone216 that were discussed in Subsection IV A, that is, Pt–Cl and Pt–N distances of 2.25 Å and 2.06 Å, respectively: the geometries agree to within 0.03 Å.
title "Cisplatin"
charge 0
# Cartesian coordinates in angstrom; autosym detects the point group with tolerance 0.1
geometry units angstroms autosym 0.1
  Pt 0.00000000 -0.00000000 -0.19134710
  Cl 0.00000000 1.61220407 1.42085566
  Cl 0.00000000 -1.61220407 1.42085566
  N 0.00000000 1.40714181 -1.59849021
  H 0.81649658 1.30951047 -2.16752575
  H -0.81649658 1.30951047 -2.16752575
  N 0.00000000 -1.40714181 -1.59849021
  H -0.81649658 -1.30951047 -2.16752575
  H 0.81649658 -1.30951047 -2.16752575
  H 0.00000000 2.30951093 -1.16752621
  H 0.00000000 -2.30951093 -1.16752621
end
# PBE0 hybrid functional, closed-shell singlet, at most 100 SCF iterations
dft
  xc pbe0
  mult 1
  iterations 100
end
# def2-TZVP basis with spherical-harmonic functions for all elements
basis spherical
  * library def2-tzvp
end
# def2 effective core potential for Pt
ecp
  Pt library def2-ecp
end
# Optimizer: at most 100 steps; write the geometry of each step in XYZ format
driver
  maxiter 100
  xyz
end
task dft optimize
Figure 5. NWChem example: PBE0/def2-TZVP geometry optimization of cisplatin; for transplatin, the nuclear coordinates given in figure 4 are used instead.
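The accuracy statements above can be made explicit by tabulating the signed deviations of the optimized cisplatin bond lengths from the CCSD(T) reference. The short sketch below uses only the values quoted in the text; the dictionary and function names are hypothetical.

```python
# Cisplatin bond lengths in angstrom, as reported in the text
REFERENCE = {"Pt-Cl": 2.25, "Pt-N": 2.06}  # CCSD(T), Tasinato, Puzzarini, and Barone
RESULTS = {
    "xtb": {"Pt-Cl": 2.24, "Pt-N": 2.15},
    "PBE0/def2-TZVP": {"Pt-Cl": 2.28, "Pt-N": 2.08},
}


def deviations(method):
    """Signed deviations from the CCSD(T) reference, rounded to 0.01 Å."""
    return {bond: round(RESULTS[method][bond] - ref, 2)
            for bond, ref in REFERENCE.items()}


for method in RESULTS:
    devs = deviations(method)
    print(f"{method}: {devs}, max |error| = {max(abs(d) for d in devs.values()):.2f} Å")
```

The largest error is the xtb Pt–N distance (0.09 Å), while both PBE0 distances lie within 0.03 Å of the reference.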
Next, comparing the total PBE0/def2-TZVP energies of the two stereoisomers shows that transplatin is 54 kJ/mol lower (more negative) than cisplatin. Our DFT value is in good quantitative agreement with the energy difference of 56 kJ/mol obtained by Liu and Franke217 using a high-level CCSD(T) method; however, in contrast to their CCSD(T) calculations, our DFT calculations can be performed in a matter of minutes even on a personal computer.
For cisplatin, we also write out the molecular orbitals after the geometry has been optimized. The molecular orbitals from the non-empirical PBE0/def2-TZVP calculations can now be compared to those from the semiempirical xtb calculations of Subsection IV A; see figure 6. The frontier orbitals, that is, the highest occupied molecular orbital (HOMO) and the lowest unoccupied molecular orbital (LUMO), from the xtb and NWChem calculations are in good agreement. HOMO−3, HOMO−2, and HOMO−1 also appear similar; the HOMO−2 and HOMO−1 orbitals are merely switched between the NWChem and xtb calculations. The energetic ordering of orbitals can easily switch when the orbitals have similar energies; reorderings of the occupied orbitals have no effect on the properties of the system.
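Labels such as HOMO−2 are assigned purely by energetic rank, which is why two nearly degenerate orbitals can swap labels between methods. The sketch below (hypothetical function name, made-up orbital energies in hartree, closed-shell occupation assumed) shows that shifting one orbital energy by 2 mEh exchanges the HOMO−2 and HOMO−1 labels, although the orbitals themselves are unchanged.

```python
def frontier_labels(orbital_energies, n_occupied):
    """Map frontier-orbital labels (HOMO-k, LUMO+k) to orbital indices.

    Orbitals are ranked by energy; the lowest n_occupied are taken to be
    doubly occupied (closed shell). Returns {label: index}.
    """
    order = sorted(range(len(orbital_energies)), key=lambda i: orbital_energies[i])
    labels = {}
    for rank, idx in enumerate(order):
        if rank < n_occupied:
            k = n_occupied - 1 - rank
            labels["HOMO" if k == 0 else f"HOMO-{k}"] = idx
        else:
            k = rank - n_occupied
            labels["LUMO" if k == 0 else f"LUMO+{k}"] = idx
    return labels


# Made-up energies; orbitals 1 and 2 are nearly degenerate
e_method_a = [-0.500, -0.351, -0.350, -0.300, -0.100]
e_method_b = [-0.500, -0.349, -0.350, -0.300, -0.100]
print(frontier_labels(e_method_a, 4))
print(frontier_labels(e_method_b, 4))
```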
From the point of view of crystal field theory, the Pt(II) atom in cisplatin has a square-planar coordination and eight 5d electrons. The four HOMOs and the LUMO all involve Pt 5d orbitals. In line with crystal field theory, both NWChem and xtb show that the LUMO involves the Pt 5d(x²−y²) orbital. HOMO−3 involves the Pt 5d(z²) orbital, while the 5d(xy), 5d(xz), and 5d(yz) orbitals contribute to HOMO−2, HOMO−1, and HOMO. As is clearly seen from the data presented above, the non-empirical PBE0/def2-TZVP and the semiempirical GFN2-xTB levels of theory provide a similar description of the frontier orbitals of the Pt(II) complex. Again, the full inputs for the calculations are given in the SI.
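The electron count behind this crystal field picture can be checked in a few lines: Pt is in group 10, so Pt(II) has 10 − 2 = 8 d electrons, and filling them pairwise into the four lower-lying d levels of a square-planar complex leaves the d(x²−y²) level empty. The sketch below is illustrative only: the level energies are hypothetical placeholders in arbitrary units, chosen so that d(x²−y²), which points at the ligands in a local frame with the ligands on the x and y axes, lies highest. The relative order of the four lower levels does not affect the result.

```python
def d_electron_count(group: int, oxidation_state: int) -> int:
    """d-electron count of a transition-metal ion: group number minus oxidation state."""
    return group - oxidation_state


def fill_d_levels(levels: dict[str, float], n_electrons: int) -> dict[str, int]:
    """Aufbau filling of d levels, two electrons per orbital (low spin)."""
    occupations = {}
    remaining = n_electrons
    for name in sorted(levels, key=levels.get):
        occupations[name] = min(2, remaining)
        remaining -= occupations[name]
    return occupations


# Hypothetical square-planar level energies (arbitrary units)
SQUARE_PLANAR = {"dz2": 0.0, "dxz": 0.5, "dyz": 0.5, "dxy": 1.0, "dx2-y2": 4.0}

n_d = d_electron_count(group=10, oxidation_state=2)  # Pt(II)
occ = fill_d_levels(SQUARE_PLANAR, n_d)
empty = [name for name, n in occ.items() if n == 0]
print(n_d, empty)  # 8 ['dx2-y2']
```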
C. Psi4
While NWChem represents the older and more established quantum chemistry codes, Psi4 represents the newer generation of quantum chemistry codes. The origins of Psi4 trace back to the Psi3 research code, written in C++ for high-accuracy studies on small molecules.79 In contrast to Psi3, Psi4 is designed to be a user-friendly, general-purpose code for fast, automated computations on molecules with hundreds of atoms.78 Psi4 contains a number of computational methods ranging from HF and DFT to post-HF methods such as Møller–Plesset perturbation theory,233 coupled-cluster theory,234 configuration interaction theory, orbital-optimized correlation methods, symmetry-adapted perturbation theory, multireference methods, etc.78 Although the core of the program is still written in C++, Psi4 has thorough Python interfaces and can be used either as a traditional quantum chemistry program with input files or directly from Python.
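To illustrate the Python route, the sketch below converts an XYZ file into the string format accepted by `psi4.geometry` (charge and multiplicity on the first line) and runs a geometry optimization with `psi4.optimize`. The helper names, the memory setting, the output file name, and the default method and basis are assumptions for illustration; Psi4 is only imported when the optimization is actually run, so the conversion helper works without it.

```python
def xyz_to_psi4_geometry(xyz_text: str, charge: int = 0, multiplicity: int = 1) -> str:
    """Convert an XYZ file into a Psi4 geometry string (angstrom units)."""
    lines = xyz_text.strip().splitlines()
    natoms = int(lines[0])
    atoms = [line.strip() for line in lines[2:2 + natoms]]
    return "\n".join([f"{charge} {multiplicity}", *atoms, "units angstrom"])


def optimize_with_psi4(xyz_text: str, method: str = "pbe0",
                       basis: str = "def2-svp") -> float:
    """Optimize the geometry with Psi4 and return the final energy (hartree)."""
    import psi4  # deferred import: requires a Psi4 installation

    psi4.set_memory("2 GB")
    psi4.core.set_output_file("output.dat", False)
    molecule = psi4.geometry(xyz_to_psi4_geometry(xyz_text))
    psi4.set_options({"basis": basis})
    return psi4.optimize(method, molecule=molecule)
```

Calling `optimize_with_psi4(xyz_text, basis="def2-tzvp")`, for example, would mirror the PBE0/def2-TZVP setup used with NWChem above.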
We demonstrate the use of Psi4 with two common exercises in elementary courses on computational chemistry: a conformational study of methylcyclohexane, and the reproduction of the molecular geometry of the chromyl fluoride (CrO2F2) molecule with special consideration of the one-electron basis set. We again focus on the def2 family of basis sets that was introduced in Subsection IV B.
Figure 6. The four highest occupied MOs (HOMOs) and the lowest unoccupied MO (LUMO) of cisplatin as obtained from NWChem (PBE0/def2-TZVP) and xtb (GFN2-xTB). The color code for the nuclei is the same as in figure 2, while red and blue denote positive and negative orbital amplitudes, respectively (note that the overall sign of an orbital can be chosen freely). The isovalue used for the orbitals is 0.04 electrons/bohr³.
1. Methylcyclohexane
Starting with the conformational study of methylcyclohexane, the workflow is as follows. First, the molecule is built in a molecular editor such as Avogadro, IQmol, or Jmol, and the drawn molecular structure is preoptimized using a force field available in the editor; the goal of the preoptimization is merely to ensure that the bond lengths are realistic so that the electronic structure calculations during the geometry optimization converge without problems, and so that the bonding