Quipu: High-performance Simulation of
Quantum Circuits using Stabilizer Frames
Héctor J. García    Igor L. Markov
University of Michigan, EECS, Ann Arbor, MI 48109-2121
{hjgarcia, imarkov}@eecs.umich.edu
Abstract: As quantum information processing gains traction, its simulation
becomes increasingly significant for engineering purposes (evaluation,
testing and optimization) as well as for theoretical research.
Generic quantum-circuit simulation appears intractable for conventional
computers. However, Gottesman and Knill identified an important
subclass, called stabilizer circuits, which can be simulated efficiently
using group-theory techniques. Practical circuits enriched with quantum
error-correcting codes and fault-tolerant procedures are dominated by
stabilizer subcircuits and contain a relatively small number of non-
stabilizer components. Therefore, we develop new group-theory data
structures and algorithms to simulate such circuits. Stabilizer frames
offer more compact storage than previous approaches but require
more sophisticated bookkeeping. Our implementation, called Quipu,
simulates certain quantum arithmetic circuits (e.g., ripple-carry adders)
in polynomial time and space for equal superpositions of n qubits. On
such instances, known linear-algebraic simulation techniques, such as
the (state-of-the-art) BDD-based simulator QuIDDPro, take exponential
time. We simulate various quantum Fourier transform and quantum
fault-tolerant circuits with Quipu, and the results demonstrate that our
stabilizer-based technique outperforms QuIDDPro in all cases.
I. INTRODUCTION
Quantum information processing manipulates quantum states rather
than conventional 0-1 bits. It has been demonstrated with a variety
of physical technologies (NMR, ion traps, Josephson junctions in
superconductors, linear and non-linear optics) and used in recently
developed commercial products. Furthermore, it offers a unique
opportunity for EDA research to assist in scientific research. Shor’s
factoring algorithm [17] and Grover’s search algorithm [8] apply
the principles of quantum information to carry out computation
asymptotically more efficiently than conventional computers. These
developments fueled research efforts to design, build and program
scalable quantum computers. Due to the high volatility of quantum
information, quantum error-correcting codes (QECC) and effective
fault-tolerant (FT) architectures are necessary to build reliable quan-
tum computers. Most quantum algorithms are described in terms of
quantum circuits and, just like conventional digital circuits, require
functional simulation to determine the best FT design choices given
limited resources. Simulating quantum circuits on a conventional
computer is a difficult problem. The matrices representing quantum
gates and the vectors that model quantum states grow exponentially
with the number of qubits (the quantum analogue of
the classical bit). Several software packages have been developed for
quantum-circuit simulation including Oemer’s Quantum Computation
Language (QCL) [13] and Viamontes’ Quantum Information Decision
Diagrams (QuIDD) implemented in the QuIDDPro package [18].
While QCL simulates circuits directly using state vectors, QuIDDPro
uses a variant of binary decision diagrams to store state vectors
more compactly in some cases. Since the state-vector representation
requires excessive computational resources in general, simulation-based
reliability studies (e.g., simulated fault-injection analysis) of
quantum FT architectures using general-purpose simulators have been
limited to small quantum circuits [3]. Therefore, designing fast
simulation techniques that target quantum FT circuits facilitates more
robust reliability analysis of larger quantum circuits.
This work was sponsored in part by the Air Force Research Laboratory under
agreement FA8750-11-2-0043.
Stabilizer circuits and states. Gottesman [7] and Knill identified
an important subclass of quantum circuits, called stabilizer circuits,
which can be simulated efficiently on classical computers. Stabilizer
circuits are exclusively composed of stabilizer gates (controlled-NOT,
Hadamard and Phase gates; see Figure 1) followed by one-qubit
measurements in the computational basis. Such circuits are
applied to a computational basis state (usually $|00\ldots0\rangle$) and produce
output states known as stabilizer states. Because of their extensive
applications in QECC and FT architectures, stabilizer circuits have
been studied heavily [1], [7]. Stabilizer circuits can be simulated
in polynomial time by keeping track of the Pauli operators that
stabilize¹ the quantum state. Such stabilizer operators are maintained
during simulation and uniquely represent stabilizer states up to
an unobservable global phase.² Therefore, this technique offers an
exponential improvement over the computational resources needed to
simulate stabilizer circuits using vector-based representations.
Aaronson and Gottesman [1] proposed an improved technique
that uses a bit-vector representation to simulate stabilizer circuits.
Aaronson implemented this simulation technique in his CHP software
package. Compared to other vector-based simulators (QuIDDPro,
QCL) the technique in [1] does not maintain the global phase of a
state and simulates each stabilizer gate in $\Theta(n)$ time using $\Theta(n^2)$
space. The overall runtime of CHP is dominated by the number of
measurement gates, which require $O(n^2)$ time to simulate.
Stabilizer-based simulation of generic circuits. We propose a
generalization of the stabilizer formalism that admits simulation of
non-stabilizer gates such as Toffoli³ gates. This line of research
was first outlined in [1], where the authors describe a stabilizer-based
representation that stores an arbitrary quantum state as a sum
of density-matrix⁴ terms. In contrast, we store arbitrary states as
superpositions⁵ of stabilizer states. Such superpositions are stored
more compactly than the approach from [1], although we do not
handle density matrices. Another key difference is that our approach
explicitly maintains the global phase of each stabilizer state because
in a superposition such phases become relative. We store stabilizer-
state superpositions compactly using our proposed stabilizer frame
data structure. To speed up relevant algorithms, we store generator
sets for each stabilizer frame in row-echelon form to avoid expensive
Gaussian elimination during simulation. The main advantages of
using stabilizer-state superpositions to simulate quantum circuits are:
¹ An operator $U$ is said to stabilize a state iff $U|\psi\rangle = |\psi\rangle$.
² According to quantum physics, the global phase $\exp(i\theta)$ of a quantum state is
unobservable and does not need to be simulated.
³ The Toffoli gate is a 3-bit gate that maps $(a, b, c)$ to $(a, b, c \oplus (ab))$.
⁴ Density matrices are self-adjoint positive-semidefinite matrices of trace 1.0 that
describe the statistical state of a quantum system [11].
⁵ A superposition is a norm-1 linear combination of terms.
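$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \quad
P = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \quad
\mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
Fig. 1. Stabilizer gates: Hadamard (H), Phase (P), controlled-NOT (CNOT).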
(i) Stabilizer subcircuits are simulated with high efficiency.
(ii) Superpositions can be restructured and compressed on the fly
during simulation to reduce resource requirements.
Our stabilizer-based technique simulates certain quantum arithmetic
circuits in polynomial time and space for input states consisting
of unbiased superpositions of computational-basis states. On such
instances, known generic simulation techniques take exponential
time. We simulate various quantum Fourier transform and quantum
FT circuits, and the results demonstrate that our data structure leads
to orders-of-magnitude improvement in runtime and memory as
compared to state-of-the-art simulators.
In the remaining part of this document, we assume a superficial
familiarity with quantum computing, as outlined in [11] and EDA
publications such as [16]. Section II describes key concepts related
to quantum-circuit simulation and the stabilizer formalism. In Sec-
tion III, we introduce stabilizer frames and describe in detail our
simulation flow implemented in Quipu. In Section IV, we discuss
our empirical validation of Quipu and comparisons with state-of-
the-art simulators. Section V closes with concluding remarks.
II. BACKGROUND AND PREVIOUS WORK
Quantum information processes, including quantum algorithms,
are often modeled using quantum circuits and are represented by
diagrams, just like conventional digital circuits [11], [18]. Quantum
circuits are sequences of gate operations that act on some register of
qubits (the basic unit of information in a quantum system). A single
qubit is described by a quantum state $|\psi\rangle$, which is a two-dimensional
vector over the complex numbers. In contrast to classical bits, qubits
can be in a superposition of both the 0 and 1 states. Formally,
$|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$, where $|0\rangle = (1,0)^\top$ and $|1\rangle = (0,1)^\top$ are
the two-dimensional computational basis states and $\alpha_i$ are probability
amplitudes that satisfy $|\alpha_0|^2 + |\alpha_1|^2 = 1$. An $n$-qubit register is the
tensor product of $n$ single qubits and thus is modeled by a complex
vector $|\psi^n\rangle = |\psi_1\rangle \otimes \cdots \otimes |\psi_n\rangle = \sum_{i=0}^{2^n-1} \alpha_i|b_i\rangle$, where each
$b_i$ is a binary string representing the value $i$ of each basis state.
Furthermore, $|\psi^n\rangle$ satisfies $\sum_{i=0}^{2^n-1} |\alpha_i|^2 = 1$. Each gate operation
or quantum gate is a unitary matrix that operates on a small subset
of the qubits in a register. For example, the quantum analogue of a
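NOT gate is the operator $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$,
$$\alpha_0|00\rangle + \alpha_1|10\rangle \xrightarrow{X \otimes I} \alpha_0|10\rangle + \alpha_1|00\rangle$$
Similarly, the two-qubit CNOT operator flips the second qubit
(target) iff the first qubit (control) is set to 1, e.g.,
$$\alpha_0|00\rangle + \alpha_1|10\rangle \xrightarrow{\mathrm{CNOT}} \alpha_0|00\rangle + \alpha_1|11\rangle$$
Another operator of particular importance is the Hadamard (H) gate.
This gate is frequently used to put a qubit in a superposition of
computational-basis states, e.g.,
$$\alpha_0|00\rangle + \alpha_1|10\rangle \xrightarrow{I \otimes H} (\alpha_0|00\rangle + \alpha_0|01\rangle + \alpha_1|10\rangle + \alpha_1|11\rangle)/\sqrt{2}$$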
Note that the H gate generates unbiased superpositions in the
sense that the squares of the absolute value of the amplitudes are
equal. The dynamics involved in observing or measuring a quantum
state are described by non-unitary projection operators. There are
different types of quantum measurements, but the ones most pertinent
to our discussion are measurements in the computational basis,
i.e., measurements with respect to the $|0\rangle$ or $|1\rangle$ basis states. The
projection operators for such measurements are $P_0 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and
$P_1 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$, respectively. The probability $p(x)$ of obtaining outcome
$x \in \{0,1\}$ on the $j$th qubit of state $|\psi\rangle$ is given by the inner product
$\langle\psi|P^j_x|\psi\rangle$, where $\langle\psi|$ is the conjugate transpose of $|\psi\rangle$. For example,
suppose we want to measure $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ in the $|1\rangle$ basis:
$$p(1) = (\alpha_0^*, \alpha_1^*)\, P_1\, (\alpha_0, \alpha_1)^\top = (0, \alpha_1^*)(\alpha_0, \alpha_1)^\top = |\alpha_1|^2$$
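The gate and measurement examples above can be reproduced with a small state-vector sketch. This is only an illustration of the linear-algebraic view, not part of Quipu; the function names (`apply_1q`, `apply_cnot`, `prob`) and the sample amplitudes 0.6 and 0.8 are assumptions made for this example, and qubit 0 is taken to be the leftmost bit of a basis label.

```python
import math

# Assumed convention: qubit 0 is the leftmost bit of the basis label,
# so the 2-qubit vector is ordered |00>, |01>, |10>, |11>.

def apply_1q(gate, state, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    out = [0j] * len(state)
    shift = n - 1 - q
    for idx, amp in enumerate(state):
        bit = (idx >> shift) & 1
        for new_bit in (0, 1):
            new_idx = (idx & ~(1 << shift)) | (new_bit << shift)
            out[new_idx] += gate[new_bit][bit] * amp
    return out

def apply_cnot(state, control, target, n):
    """Flip the target bit of every basis state whose control bit is 1."""
    out = [0j] * len(state)
    cs, ts = n - 1 - control, n - 1 - target
    for idx, amp in enumerate(state):
        if (idx >> cs) & 1:
            idx ^= 1 << ts
        out[idx] += amp
    return out

def prob(state, q, x, n):
    """Probability of observing outcome x on qubit q (sum of |amplitude|^2)."""
    shift = n - 1 - q
    return sum(abs(a) ** 2 for i, a in enumerate(state) if ((i >> shift) & 1) == x)

X = [[0, 1], [1, 0]]
s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

a0, a1 = 0.6, 0.8            # sample amplitudes with |a0|^2 + |a1|^2 = 1
psi = [a0, 0, a1, 0]         # a0|00> + a1|10>

x_on_first = apply_1q(X, psi, 0, 2)      # a0|10> + a1|00>
cnot_out = apply_cnot(psi, 0, 1, 2)      # a0|00> + a1|11>
h_on_second = apply_1q(H, psi, 1, 2)     # unbiased in the second qubit
p_one = prob(psi, 0, 1, 2)               # |a1|^2
```

The vector has $2^n$ entries, which is the exponential cost that the stabilizer formalism avoids.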
Cofactors of quantum states. The output states obtained after
performing computational-basis measurements are called cofactors,
and are orthogonal states of the form $|0\rangle|\psi_0\rangle$ and $|1\rangle|\psi_1\rangle$. We denote
the $|0\rangle$- and $|1\rangle$-cofactor by $|\psi_{j=0}\rangle$ and $|\psi_{j=1}\rangle$, respectively, where
$j$ is the index of the measured qubit. One can also consider iterated
cofactors, such as double cofactors $|\psi_{qr=00}\rangle$, $|\psi_{qr=01}\rangle$, $|\psi_{qr=10}\rangle$ and
$|\psi_{qr=11}\rangle$. Cofactoring with respect to all qubits produces amplitudes
of individual basis vectors.
A. Quantum circuits and simulation
To simulate a quantum circuit C, we first initialize the quantum
system to some desired state $|\psi\rangle$ (usually a basis state). $|\psi\rangle$ can be
represented using a fixed-size data structure (e.g., an array of $2^n$
complex numbers) or a variable-size data structure (e.g., algebraic
decision diagram). We then track the evolution of $|\psi\rangle$ via its internal
representation as the gates in $C$ are applied until one obtains the
output state $C|\psi\rangle$ [1], [11], [18]. Most quantum-circuit simulators [5],
[12], [13], [18] support some form of the linear-algebraic operations
described earlier. The drawback of such simulators is that their
runtime grows exponentially in the number of qubits. This holds true
not only in the worst case but also in many practical applications
involving arithmetic and FT circuits.
Gottesman developed a simulation method involving the Heisen-
berg model [7] often used by physicists to describe atomic phe-
nomena. In this model, one keeps track of the symmetries of an
object rather than represent the object explicitly. In the context of
quantum-circuit simulation, this model represents quantum states by
their symmetries, rather than complex vectors. The symmetries are
operators for which these states are 1-eigenvectors. Algebraically,
symmetries form group structures, which can be specified compactly
by group generators.
B. The stabilizer formalism
A unitary operator $U$ stabilizes a state $|\psi\rangle$ iff $|\psi\rangle$ is a 1-eigenvector
of $U$, i.e., $U|\psi\rangle = |\psi\rangle$. We are interested in operators $U$ derived
from the Pauli matrices: $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$,
and the identity $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. The one-qubit states stabilized by the
Pauli matrices are:
$$\begin{array}{ll}
X: (|0\rangle + |1\rangle)/\sqrt{2} & -X: (|0\rangle - |1\rangle)/\sqrt{2} \\
Y: (|0\rangle + i|1\rangle)/\sqrt{2} & -Y: (|0\rangle - i|1\rangle)/\sqrt{2} \\
Z: |0\rangle & -Z: |1\rangle
\end{array}$$
Observe that $I$ stabilizes all states and $-I$ does not stabilize any
state. Thus, the entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$ is stabilized by
the Pauli operators $X \otimes X$, $-Y \otimes Y$, $Z \otimes Z$ and $I \otimes I$. As
shown in Table I, it turns out that the Pauli matrices along with
$I$ and the multiplicative factors $\pm 1$, $\pm i$ form a closed group
under matrix multiplication [11]. Formally, the Pauli group $G_n$ on
$n$ qubits consists of the $n$-fold tensor product of Pauli matrices,
$P = i^k P_1 \otimes \cdots \otimes P_n$ such that $P_j \in \{I, X, Y, Z\}$ and $k \in \{0,1,2,3\}$.
For brevity, the tensor-product symbol is often omitted so that $P$
is denoted by a string of $I$, $X$, $Y$ and $Z$ characters (Pauli
literals) and a separate integer value $k$ for the phase $i^k$. This string-integer
pair representation allows us to compute the product of Pauli
operators without explicitly computing the tensor products,⁶ e.g.,
$(IIXI)(iIYII) = iIYXI$. Since $|G_n| = 4^{n+1}$, $G_n$ can have at
⁶ This holds true due to the identity $(A \otimes B)(C \otimes D) = (AC \otimes BD)$.
TABLE I
MULTIPLICATION TABLE FOR PAULI MATRICES. SHADED
CELLS INDICATE ANTICOMMUTING PRODUCTS.
      I     X     Y     Z
I     I     X     Y     Z
X     X     I     iZ    -iY
Y     Y     -iZ   I     iX
Z     Z     iY    -iX   I
most $\log_2|G_n| = \log_2 4^{n+1} = 2(n+1)$ irredundant generators [11].
The key idea behind the stabilizer formalism is to represent an $n$-qubit
quantum state $|\psi\rangle$ by its stabilizer group $S(|\psi\rangle)$, the subgroup of
$G_n$ that stabilizes $|\psi\rangle$. One can show that, if $|S(|\psi\rangle)| = 2^n$, the group
uniquely specifies $|\psi\rangle$. In this case, $|\psi\rangle$ belongs to an important class
of quantum states called stabilizer states. Furthermore, $S(|\psi\rangle)$ itself
is specified by only $\log_2 2^n = n$ irredundant stabilizer generators.
Therefore, an arbitrary $n$-qubit stabilizer state can be represented
by a stabilizer matrix $M$ whose rows represent a set of generators
$Q_1, \ldots, Q_n$ for $S(|\psi\rangle)$. (Hence we use the terms generator set and
stabilizer matrix interchangeably.) Since each $Q_i$ is a string of $n$
Pauli literals, the size of the matrix is $n \times n$. The phases of each
$Q_i$ are stored separately using a vector of $n$ integers. For example,
one can show that $|\psi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ is uniquely specified by
any of the following matrices:
$$M_1 = \begin{bmatrix} +XX \\ +ZZ \end{bmatrix}, \quad M_2 = \begin{bmatrix} +XX \\ -YY \end{bmatrix}, \quad M_3 = \begin{bmatrix} -YY \\ +ZZ \end{bmatrix}.$$
One obtains $M_2$ from $M_1$ by left-multiplying the
second row by the first. Similarly, $M_3$ is obtained from $M_1$ or $M_2$
via row multiplication. Observe that multiplying any row by itself
yields $II$, which stabilizes $|\psi\rangle$. However, $II$ cannot be used as a
generator because it is redundant and carries no information about
the structure of $|\psi\rangle$. The storage cost for $M$ is $\Theta(n^2)$, which is an
exponential improvement over the $O(2^n)$ cost often encountered in
vector-based representations.
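The string-integer pair representation described above lends itself to a short sketch. The code below is a minimal illustration rather than Quipu's implementation: a Pauli operator is assumed to be a tuple `(k, literals)` standing for $i^k$ times the literal string, and `pauli_mul` is a hypothetical helper name. The per-literal products are the standard Pauli products tabulated in Table I.

```python
# Per-literal products: _MUL[(a, b)] = (k, c) means a * b = i^k * c.
_MUL = {(a, a): (0, "I") for a in "IXYZ"}
for a in "XYZ":
    _MUL[("I", a)] = _MUL[(a, "I")] = (0, a)
_MUL.update({
    ("X", "Y"): (1, "Z"), ("Y", "Z"): (1, "X"), ("Z", "X"): (1, "Y"),
    ("Y", "X"): (3, "Z"), ("Z", "Y"): (3, "X"), ("X", "Z"): (3, "Y"),
})

def pauli_mul(p, q):
    """Multiply two Pauli operators given as (k, literal string) pairs.

    The product is taken literal by literal, which is valid because
    (A (x) B)(C (x) D) = (AC (x) BD); the powers of i are accumulated mod 4.
    """
    (kp, sp), (kq, sq) = p, q
    k, out = kp + kq, []
    for a, b in zip(sp, sq):
        dk, lit = _MUL[(a, b)]
        k += dk
        out.append(lit)
    return (k % 4, "".join(out))

# The example from the text: (IIXI)(iIYII) = iIYXI.
example = pauli_mul((0, "IIXI"), (1, "IYII"))

# Generators of (|00> + |11>)/sqrt(2): multiplying the rows of M1 gives -YY (k = 2),
# which is the second row of M2; multiplying -YY by XX recovers +ZZ.
row_m2 = pauli_mul((0, "XX"), (0, "ZZ"))
row_m3 = pauli_mul((2, "YY"), (0, "XX"))
```

Each product touches every literal once, so a row multiplication costs $\Theta(n)$ without ever forming a $2^n \times 2^n$ matrix.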
Stabilizer-circuit simulation. The computational basis states are
stabilizer states that can be represented by the stabilizer-matrix
structure depicted in Figure 2-a. In this matrix form, the $\pm$ sign of
each row along with its corresponding $Z_j$-literal designates whether
the state of the $j$th qubit is $|0\rangle$ ($+$) or $|1\rangle$ ($-$). Suppose we want
to simulate circuit $C$. Stabilizer-based simulation first initializes $M$
to specify some basis state. Then, to simulate the action of each
gate $U \in C$, we conjugate each row $Q_i$ of $M$ by $U$.⁷ We require
that $UQ_iU^\dagger$ maps to another string of Pauli literals so that the
resulting matrix $M'$ is well-formed. It turns out that the H, P and
CNOT gates have such mappings, i.e., these gates conjugate the Pauli
group onto itself [7], [11]. Table II lists the mapping for each of
these gates. For example, suppose we simulate a CNOT operation on
$|\psi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$. Using the stabilizer representation, we have
$$M_\psi = \begin{bmatrix} +XX \\ +ZZ \end{bmatrix} \xrightarrow{\mathrm{CNOT}} M'_\psi = \begin{bmatrix} +XI \\ +IZ \end{bmatrix}.$$
One can verify that the rows of $M'_\psi$ stabilize
$|\psi\rangle \xrightarrow{\mathrm{CNOT}} (|00\rangle + |10\rangle)/\sqrt{2}$ as required. Since
H, P and CNOT gates are directly simulated using stabilizers, these
gates are commonly called stabilizer gates and any circuit composed
exclusively of such gates is called a unitary stabilizer circuit. Table II
shows that at most two columns of $M$ are updated when a stabilizer
gate is simulated. Thus, such gates are simulated in $\Theta(n)$ time.
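The conjugation rules of Table II can be sketched compactly by encoding each literal as a pair of bits, as in the bit-vector technique of [1]. The following is an illustrative sketch under that assumption, not the Quipu code: rows are `(sign, literals)` tuples and the names `conjugate` and `apply_gate` are hypothetical.

```python
# Bit encoding of literals (x, z): I=(0,0), X=(1,0), Y=(1,1), Z=(0,1).
_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
_LIT = {bits: lit for lit, bits in _BITS.items()}

def conjugate(row, gate, *qubits):
    """Return U row U^dagger for U in {"H", "P", "CNOT"}.

    row is (sign, literals) with sign in {+1, -1}; for CNOT the qubits
    are given as (control, target).
    """
    sign, lits = row
    lits = list(lits)
    if gate == "CNOT":
        c, t = qubits
        (xc, zc), (xt, zt) = _BITS[lits[c]], _BITS[lits[t]]
        if xc and zt and (xt ^ zc ^ 1):
            sign = -sign
        # X on the control spreads to the target; Z on the target spreads back.
        lits[c], lits[t] = _LIT[(xc, zc ^ zt)], _LIT[(xt ^ xc, zt)]
    else:
        (j,) = qubits
        x, z = _BITS[lits[j]]
        if x and z:                      # Y -> -Y under H, Y -> -X under P
            sign = -sign
        lits[j] = _LIT[(z, x)] if gate == "H" else _LIT[(x, z ^ x)]
    return (sign, "".join(lits))

def apply_gate(matrix, gate, *qubits):
    """Conjugate every row; only the columns named in qubits change."""
    return [conjugate(row, gate, *qubits) for row in matrix]

# |00> -> H on qubit 0 -> CNOT(0, 1) yields the generators of (|00> + |11>)/sqrt(2).
bell = apply_gate(apply_gate([(1, "ZI"), (1, "IZ")], "H", 0), "CNOT", 0, 1)
# The CNOT example from the text: {+XX, +ZZ} -> {+XI, +IZ}.
after_cnot = apply_gate([(1, "XX"), (1, "ZZ")], "CNOT", 0, 1)
```

Each gate inspects one or two literals per row, which is where the $\Theta(n)$ cost per gate comes from.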
The stabilizer formalism also admits measurements in the com-
putational basis [7]. Conveniently, the formalism avoids the direct
computation of projection operators and inner products (Section II).
Note that any qubit $j$ in a stabilizer state is either in a $|0\rangle$ ($|1\rangle$)
state or in an unbiased superposition of both. The former case is
called a deterministic outcome and the latter a random outcome.
⁷ Since $Q_i|\psi\rangle = |\psi\rangle$, the resulting state $U|\psi\rangle$ is stabilized by $UQ_iU^\dagger$ because
$(UQ_iU^\dagger)U|\psi\rangle = UQ_i|\psi\rangle = U|\psi\rangle$.
TABLE II
CONJUGATION OF PAULI-GROUP ELEMENTS BY STABILIZER GATES [11].
FOR CNOT, SUBSCRIPT 1 INDICATES THE CONTROL AND 2 THE TARGET.
GATE    INPUT       OUTPUT
H       $X$         $Z$
        $Y$         $-Y$
        $Z$         $X$
P       $X$         $Y$
        $Y$         $-X$
        $Z$         $Z$
CNOT    $I_1X_2$    $I_1X_2$
        $X_1I_2$    $X_1X_2$
        $I_1Y_2$    $Z_1Y_2$
        $Y_1I_2$    $Y_1X_2$
        $I_1Z_2$    $Z_1Z_2$
        $Z_1I_2$    $Z_1I_2$
Fig. 2. (a) Stabilizer-matrix structure for basis states. (b) Row-echelon form
for stabilizer matrices. The X-block contains a minimal set of generators with
X/Y literals. Generators with Z and I literals only appear in the Z-block.
We can tell these cases apart in $\Theta(n)$ time by searching for $X$ or
$Y$ literals in the $j$th column of $M$. If such literals are found, the
qubit must be in a superposition and the outcome is random with
equal probability ($p(0) = p(1) = 0.5$); otherwise the outcome is
deterministic ($p(0) = 1$ or $p(1) = 1$).
Randomized-outcome case: one flips an unbiased coin to decide the
outcome and then updates $M$ to make it consistent with the outcome
obtained. Since we might have to examine $M$ in its entirety, the
runtime is $O(n^2)$.
Deterministic-outcome case: no updates to $M$ are necessary but
we need to figure out whether the qubit is in the $|0\rangle$ or $|1\rangle$ state, i.e.,
whether the qubit is stabilized by $Z$ or $-Z$. One approach is to perform
Gaussian elimination (GE) to put $M$ in row-echelon form. This
removes redundant literals from $M$ and makes it possible to identify
the row containing a $Z$ in its $j$th position and $I$'s everywhere else.
The $\pm$ phase of such a row decides the outcome of the measurement.
Since this is a GE-based approach, it takes $O(n^3)$ time in practice.
The work in [1] improved the runtime of deterministic measurements
by doubling the size of $M$ to include $n$ destabilizer generators.
Such destabilizer generators help identify exactly which row multiplications
to compute in order to decide the measurement outcome.
This approach avoids GE and thus deterministic measurements are
computed in $O(n^2)$ time.
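The column scan that separates random from deterministic outcomes is simple enough to sketch directly. The snippet below is a hedged illustration, not the CHP or Quipu code: rows are assumed to be `(sign, literals)` tuples, the function names are hypothetical, and `deterministic_outcome` only handles the case where $M$ has already been reduced so that a row equal to $Z_j$ (with $I$'s elsewhere) is present.

```python
def classify_measurement(matrix, j):
    """Scan column j: any X or Y literal means the outcome is random."""
    for _sign, lits in matrix:
        if lits[j] in "XY":
            return "random"
    return "deterministic"

def deterministic_outcome(matrix, j):
    """Read the outcome from the row that is Z on qubit j and I elsewhere.

    Returns 0 for a '+' phase, 1 for a '-' phase, and None when no such row
    exists (i.e. the matrix would first need to be reduced, e.g. by GE).
    """
    n = len(matrix[0][1])
    wanted = "I" * j + "Z" + "I" * (n - j - 1)
    for sign, lits in matrix:
        if lits == wanted:
            return 0 if sign == 1 else 1
    return None

bell = [(1, "XX"), (1, "ZZ")]        # (|00> + |11>)/sqrt(2)
basis_01 = [(1, "ZI"), (-1, "IZ")]   # |01>

kind_bell = classify_measurement(bell, 0)
kind_basis = classify_measurement(basis_01, 1)
outcome_q1 = deterministic_outcome(basis_01, 1)
```

The scan itself is linear in $n$; the expensive part in the deterministic case is obtaining the reduced row, which is what the destabilizer generators of [1] (and, later in this paper, the row-echelon invariant) are meant to avoid recomputing.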
III. SIMULATION OF QUANTUM CIRCUITS
USING STABILIZER FRAMES
The stabilizer gates by themselves do not form a universal set for
quantum computation [1], [11]. However, the Hadamard and Toffoli
(TOF) gates do [2]. Thus, it suffices to show how to simulate the
Toffoli gate using the stabilizer formalism in order to make our gate
set universal. To accomplish this, we represent arbitrary quantum
states as superpositions of stabilizer states. For example, recall from
Section II-B that the computational basis states are stabilizer states.
Thus, any one-qubit state $|\psi\rangle = \alpha_1|0\rangle + \alpha_2|1\rangle$ is a superposition of
the two stabilizer states $|0\rangle$ and $|1\rangle$. Observe that, if $|\psi\rangle$ is unbiased,
i.e., $|\alpha_1|^2 = |\alpha_2|^2$, it can be represented using a single stabilizer state
instead of two (up to a global phase). The key idea behind our
technique is to identify and compress large unbiased superpositions
on the fly during simulation to reduce resource requirements.
Stabilizer frames. Suppose $|\psi\rangle$ is an $n$-qubit stabilizer state and we
want to simulate the action of $TOF_{c_1c_2t}$, where $c_1$ and $c_2$ are the
control qubits, and $t$ is the target. First, we decompose $|\psi\rangle$ into all
four of its double cofactors (Section II) over the control qubits,
$$|\psi\rangle = (|\psi_{c_1c_2=00}\rangle + |\psi_{c_1c_2=01}\rangle + |\psi_{c_1c_2=10}\rangle + |\psi_{c_1c_2=11}\rangle)/2$$
which is an unbiased superposition of orthogonal states. Since $|\psi\rangle$
is a stabilizer state and the cofactors are obtained by performing
measurements on $|\psi\rangle$, each $|\psi_{c_1c_2}\rangle$ is computed in $O(n^2)$ time
(Section II-B). We compute the action of the Toffoli as,
$$TOF_{c_1c_2t}|\psi\rangle = (|\psi_{c_1c_2=00}\rangle + |\psi_{c_1c_2=01}\rangle + |\psi_{c_1c_2=10}\rangle + X_t|\psi_{c_1c_2=11}\rangle)/2$$
Fig. 3. Simulation of the Toffoli gate using a superposition of stabilizer states.
Amplitudes are omitted for clarity. The $X$ gate is applied to the third qubit
of the $|\psi_{c_1c_2=11}\rangle$ cofactor. The ($\pm$)-phase vectors are shown as prepended
columns to the corresponding stabilizer matrices.
where $X_t$ is the Pauli gate (NOT) acting on target $t$. Each $|\psi_{c_1c_2}\rangle$
is represented by the same $M$, but with a different permutation of
leading row phases as shown in Figure 3. Thus, one can represent the
orthogonal stabilizer-state superpositions that arise when simulating
Toffoli gates by a stabilizer frame $F$ consisting of (i) a stabilizer
matrix $M$ and (ii) a set of $k$ distinct leading ($\pm$)-phase vectors.
Each phase vector in the frame represents a distinct state in the
superposition. Additionally, one maintains a vector $a = (a_1, \ldots, a_k)$
of the amplitudes associated with the states (phase vectors) in the
superposition, e.g., $a = (0.5, 0.5, 0.5, 0.5)$ in Figure 3. Controlled-phase
gates $R(\alpha)_{ct}$ can also be simulated using stabilizer frames. This gate
applies a phase-shift factor of $e^{i\alpha}$ if both the control qubit $c$ and target
qubit $t$ are set. Thus, we compute the action of $R(\alpha)_{ct}$ as,
$$R(\alpha)_{ct}|\psi\rangle = (|\psi_{ct=00}\rangle + |\psi_{ct=01}\rangle + |\psi_{ct=10}\rangle + e^{i\alpha}|\psi_{ct=11}\rangle)/2$$
Observe that, in contrast to TOF gates, controlled-$R(\alpha)$ gates
produce biased superpositions. The Hadamard and controlled-$R(\alpha)$
gates are used to implement the quantum Fourier transform circuit,
which plays a key role in Shor's factoring algorithm.
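The cofactor identity behind the Toffoli simulation can be checked on explicit state vectors. This sketch is only a sanity check of the decomposition, not the frame-based algorithm: it works on dense vectors, uses unnormalized cofactors (plain projections, so no factor of 1/2 appears), takes qubit 0 as the leftmost bit, and all function names are assumptions.

```python
def _bit(i, n, q):
    """Value of qubit q (leftmost qubit is 0) in basis index i."""
    return (i >> (n - 1 - q)) & 1

def cofactor(state, n, c1, c2, v1, v2):
    """Unnormalized double cofactor: keep amplitudes with c1=v1 and c2=v2."""
    return [a if (_bit(i, n, c1), _bit(i, n, c2)) == (v1, v2) else 0
            for i, a in enumerate(state)]

def apply_x(state, n, t):
    """Pauli X (NOT) on qubit t: swap amplitudes that differ in bit t."""
    mask = 1 << (n - 1 - t)
    return [state[i ^ mask] for i in range(len(state))]

def toffoli_via_cofactors(state, n, c1, c2, t):
    """Sum the four double cofactors, applying X_t only to the 11-cofactor."""
    parts = [cofactor(state, n, c1, c2, v1, v2) for v1 in (0, 1) for v2 in (0, 1)]
    parts[3] = apply_x(parts[3], n, t)
    return [sum(col) for col in zip(*parts)]

def toffoli_direct(state, n, c1, c2, t):
    """Reference implementation: permute basis states directly."""
    mask = 1 << (n - 1 - t)
    out = [0] * len(state)
    for i, a in enumerate(state):
        out[i ^ mask if _bit(i, n, c1) and _bit(i, n, c2) else i] += a
    return out

# Distinct (unnormalized) amplitudes make the permutation easy to see.
state = [1, 2, 3, 4, 5, 6, 7, 8]
result = toffoli_via_cofactors(state, 3, 0, 1, 2)
```

In the frame representation the same four terms share one stabilizer matrix and differ only in their phase vectors, which is what keeps the storage compact.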
A. Frame-based Simulation
We now discuss how to manipulate a stabilizer frame $F$ in order
to simulate generic quantum circuits with both stabilizer and non-stabilizer
gates. To simulate stabilizer gates, we first update the
stabilizer matrix $M$ associated with $F$ as per Section II-B. Then,
we iterate over the phase vectors in $F$ and update each accordingly
(Table II). Thus, this operation takes $O(nk)$ time for a superposition
with $k$ states. To simulate a non-stabilizer gate, we first update
$M$ (i.e., apply measurements to obtain relevant cofactors). We then
iterate over each phase vector in $F$ and permute the corresponding
phases in order to generate additional phase vectors corresponding to
the cofactor states. As in the case of stabilizer gates, this operation
is linear in the number of phase vectors. However, by the end of
the operation, the number of phase vectors (states) in $F$ will have
grown by a (worst case) factor of four in the case of both TOF
and controlled-$R(\alpha)$. For an arbitrary $n$-qubit stabilizer frame $F$,
the number of phase vectors is upper bounded by $2^n$, the number of
possible $\pm$ permutations.
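The $O(nk)$ stabilizer-gate update can be sketched with a small data structure. The class below is an assumed layout for illustration only (it is not Quipu's): the literals of $M$ are stored once, each of the $k$ phase vectors is a list of $\pm 1$ entries, and amplitude updates and the row-echelon repair are deliberately left out. The sign change of a row depends only on its literals, so it is computed once and then applied to every phase vector.

```python
from dataclasses import dataclass

_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
_LIT = {bits: lit for lit, bits in _BITS.items()}

def conjugate_literals(lits, gate, qubits):
    """Conjugate one unsigned Pauli string by H, P or CNOT (Table II).

    Returns (flip, new_string); flip is True when the row sign changes.
    """
    lits = list(lits)
    if gate == "CNOT":
        c, t = qubits
        (xc, zc), (xt, zt) = _BITS[lits[c]], _BITS[lits[t]]
        flip = bool(xc and zt and (xt ^ zc ^ 1))
        lits[c], lits[t] = _LIT[(xc, zc ^ zt)], _LIT[(xt ^ xc, zt)]
    else:
        (j,) = qubits
        x, z = _BITS[lits[j]]
        flip = bool(x and z)
        lits[j] = _LIT[(z, x)] if gate == "H" else _LIT[(x, z ^ x)]
    return flip, "".join(lits)

@dataclass
class StabilizerFrame:
    rows: list     # n Pauli strings: the shared matrix M, without signs
    phases: list   # k phase vectors, each a list of n entries in {+1, -1}
    amps: list     # k amplitudes, one per phase vector (not updated here)

    def apply(self, gate, *qubits):
        """Apply a stabilizer gate: update M once, then all k phase vectors."""
        flips = []
        for r, lits in enumerate(self.rows):
            flip, self.rows[r] = conjugate_literals(lits, gate, qubits)
            flips.append(-1 if flip else 1)
        # O(nk): every phase vector is multiplied entrywise by the flip mask.
        self.phases = [[s * f for s, f in zip(pv, flips)] for pv in self.phases]

# Frame for (|00> + |01>)/sqrt(2), written as two basis states sharing M.
frame = StabilizerFrame(rows=["ZI", "IZ"],
                        phases=[[1, 1], [1, -1]],
                        amps=[2 ** -0.5, 2 ** -0.5])
for gate, q in (("H", 0), ("P", 0), ("H", 0)):
    frame.apply(gate, q)
```

Storing the literals once and only the signs per state is what distinguishes a frame from a list of $k$ independent stabilizer matrices.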
Prior work on simulation of non-stabilizer gates using the stabilizer
formalism can be found in [1], where the authors propose an approach
that represents a quantum state as a sum of $O(4^{2d})$ density-matrix
terms, where $d$ is the number of distinct qubits involved in non-stabilizer
operations.
Global phases of states in $F$. In quantum mechanics, the states
$e^{i\theta}|\psi\rangle$ and $|\psi\rangle$ are considered phase-equivalent because $e^{i\theta}$ does
not affect the statistics of measurement. During stabilizer-based
simulation, such global phases are not maintained. Since these phases
are unobservable, this is not a problem when simulating a single
stabilizer state. However, since we manipulate superpositions of
states, such global phases become relative and cannot be ignored. In
frame-based simulation, we maintain the global phases of the states in
$F$ using the amplitude vector $a$. Let $p_i$ be the phase vector associated
with $a_i \in a$. When simulating gate $U$, we update each $a_i$ as follows:
1) Set the leading phases of the rows in $M$ to $p_i$.
2) Obtain a basis state $|b\rangle$ from $M$ and store its amplitude $\beta$. If
$U$ is the Hadamard gate, it may be necessary to sample a sum
of two non-zero basis amplitudes (one real, one imaginary).
3) Compute $U(\beta|b\rangle) = \beta'|b'\rangle$ via the state-vector representation.
4) Obtain $|b'\rangle$ from $UMU^\dagger$ and store its non-zero amplitude $\gamma$.
5) Compute the global-phase factor generated as $a_i = (a_i \cdot \beta')$.
To sample the computational-basis amplitudes $|b\rangle$ and $|b'\rangle$ from the
stabilizer, $M$ needs to be in row-echelon form (Figure 2-b). Thus,
each global-phase computation takes $O(n^3)$ time for an $n$-qubit $M$.
To improve this, we introduce a simulation invariant.
Invariant 1: The stabilizer matrix $M$ associated with $F$ remains
in row-echelon form (Figure 2-b) during simulation.
Since stabilizer gates affect at most two columns of $M$, Invariant 1
can be repaired with $O(n)$ row multiplications. Since each row
multiplication takes $\Theta(n)$, the runtime required to update $M$ during
global-phase maintenance is $O(n^2)$. Therefore, for an
arbitrary $n$-qubit stabilizer frame with $k$ states, the overall runtime
for simulating a single gate is $O(n^2 + nk)$ since one can memoize
the updates to $M$ required to compute each $a_i$.
Measuring $F$. Since the states in $F$ are orthogonal, the outcome
probability when measuring $F$ is calculated as the sum of the
normalized outcome probabilities of each state. The normalization is
with respect to the amplitudes stored in $a$ and thus the overall measurement
outcome may have a non-uniform distribution. Formally, let
$\Psi = \sum_i a_i|\psi_i\rangle$ be the superposition of states represented by $F$. The
probability of observing outcome $x \in \{0,1\}$ upon measuring qubit
$m$ is
$$p(x)_\Psi = \sum_{i=1}^{k} |a_i|^2 \langle\psi_i|P^m_x|\psi_i\rangle = \sum_{i=1}^{k} |a_i|^2\, p(x)_{\psi_i}$$
where $P^m_x$ denotes the projection operator in the computational
basis $x$ as discussed in Section II. The outcome probability for
each stabilizer state $p(x)_{\psi_i}$ is computed as outlined in Section II-B.
Once we compute $p(x)_\Psi$, we flip a (possibly biased) coin to decide
the outcome and update the stabilizer matrix associated with $F$
(Section II-B). In the worst case, the outcomes of all the states
in $\Psi$ are random and each requires an $O(n^2)$-time update to $M$.
(Deterministic measurements do not require updates to $M$ and, since
we maintain Invariant 1, such measurements can be decided in linear
time.) Thus, measuring a frame with $k$ states takes $O(n^2 + nk)$ time.
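The weighted sum for $p(x)_\Psi$ and the biased coin flip can be sketched as follows. This is a minimal illustration with assumed names and made-up per-state probabilities; it does not perform the subsequent update of the stabilizer matrix.

```python
import random

def frame_outcome_probability(amps, state_probs):
    """p(x)_Psi = sum_i |a_i|^2 * p(x)_{psi_i} for orthogonal states psi_i."""
    return sum(abs(a) ** 2 * p for a, p in zip(amps, state_probs))

def sample_outcome(p_one, rng=random):
    """Flip a (possibly biased) coin: return 1 with probability p_one."""
    return 1 if rng.random() < p_one else 0

# Four states with equal amplitudes, as in a = (0.5, 0.5, 0.5, 0.5).
amps = [0.5, 0.5, 0.5, 0.5]
# Hypothetical per-state probabilities of observing 1 on the measured qubit:
# two states give 0 deterministically, one is random, one gives 1 deterministically.
p_one_each = [0.0, 0.0, 0.5, 1.0]
p_one = frame_outcome_probability(amps, p_one_each)   # 0.25 * (0 + 0 + 0.5 + 1)
outcome = sample_outcome(p_one)
```

Each per-state probability is 0, 0.5 or 1 for a stabilizer state, so any bias in the overall distribution comes from the amplitudes in $a$.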
Multiframe simulation. Although a single frame is sufficient to represent
a stabilizer-state superposition $\Psi$, one can tame the exponential
growth of states in $\Psi$ by admitting a multiframe representation. Such
a representation cuts down the total number of states required to
represent $\Psi$ by at least a half, thus improving the scalability of our
technique. Our experiments in Section IV show that, when simulating
ripple-carry adders, the number of states in $\Psi$ grows linearly when
multiframes are used but exponentially when a single frame is used.
One derives a multiframe representation directly from a single
frame $F$ by examining the set of phase vectors and identifying
candidate pairs that can be coalesced into a single phase vector
associated with a different stabilizer matrix. Since we maintain the
stabilizer matrix $M$ of a frame in row-echelon form (Invariant 1),
examining the phases corresponding to $Z_j$-rows ($Z$-literal in $j$th
column and $I$'s in all other columns) allows us to identify the columns
in $M$ that need to be modified in order to coalesce candidate pairs.
Figure 4 shows an example of this process. To obtain $M_1$ in the
Figure 4 example, we conjugate the first column of $M$ by an H
gate. Similarly, to obtain $M_2$ we conjugate the first column by H
and then conjugate the first and third columns by CNOT. Thus, the
output of this coalescing process is a list of frames $F_1, F_2, \ldots, F_l$
that together represent the same superposition as the original input
frame. We introduce the following invariant to facilitate simulation
of quantum measurements on multiple frames.
Fig. 4. Example of how a multiframe representation is derived from a single-
frame representation. Each frame $F_i$ consists of a stabilizer matrix $M_i$, a set
of ($\pm$)-phase vectors and a vector of amplitudes $a_i$.
Invariant 2: The stabilizer frames that represent a superposition of
stabilizer states remain mutually orthogonal during simulation, i.e.,
every pair of (basis) vectors from any two frames are orthogonal.
To maintain Invariant 2 we define a specific type of candidate
pair such that the new frames generated from the set of coalesced
phase vectors are mutually orthogonal. Suppose $\langle p_r, p_j\rangle$ is a pair
of phase vectors from the same $n$-qubit frame. Then $\langle p_r, p_j\rangle$ is
considered a candidate iff it has the following properties: (i) $p_r$
and $p_j$ are equal up to $m \le n$ entries corresponding to $Z_k$-rows
(where $k$ is the qubit the row stabilizes), and (ii) $a_r = i^d a_j$ for some
$d \in \{0,1,2,3\}$ (where $a_r$ and $a_j$ are the frame amplitudes paired
with $p_r$ and $p_j$). The stabilizer circuit needed to coalesce a candidate
pair is defined as $C = CNOT_{v_1,v_2} CNOT_{v_1,v_3} \cdots CNOT_{v_1,v_m} P^d_{v_1} H_{v_1}$,
where the $v_k$ designate the qubits stabilized by the $m$ differing entries
in the candidate pair. The steps in our coalescing procedure are:
1) Sort phase vectors according to differing entries such that
candidate pairs are next to each other.
2) Coalesce candidate pairs into a new set of phase vectors.
3) Create a new frame $F_i$ consisting of the set of coalesced phase
vectors and the new stabilizer matrix $CMC^\dagger$.
4) Repeat steps 2–3 until no candidate pairs remain.
The runtime of this procedure is dominated by Step 1. Each phase-vector
comparison takes $\Theta(n)$ time, where $n$ is the size of the phase
vectors. Therefore, the runtime of step 1 and our overall coalescing
procedure is $O(nk \log k)$ for a single frame with $k$ phase vectors.
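The two candidate-pair conditions can be expressed as a small predicate. The function below is only a sketch of that test under assumed inputs (it does not sort, coalesce, or build the circuit $C$): phase vectors are lists of $\pm 1$, `z_rows` is the set of row indices that are $Z_k$-rows, and the name `candidate_power` and the tolerance `tol` are assumptions.

```python
def candidate_power(p_r, p_j, a_r, a_j, z_rows, tol=1e-9):
    """Return d in {0, 1, 2, 3} with a_r = i^d * a_j if (p_r, p_j) is a
    candidate pair, otherwise None.

    Condition (i): the phase vectors differ, and only in entries that
    belong to Z_k-rows. Condition (ii): the amplitudes agree up to a
    power of i.
    """
    differing = [k for k, (s, t) in enumerate(zip(p_r, p_j)) if s != t]
    if not differing or any(k not in z_rows for k in differing):
        return None
    for d in range(4):
        if abs(a_r - (1j ** d) * a_j) < tol:
            return d
    return None

# Rows 1 and 2 are assumed to be Z_k-rows in this toy frame.
z_rows = {1, 2}
d_equal = candidate_power([1, 1, 1], [1, 1, -1], 0.5, 0.5, z_rows)
d_imag = candidate_power([1, 1, 1], [1, -1, -1], 0.5j, 0.5, z_rows)
not_z = candidate_power([1, 1, 1], [-1, 1, 1], 0.5, 0.5, z_rows)
biased = candidate_power([1, 1, 1], [1, 1, -1], 0.5, 0.3, z_rows)
```

The returned $d$ is the exponent used for the $P^d_{v_1}$ factor in the coalescing circuit, and the differing indices identify the qubits $v_1, \ldots, v_m$.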
To simulate stabilizer, T OF , controlled-R(α)and measurement
gates using multiple frames, one applies our single-frame algorithms
to each frame in the list independently. In the case of TO F and
controlled-R(α)gates, additional steps are required:
1) Apply the coalescing procedure to each frame and insert the
new “coalesced” frames in the list.
2) Merge frames with equivalent stabilizer matrices.
3) Repeat Steps 1 and 2 until no new frames are generated.
The simulation flow of our technique is shown in Figure 5 and implemented in our software package Quipu.
Fig. 5. Simulation flow for Quipu.
IV. EMPIRICAL VALIDATION
We tested a single-threaded version of Quipu on a conventional Linux server using several benchmark sets consisting of stabilizer circuits, quantum ripple-carry adders, quantum Fourier transform circuits and quantum fault-tolerant (FT) circuits.
Stabilizer circuits. We compared the runtime performance of Quipu against that of CHP using a benchmark set similar to the one used in [1]. We generated random stabilizer circuits on $n$ qubits, for $n \in \{100, 200, \dots, 1500\}$. The use of randomly generated benchmarks is justified for our experiments because (i) our algorithms are not explicitly sensitive to circuit topology and (ii) random stabilizer circuits have been considered representative [9]. For each $n$, we generated the circuits as follows: fix a parameter $\beta > 0$; then choose $\beta \lceil n \log_2 n \rceil$ random unitary gates (CNOT, P or H), each with probability 1/3. Then measure each qubit $a \in \{0, \dots, n-1\}$ in sequence. We measured the number of seconds needed to simulate the entire circuit. The entire procedure was repeated for $\beta$ ranging from 0.6 to 1.2 in increments of 0.1. Figure 6 shows the average time needed by Quipu and CHP to simulate this benchmark set. The purpose of this comparison is to evaluate the overhead of supporting generic circuit simulation in Quipu. Since CHP is specialized to stabilizer circuits, we do not expect Quipu to be faster. When $\beta = 0.6$, the simulation time appears to grow roughly linearly in $n$ for both simulators. However, when the number of unitary gates is doubled ($\beta = 1.2$), the runtime of both simulators grows roughly quadratically. Thus, the performance of both CHP and Quipu depends strongly on the circuit being simulated. Although Quipu is 5× slower than CHP, we note that Quipu maintains global phases whereas CHP does not. Figure 6 shows that Quipu is asymptotically as fast as CHP when simulating stabilizer circuits that contain a linear number of measurements.
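The benchmark-generation recipe described for the stabilizer-circuit experiments can be sketched as follows. This is an illustration only: the function name `random_stabilizer_circuit`, the tuple encoding of gates, and the rounding of the gate count up to an integer are assumptions, and the original benchmark generator may differ in such details.

```python
import math
import random

def random_stabilizer_circuit(n, beta, seed=None):
    """Hypothetical generator: random CNOT/P/H gates, then measure every qubit."""
    rng = random.Random(seed)
    # beta * ceil(n log2 n) gates; rounded up so the count is an integer.
    num_gates = math.ceil(beta * math.ceil(n * math.log2(n)))
    circuit = []
    for _ in range(num_gates):
        gate = rng.choice(("CNOT", "P", "H"))  # each with probability 1/3
        if gate == "CNOT":
            control, target = rng.sample(range(n), 2)  # two distinct qubits
            circuit.append((gate, control, target))
        else:
            circuit.append((gate, rng.randrange(n)))
    # Measure each qubit 0, ..., n-1 in sequence.
    circuit.extend(("M", a) for a in range(n))
    return circuit

circuit = random_stabilizer_circuit(100, 1.0, seed=0)
```

The resulting list can be fed to any stabilizer simulator that accepts a gate sequence; timing the simulation for each $n$ and $\beta$ reproduces the structure of the experiment, though not necessarily its exact instances.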
[Figure 6 plot: two panels, CHP and Quipu, showing runtime (secs) against number of qubits, with one curve for each $\beta \in \{0.6, 0.7, \dots, 1.2\}$.]
Fig. 6. Average time needed by Quipu and CHP to simulate an $n$-qubit stabilizer circuit with $\beta n \log n$ gates and $n$ measurements. Quipu is asymptotically as fast as CHP but is not limited to stabilizer circuits.
Ripple-carry adders. Our second benchmark set consists of $n$-bit ripple-carry (Cuccaro) adder [4] circuits, which often appear as components in arithmetic circuits [10]. The Cuccaro circuit for $n = 3$ is shown in Figure 7. Such circuits act on two $n$-qubit input registers, one ancilla qubit and one carry qubit for a total of $2(n+1)$ qubits. We applied H gates to all $2n$ input qubits in order to simulate addition on a superposition of $2^{2n}$ computational-basis states. Figure 8 shows the average runtime needed to simulate this benchmark set using Quipu. For comparison, we ran the same benchmarks on an optimized version of QuIDDPro, called QPLite,⁸ specific to circuit simulation [18]. When $n < 15$, QPLite is faster than Quipu because the QuIDD representing the state vector remains compact during simulation. However, for $n > 15$, the compactness of the QuIDD is considerably reduced, and the majority of QPLite's runtime is spent in non-local pointer-chasing and memory (de)allocation. Thus, QPLite fails to scale on such benchmarks and one observes an exponential increase in runtime. Furthermore, Quipu consumed 62% less memory than QPLite in each of these benchmarks.
⁸ QPLite is up to 4× faster since it removes overhead related to QuIDDPro's interpreted front-end for extended quantum programming [15].
[Figure 7 circuit diagram: wires, top to bottom, $|b_0\rangle, |a_0\rangle, |0\rangle, |b_1\rangle, |a_1\rangle, |b_2\rangle, |a_2\rangle, |z\rangle$, with an H gate on every $a$ and $b$ input; the outputs are $|s_0\rangle, |a_0\rangle, |0\rangle, |s_1\rangle, |a_1\rangle, |s_2\rangle, |a_2\rangle, |z \oplus s_3\rangle$.]
Fig. 7. Ripple-carry (Cuccaro) adder for 3-bit numbers $a = a_0 a_1 a_2$ and $b = b_0 b_1 b_2$. The third qubit from the top is an ancilla and the $z$ qubit is the carry. The $b$-register is overwritten with the result $s_0 s_1 s_2$.
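To make the adder's register layout concrete, the sketch below applies a ripple-carry adder to classical basis states using CNOT and Toffoli gates only. The MAJ/UMA decomposition is recalled from Cuccaro et al. [4] and is not spelled out in this paper; the wire indexing (ancilla at index 0, $b_i$ at $1+2i$, $a_i$ at $2+2i$, carry $z$ last, with index 0 as the least-significant bit) and all function names are assumptions made for illustration. It checks the arithmetic on single basis states and says nothing about how Quipu simulates the superposition.

```python
def cnot(w, c, t):
    w[t] ^= w[c]

def toffoli(w, c1, c2, t):
    w[t] ^= w[c1] & w[c2]

def maj(w, c, b, a):
    # Majority block: leaves the next carry on wire a.
    cnot(w, a, b)
    cnot(w, a, c)
    toffoli(w, c, b, a)

def uma(w, c, b, a):
    # Un-majority-and-add block: restores a and c, writes the sum bit on b.
    toffoli(w, c, b, a)
    cnot(w, a, c)
    cnot(w, c, b)

def cuccaro_add(a, b, n):
    """Return (a + b, final wires) for n-bit inputs on 2n + 2 wires."""
    w = [0] * (2 * n + 2)
    for i in range(n):
        w[1 + 2 * i] = (b >> i) & 1
        w[2 + 2 * i] = (a >> i) & 1
    # The carry into bit i sits on wire 2*i (the ancilla for i = 0,
    # otherwise a_{i-1}).
    for i in range(n):
        maj(w, 2 * i, 1 + 2 * i, 2 + 2 * i)
    cnot(w, 2 * n, 2 * n + 1)  # copy the final carry into z
    for i in reversed(range(n)):
        uma(w, 2 * i, 1 + 2 * i, 2 + 2 * i)
    total = sum(w[1 + 2 * i] << i for i in range(n)) + (w[2 * n + 1] << n)
    return total, w

total, wires = cuccaro_add(5, 6, 3)
```

For $n = 3$ this uses $2(n+1) = 8$ wires, matching the qubit count stated above; the $a$-register and the ancilla return to their initial values, while the $b$-register and $z$ hold the sum.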
We ran the same benchmarks using both the single-frame and multiframe approaches. In the case of a single frame, the number of states in a superposition grows exponentially in $n$. However, in the multiframe approach, the number of states grows linearly in $n$. This is because TOF gates produce large equal superpositions that are effectively compressed by our coalescing technique. Since our frame-based algorithms require poly($k$) time for $k$ states in a superposition, Quipu simulates Cuccaro circuits in polynomial time and space for input states consisting of large superpositions of basis states. On such instances, known linear-algebraic simulation techniques (e.g., QuIDDPro) take exponential time.
The work in [10] describes additional quantum arithmetic circuits that are based on Cuccaro adders (e.g., subtractors, conditional adders, comparators). We used Quipu to simulate such circuits and observed runtime performance similar to that shown in Figure 8.
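Quantum Fourier transform (QFT) circuits. Our third benchmark set consists of circuits implementing the $n$-qubit QFT, which computes the discrete Fourier transform of the amplitudes in the input quantum state. Let $|x_1 x_2 \dots x_n\rangle$, $x_i \in \{0,1\}$, be a computational-basis state and let $x_{1,2,\dots,m} = \sum_{k=1}^{m} x_k 2^{-k}$ denote a binary fraction. More generally, $x_{l,\dots,m} = \sum_{k=l}^{m} x_k 2^{-(k-l+1)}$, so that the exponents $x_n$ and $x_{n-1,n}$ below stand for $x_n/2$ and $x_{n-1}/2 + x_n/4$, respectively. The action of the QFT on this input state can be expressed as:
$$|x_1 \dots x_n\rangle \mapsto \frac{1}{\sqrt{2^n}} \left(|0\rangle + e^{2\pi i \cdot x_n}|1\rangle\right) \otimes \left(|0\rangle + e^{2\pi i \cdot x_{n-1,n}}|1\rangle\right) \otimes \cdots \otimes \left(|0\rangle + e^{2\pi i \cdot x_{1,2,\dots,n}}|1\rangle\right) \qquad (1)$$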
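As a sanity check on the product form in Eq. (1), the sketch below compares it numerically against the textbook definition of the discrete Fourier transform on basis states, with amplitudes $e^{2\pi i xy/2^n}/\sqrt{2^n}$. The function names are illustrative, and the convention that $x_1$ (and the first tensor factor) is the most-significant bit is an assumption adopted here so that the two forms line up.

```python
import cmath
from itertools import product

def binary_fraction(bits):
    """0.b1 b2 ... bm = sum_k b_k 2^{-k} for a sequence of bits."""
    return sum(b * 2.0 ** -k for k, b in enumerate(bits, start=1))

def to_int(bits):
    # First bit is taken as most significant (assumption).
    return int("".join(str(b) for b in bits), 2)

def product_form(x_bits):
    """Amplitudes of the tensor product in Eq. (1), keyed by output bits."""
    n = len(x_bits)
    amps = {}
    for y_bits in product((0, 1), repeat=n):
        # The l-th factor contributes phase 0.x_{n-l+1}...x_n when y_l = 1.
        phase = sum(y * binary_fraction(x_bits[n - l:])
                    for l, y in enumerate(y_bits, start=1))
        amps[y_bits] = cmath.exp(2j * cmath.pi * phase) / 2 ** (n / 2)
    return amps

def dft_form(x_bits):
    """Amplitudes exp(2 pi i x y / N) / sqrt(N) of the DFT of |x>."""
    n = len(x_bits)
    N = 2 ** n
    x = to_int(x_bits)
    return {y: cmath.exp(2j * cmath.pi * x * to_int(y) / N) / N ** 0.5
            for y in product((0, 1), repeat=n)}

def max_deviation(n):
    """Largest amplitude difference between the two forms over all inputs."""
    basis = list(product((0, 1), repeat=n))
    return max(abs(product_form(x)[y] - dft_form(x)[y])
               for x in basis for y in basis)

deviation = max_deviation(3)
```

For small $n$ the deviation is at the level of floating-point rounding, which is what one expects if the two expressions describe the same state.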
[Figure 8 plot: average runtime (secs) of Quipu and QPLite against $n$ for the $n$-bit Cuccaro adder ($2n+2$ qubits), with a zoomed inset for small $n$.]
Fig. 8. Average runtime needed by Quipu and QuIDDPro to simulate $n$-bit Cuccaro adders after an equal superposition of all computational-basis states is obtained using a block of Hadamard gates (Figure 7). Quipu consumed 62% less memory than QPLite for each of these benchmarks.
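[Figure 9 circuit diagram: inputs $|x_2\rangle, |x_1\rangle, |x_0\rangle$ and outputs $|y_0\rangle, |y_1\rangle, |y_2\rangle$; the top wire carries H, the middle wire H and $R(\pi/2)$, and the bottom wire H, $R(\pi/2)$ and $R(\pi/4)$.]
Fig. 9. The three-qubit QFT circuit. In general, the first qubit requires one Hadamard gate, the next qubit requires a Hadamard and a controlled-$R(\alpha)$ gate, and each following qubit requires an additional controlled-$R(\alpha)$ gate. Summing up the number of gates gives $O(n^2)$ for an $n$-qubit QFT circuit.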
The QFT is used in many quantum algorithms, notably Shor's factoring and discrete logarithm algorithms. Such circuits are composed of a network of Hadamard and controlled-$R(\alpha)$ gates, where $\alpha = \pi/2^k$ and $k$ is the distance over which the gate acts. The three-qubit QFT circuit is shown in Figure 9. Figure 10 shows average runtime and memory usage for both Quipu and QPLite on QFT instances for $n \in \{10, 12, \dots, 20\}$. Quipu runs approximately 4× faster than QPLite on average and consumes about 90% less memory. For these benchmarks, we observed that the number of states in our multiframe data structure was $2^{n-1}$. This is because controlled-$R(\alpha)$ gates produce biased superpositions (Section III-A) that cannot be effectively compressed using our coalescing procedure. Therefore, as Figure 10 shows, the runtime and memory requirements of both Quipu and QPLite grow exponentially in $n$ for QFT instances. However, Quipu scales to 22-qubit instances whereas QPLite scales to only 18 qubits.
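The gate structure described for the QFT benchmarks (one Hadamard per qubit plus controlled-$R(\alpha)$ gates with $\alpha = \pi/2^k$ for distance $k$) can be enumerated with a short sketch. The function name, the tuple encoding and the particular gate ordering are assumptions; only the gate counts and angles are taken from the text.

```python
import math

def qft_gates(n):
    """Hypothetical gate list for an n-qubit QFT without final qubit reordering."""
    gates = []
    for target in range(n):
        gates.append(("H", target))
        for control in range(target + 1, n):
            k = control - target  # distance over which the gate acts
            gates.append(("CR", control, target, math.pi / 2 ** k))
    return gates

three_qubit = qft_gates(3)
```

Each qubit contributes one H gate and the rotations number $n(n-1)/2$, so the list has $n + n(n-1)/2 = n(n+1)/2$ entries, in agreement with the $O(n^2)$ count given for Figure 9.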
Fault-tolerant (FT) circuits. Our last benchmark set consists of circuits that, in addition to preparing encoded quantum states, implement procedures for performing FT quantum operations [6], [11], [14]. FT operations limit the propagation of errors from one qubit in a QECC-register (the block of qubits that encodes a logical qubit) to another qubit in the same register, and a single faulty gate damages at most one qubit in each register. One constructs FT stabilizer circuits by executing each stabilizer gate transversally⁹ across QECC-registers [7], [11], [14]. Non-stabilizer gates need to be implemented using an FT architecture that often requires additional ancilla qubits, measurements and correction procedures conditioned on measurement outcomes. Figure 11 shows a circuit that implements an FT-Toffoli operation [14]. Each line in Figure 11 represents a 5-qubit register implementing the DiVincenzo/Shor code.
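A transversal gate, in which the $i$th qubit of each QECC-register interacts only with the $i$th qubit of the other registers, can be expanded into physical gates as sketched below. The function name and the assumption that register $r$ occupies the contiguous physical qubits $r \cdot s, \dots, r \cdot s + s - 1$ for code size $s$ are illustrative choices, not details taken from Quipu.

```python
def transversal(gate, logical_registers, code_size):
    """Expand a logical gate on whole registers into per-qubit physical gates."""
    return [
        (gate,) + tuple(reg * code_size + i for reg in logical_registers)
        for i in range(code_size)
    ]

# Logical CNOT from register 0 to register 1, each encoded in 5 qubits.
physical = transversal("CNOT", (0, 1), 5)
```

Because each physical gate touches exactly one qubit per register, a single faulty gate can damage at most one qubit in each register, which is the property the text relies on.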
[Figure 10 plots: two panels showing average runtime (secs) and peak memory (MB) of Quipu and QPLite against $n$ for the $n$-qubit QFT circuit.]
Fig. 10. Average runtime and memory needed by Quipu and QuIDDPro to simulate $n$-qubit QFT circuits, which contain $n(n+1)/2$ gates.
TABLE III
AVERAGE TIME AND MEMORY NEEDED BY QUIPU AND QPLITE TO SIMULATE OUR BENCHMARK SET OF QUANTUM FT CIRCUITS. THE SECOND COLUMN INDICATES THE QECC USED TO ENCODE k LOGICAL QUBITS INTO n PHYSICAL QUBITS. WE USED THE 3-QUBIT BIT-FLIP CODE FOR LARGER BENCHMARKS AND THE 5-QUBIT DIVINCENZO/SHOR CODE [6] FOR SMALLER ONES (ROWS WITH [n, k] = [15,3], [20,4] OR [30,6]).
| Fault-tolerant circuit | QECC [n, k] | Total qubits (inc. ancilla) | Num. of gates: Stab. | Num. of gates: Toff. | Runtime (secs): QPLite | Runtime (secs): Quipu | Memory (MB): QPLite | Memory (MB): Quipu | Max size(Ψ): single F | Max size(Ψ): multi F |
|---|---|---|---|---|---|---|---|---|---|---|
| toffoli | [15,3] | 45 | 155 | 15 | 43.68 | 0.20 | 98.45 | 12.76 | 2816 | 32 |
| halfadd | [15,3] | 45 | 160 | 15 | 43.80 | 0.20 | 94.82 | 12.76 | 2816 | 32 |
| fulladd | [20,4] | 80 | 320 | 30 | 84.96 | 0.88 | 91.86 | 12.94 | 2816 | 32 |
| $2^x \bmod 15$ | [18,6] | 81 | 396 | 36 | 4.81 hrs | 1.48 | 11.85 | 12.96 | 22528 | 64 |
| $4^x \bmod 15$ | [30,6] | 30 | 30 | 0 | 0.01 | <0.01 | 6.14 | 12.01 | 1 | 1 |
| $7^x \bmod 15$ | [18,6] | 81 | 402 | 36 | 11.25 hrs | 1.52 | 12.41 | 13.29 | 22528 | 64 |
| $8^x \bmod 15$ | [18,6] | 81 | 399 | 36 | 11.37 hrs | 1.52 | 12.48 | 13.29 | 22528 | 64 |
| $11^x \bmod 15$ | [30,6] | 30 | 25 | 0 | 0.02 | <0.01 | 6.14 | 12.01 | 1 | 1 |
| $13^x \bmod 15$ | [18,6] | 81 | 399 | 36 | 11.28 hrs | 1.56 | 11.85 | 12.25 | 22528 | 64 |
| $14^x \bmod 15$ | [30,6] | 30 | 40 | 0 | 0.02 | <0.01 | 6.14 | 12.01 | 1 | 1 |
We implemented FT benchmarks for the half-adder and full-adder circuits as well as for computing $f(x) = b^x \bmod 15$. Each circuit from Figure 12 implements $f(x)$ with a particular co-prime base value $b$ as a (2,4) look-up table (LUT).¹⁰ The Toffoli gates in all our FT benchmarks are implemented using the FT architecture from Figure 11. Since FT-Toffoli operations require 6 ancilla registers, a circuit that implements $t$ FT-Toffolis using a $k$-qubit QECC requires $6tk$ ancilla qubits. Therefore, to compare with QPLite, we used the 3-qubit bit-flip code [11, Ch. 10] instead of the more robust 5-qubit code in our larger benchmarks. Our results in Table III show that Quipu is typically faster than QPLite by several orders of magnitude and consumes 8× less memory for the toffoli, half-adder and full-adder benchmarks. Table III also shows that our coalescing technique is effective, as the maximum size of the stabilizer-state superposition is orders of magnitude smaller when multiple frames are used.
⁹ In a transversal operation, the $i$th qubit in each QECC-register interacts only with the $i$th qubit of other QECC-registers.
¹⁰ A $(k, m)$-LUT takes $k$ read-only input bits and $m > \log_2 k$ ancilla bits. For each of the $2^k$ input combinations, an LUT produces a pre-determined $m$-bit value, e.g., a (2,4)-LUT is defined by values (1,2,4,8) or (1,4,1,4).
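The (2,4)-LUT values used for $f(x) = b^x \bmod 15$ can be tabulated directly, which makes the two examples quoted in the LUT footnote easy to reproduce. The function name and its default arguments are illustrative assumptions.

```python
def modexp_lut(base, modulus=15, input_bits=2):
    """Values of base**x mod modulus for every input_bits-bit input x."""
    return tuple(pow(base, x, modulus) for x in range(2 ** input_bits))

lut_2 = modexp_lut(2)
lut_4 = modexp_lut(4)
```

With two input bits there are four table entries, and every entry is below 16, so four output bits suffice for each base value.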
V. CONCLUSIONS AND FUTURE WORK
In this work, we developed new techniques for quantum-circuit simulation based on superpositions of stabilizer states, and managed to circumvent shortcomings in prior work [1]. We implemented our algorithms in our software package Quipu. Current simulators based on the stabilizer formalism, such as CHP, are limited to simulation of stabilizer circuits. Our results show that Quipu performs asymptotically as fast as CHP on stabilizer circuits with a linear number of measurement gates. Our stabilizer-based technique simulates certain quantum arithmetic circuits in polynomial time and space for input states consisting of unbiased superpositions of computational-basis states. QuIDDPro takes exponential time on such instances. We simulated various quantum Fourier transform and quantum fault-tolerant circuits with Quipu, and the results demonstrate that our stabilizer-based technique leads to orders-of-magnitude improvement in runtime and memory as compared to QuIDDPro. While our technique uses more sophisticated mathematics and quantum-state modeling, it is significantly easier to implement and optimize.
[Figure 11 circuit diagram: nine register lines with inputs $|0\rangle, |0\rangle, |0\rangle, |\mathrm{cat}\rangle, |\mathrm{cat}\rangle, |\mathrm{cat}\rangle, |x\rangle, |y\rangle, |z\rangle$; six registers are measured, and the outputs are $|x\rangle, |y\rangle, |z \oplus xy\rangle$.]
Fig. 11. Fault-tolerant implementation of a Toffoli gate. Each line represents a 5-qubit register and each gate is applied transversally. The state $|\mathrm{cat}\rangle = (|0\rangle^{\otimes 5} + |1\rangle^{\otimes 5})/\sqrt{2}$ is obtained using a stabilizer subcircuit (not shown). The arrows point to the set of gates that is applied if the measurement outcome is 1; no action is taken otherwise. Controlled-$Z$ gates are implemented as $H_j \mathrm{CNOT}_{i,j} H_j$ with control $i$ and target $j$. $Z$ gates are implemented as $P^2$.
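The two gate identities quoted in the caption of Figure 11, controlled-$Z = H_j \mathrm{CNOT}_{i,j} H_j$ and $Z = P^2$, can be checked numerically with plain matrix arithmetic. The helper names below are illustrative, and the basis ordering (control qubit first, target second) is an assumption of this sketch.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    # Kronecker product of two matrices given as nested lists.
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def max_abs_diff(A, B):
    return max(abs(a - b) for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))

s = 2 ** -0.5
I2 = [[1, 0], [0, 1]]
H = [[s, s], [s, -s]]
P = [[1, 0], [0, 1j]]
Z = [[1, 0], [0, -1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, -1]]

H_target = kron(I2, H)  # Hadamard on the target qubit only
cz_error = max_abs_diff(matmul(matmul(H_target, CNOT), H_target), CZ)
z_error = max_abs_diff(matmul(P, P), Z)
```

Both errors vanish up to floating-point rounding, so controlled-$Z$ and $Z$ can be expressed with the H, P and CNOT gates that a stabilizer simulator already supports.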
[Figure 12 circuit diagrams: four panels for $b = 2$, $b = 4$, $b = 7$ and $b = 8$, each acting on inputs $|x_0\rangle, |x_1\rangle$ (with Hadamard gates) and four $|0\rangle$ ancilla qubits that become $|y_0\rangle, \dots, |y_3\rangle$.]
Fig. 12. Mod-exp with $M = 15$ implemented as (2,4)-LUTs [10] for several co-prime base values. Negative controls are shown with hollow circles. We apply Hadamards to each $x$-qubit to generate a superposition of all the input values for $x$. Our benchmarks implement these computations using the 3-qubit bit-flip code [11, Ch. 10] and the FT-Toffoli architecture from Figure 11.
REFERENCES
[1] S. Aaronson, D. Gottesman, “Improved Simulation of Stabilizer Circuits,” Phys. Rev. A, vol. 70, no. 052328 (2004).
[2] D. Aharonov, “A Simple Proof that Toffoli and Hadamard are Quantum Universal,” arXiv:quant-ph/0301040 (2003).
[3] O. Boncalo et al., “Using Simulated Fault Injection for Fault Tolerance Assessment of Quantum Circuits,” Proc. Sim. Symp., pp. 213-220 (2007).
[4] S. A. Cuccaro et al., “A New Quantum Ripple-carry Addition Circuit,” arXiv:quant-ph/0410184v1 (2004).
[5] K. De Raedt et al., “Massively Parallel Quantum Computer Simulator,” Comp. Phys. Comm., vol. 176, no. 2, pp. 121-136 (2007).
[6] D. P. DiVincenzo, P. W. Shor, “Fault-Tolerant Error Correction with Efficient Quantum Codes,” Phys. Rev. Lett., vol. 77, no. 3260 (1996).
[7] D. Gottesman, “The Heisenberg Representation of Quantum Computers,” arXiv:quant-ph/9807006v1 (1998).
[8] L. Grover, “A Fast Quantum Mechanical Algorithm for Database Search,” Symp. on Theory of Comp., pp. 212-219 (1996).
[9] E. Knill et al., “Randomized Benchmarking of Quantum Gates,” Phys. Rev. A, vol. 77, no. 1 (2007).
[10] I. L. Markov, M. Saeedi, “Constant-optimized Quantum Circuits for Modular Multiplication and Exponentiation,” Quant. Info. and Comp., vol. 12, no. 5 (2012).
[11] M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press (2000).
[12] K. M. Obenland, A. M. Despain, “A Parallel Quantum Computer Simulator,” arXiv:quant-ph/9804039 (1998).
[13] B. Oemer (2003), http://tph.tuwien.ac.at/oemer/qcl.html.
[14] J. Preskill, “Fault Tolerant Quantum Computation,” Introduction to Quantum Computation, World Scientific (1998). arXiv:quant-ph/9712048.
[15] http://vlsicad.eecs.umich.edu/Quantum/qp/
[16] V. V. Shende, S. S. Bullock, I. L. Markov, “Synthesis of Quantum Logic Circuits,” IEEE Trans. on CAD, vol. 25, no. 6 (2006).
[17] P. Shor, “Polynomial-time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer,” SIAM J. Comput., vol. 26, no. 5 (1997).
[18] G. F. Viamontes, I. L. Markov, J. P. Hayes, Quantum Circuit Simulation, Springer (2009).
... The Gottesman-Knill theorem [85] states that a Clifford circuit, built from the gate from the set S H CNOT , , { } acting on computational basis states and measurements in the computational basis, can be efficiently simulated on a classical computer. This result has since been greatly extended and improved [86][87][88][89][90]. While the Clifford gate set is not universal even for classical computations [86], adding just the T-gate to the set makes it universal for quantum computation. ...
... This makes the simulator of [74] perfect for our purposes, and others mentioned above less useful. There are many implementations of simulators available [94] but they are either more general purposes solutions [90,[95][96][97][98][99][100][101][102][103], which can mean a large overhead for our specific set of circuits, or bespoke for tasks other than the one we require here [104][105][106]. ...
Article
Full-text available
As research on building scalable quantum computers advances, it is important to be able to certify their correctness. Due to the exponential hardness of classically simulating quantum computation, straight-forward verification via this means fails. However, we can classically simulate small scale quantum computations and hence we are able to test that devices behave as expected in this domain. This constitutes the first step towards obtaining confidence in the anticipated quantum-advantage when we extend to scales that can no longer be simulated. Real devices have restrictions due to their architecture and limitations due to physical imperfections and noise. In this paper we extend the usual ideal simulations by considering those effects. We aim to provide a general methodology and framework for constructing simulations which emulate the physical system. These simulations should provide a benchmark for realistic devices and guide experimental research in the quest for quantum-advantage. To illustrate our methodology we give examples that involve networked architectures and the noise-model of the device developed by the Networked Quantum Information Technologies Hub (NQIT). For our simulations we use, with suitable modification, the classical simulator of Bravyi and Gosset while the specific problems considered belong to the Instantaneous Quantum Polynomial-time class. This class is believed to be hard for classical computational devices, and is regarded as a promising candidate for the first demonstration of quantum-advantage. We first consider a subclass of IQP, defined by Bermejo-Vega et al, involving two-dimensional dynamical quantum simulators, and then general instances of IQP, restricted to the architecture of NQIT.
... We leveraged our theoretical analysis to develop several data structures including: stabilizer frames, multiframes and p-blocked multiframes. We describe the design of practical software methods that implement these data structures and algorithms to facilitate simulation of larger sets of quantum circuits on conventional computers [28,29]. However, before we take a closer look at the Heisenberg representation for quantum computers in Chapter II, it is instructive to first review background information on quantum computation. ...
... validated their performance. Recall that the runtime of Algorithm 4.1.1 is dominated by the two nested for-loops (lines[20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35]. The number of times these loops execute depends on the amount of entanglement in the input stabilizer state. ...
Article
Simulation of quantum information processing remains a major challenge with important applications in quantum computer science and engineering. Generic quantum-circuit simulation appears intractable for conventional computers and may be unnecessary because useful quantum circuits exhibit significant structure that can be exploited during simulation. For example, Gottesman and Knill identified an important subclass, called stabilizer circuits, which can be simulated efficiently using the Heisenberg representation for quantum computers. Stabilizer circuits are exclusively composed of stabilizer gates -- Hadamard, Phase and CNOT -- followed by one-qubit measurements in the computational basis. Such circuits are applied to a computational-basis state and produce so-called stabilizer states. Aaronson and Gottesman generalized stabilizer-circuit simulation to additionally handle a small number of non-stabilizer gates. We design new, more efficient data structures and algorithms for such beyond-stabilizer simulation using superpositions of stabilizer states. One such data structure, a stabilizer frame, offers more compact storage than previous approaches but require additional algorithms to maintain the global phases of each state in the superposition. To explore the advantages and limitations of our technique, we analyze the geometric structure of stabilizer states and their embedding in Hilbert space. Our analysis includes results on the computational geometry of stabilizer states such as efficient algorithms for computing distances, angles and volumes between them. The main advantages of using stabilizer-state superpositions to simulate quantum circuits are: (i) stabilizer subcircuits are simulated with high efficiency, (ii) superpositions can be restructured and compressed on the fly during simulation to reduce resource requirements, and (iii) operations performed on such superpositions lend themselves to distributed or asynchronous processing. 
Our software implementation, called Quipu, simulates certain quantum arithmetic circuits (e.g., reversible ripple-carry adders) and quantum Fourier transform circuits in polynomial time and space for specific input states. On such instances, known linear-algebraic simulation techniques, such as the (state-of-the-art) BDD-based simulator QuIDDPro, take exponential time. We simulate quantum fault-tolerant circuits using Quipu, and the results indicate that our stabilizer-based technique empirically outperforms QuIDDPro in all cases. While previous structure-aware simulations of quantum circuits were difficult to parallellize, we demonstrate a parallel version of Quipu that achieves a nontrivial speedup.
... |A 0 is stabilised by (X +Y )/ √ 2). Computing the truth table, having non-stabiliser inputs, results in an exponential overhead which is difficult to mitigate [12]. ...
Preprint
Quantum computations are expressed in general as quantum circuits, which are specified by ordered lists of quantum gates. The resulting specifications are used during the optimisation and execution of the expressed computations. However, the specification format makes it difficult to verify that optimised or executed computations still conform to the initial gate list specifications: showing the computational equivalence between two quantum circuits expressed by different lists of quantum gates is exponentially complex in the worst case. In order to solve this issue, this work presents a derivation of the specification format tailored specifically for fault-tolerant quantum circuits. The circuits are considered a form consisting entirely of single qubit initialisations, CNOT gates and single qubit measurements (ICM form). This format allows, under certain assumptions, to efficiently verify optimised (or implemented) computations. Two verification methods based on checking stabiliser circuit structures are presented.
... On the other hand, for a wide class of circuits with restricted gate sets and input states [20][21][22][23][24], efficient classical simulation algorithms are available. For example, the numerical package Quipu [25,26] has been developed for taking advantage of prior results [20,21,24] on the stabilizer formalism to speed up general quantum circuit simulation. Finally, path integral-based methods [27] have also been proposed-though they do not improve the simulation cost, they lead to reduced memory storage requirements. ...
Preprint
Classical simulation of quantum computation is necessary for studying the numerical behavior of quantum algorithms, as there does not yet exist a large viable quantum computer on which to perform numerical tests. Tensor network (TN) contraction is an algorithmic method that can efficiently simulate some quantum circuits, often greatly reducing the computational cost over methods that simulate the full Hilbert space. In this study we implement a tensor network contraction program for simulating quantum circuits using multi-core compute nodes. We show simulation results for the Max-Cut problem on 3- through 7-regular graphs using the quantum approximate optimization algorithm (QAOA), successfully simulating up to 100 qubits. We test two different methods for generating the ordering of tensor index contractions: one is based on the tree decomposition of the line graph, while the other generates ordering using a straight-forward stochastic scheme. Through studying instances of QAOA circuits, we show the expected result that as the treewidth of the quantum circuit's line graph decreases, TN contraction becomes significantly more efficient than simulating the whole Hilbert space. The results in this work suggest that tensor contraction methods are superior only when simulating Max-Cut/QAOA with graphs of regularities approximately five and below. Insight into this point of equal computational cost helps one determine which simulation method will be more efficient for a given quantum circuit. The stochastic contraction method outperforms the line graph based method only when the time to calculate a reasonable tree decomposition is prohibitively expensive. Finally, we release our software package, qTorch (Quantum TensOR Contraction Handler), intended for general quantum circuit simulation.
... However, when 0 n | C |0 n = 0 we show that its absolute value is computable quickly from r alone after step 2. Graph-state circuits and the larger but equivalent class of stabilizer (aka. Clifford) circuits are commonly quoted as simulatable in O(n 2 ) time but this applies only with a bounded number of single-qubit measurements [AG04,AB06] (see also [GM13,GMC14,GM15]). Computing the probability p = | 0 n | C |0 n | 2 is classed as a form of strong simulation by [JvdN14] and is representative of the tasks designated STR(n) in [Koh17] for standard-basis inputs. ...
Preprint
We show that a form of strong simulation for n-qubit quantum stabilizer circuits C is computable in O(s+nω)O(s + n^\omega) time, where ω\omega is the exponent of matrix multiplication. Solution counting for quadratic forms over F2\mathbb{F}_2 is also placed into O(nω)O(n^\omega) time. This improves previous O(n3)O(n^3) bounds. Our methods in fact show an O(n2)O(n^2)-time reduction from matrix rank over F2\mathbb{F}_2 to computing p=  0n    C    0n  2p = |\langle \; 0^n \;|\; C \;|\; 0^n \;\rangle|^2 (hence also to solution counting) and a converse reduction that is O(s+n2)O(s + n^2) except for matrix multiplications used to decide whether p>0p > 0. The current best-known worst-case time for matrix rank is O(nω)O(n^{\omega}) over F2\mathbb{F}_2, indeed over any field, while ω\omega is currently upper-bounded by 2.37282.3728\dots Our methods draw on properties of classical quadratic forms over Z4\mathbb{Z}_4. We study possible distributions of Feynman paths in the circuits and prove that the differences in +1 vs. 1-1 counts and +i vs. i-i counts are always 0 or a power of 2. Further properties of quantum graph states and connections to graph theory are discussed.
... |A 0 is stabilised by (X +Y )/ √ 2). Computing the truth table, having non-stabiliser inputs, results in an exponential overhead which is difficult to mitigate [10]. ...
Article
Quantum computations are expressed in general as quantum circuits, which are specified by ordered lists of quantum gates. The resulting specifications are used during the optimisation and execution of the expressed computations. However, the specification format makes it is difficult to verify that optimised or executed computations still conform to the initial gate list specifications: showing the computational equivalence between two quantum circuits expressed by different lists of quantum gates is exponentially complex in the worst case. In order to solve this issue, this work presents a derivation of the specification format tailored specifically for fault-tolerant quantum circuits. The circuits are considered a form consisting entirely of single qubit initialisations, CNOT gates and single qubit measurements (ICM form). This format allows, under certain assumptions, to efficiently verify optimised (or implemented) computations. Two verification methods based on checking stabiliser circuit structures are presented.
... On the other hand, for a wide class of circuits with restricted gate sets and input states [20][21][22][23][24], efficient classical simulation algorithms are available. For example, the numerical package Quipu [25,26] has been developed for taking advantage of prior results [20,21,24] on the stabilizer formalism to speed up general quantum circuit simulation. Finally, path integral-based methods [27] have also been proposed-though they do not improve the simulation cost, they lead to reduced memory storage requirements. ...
Article
Full-text available
Classical simulation of quantum computation is necessary for studying the numerical behavior of quantum algorithms, as there does not yet exist a large viable quantum computer on which to perform numerical tests. Tensor network (TN) contraction is an algorithmic method that can efficiently simulate some quantum circuits, often greatly reducing the computational cost over methods that simulate the full Hilbert space. In this study we implement a tensor network contraction program for simulating quantum circuits using multi-core compute nodes. We show simulation results for the Max-Cut problem on 3- through 7-regular graphs using the quantum approximate optimization algorithm (QAOA), successfully simulating up to 100 qubits. We test two different methods for generating the ordering of tensor index contractions: one is based on the tree decomposition of the line graph, while the other generates ordering using a straight-forward stochastic scheme. Through studying instances of QAOA circuits, we show the expected result that as the treewidth of the quantum circuit’s line graph decreases, TN contraction becomes significantly more efficient than simulating the whole Hilbert space. The results in this work suggest that tensor contraction methods are superior only when simulating Max-Cut/QAOA with graphs of regularities approximately five and below. Insight into this point of equal computational cost helps one determine which simulation method will be more efficient for a given quantum circuit. The stochastic contraction method outperforms the line graph based method only when the time to calculate a reasonable tree decomposition is prohibitively expensive. Finally, we release our software package, qTorch (Quantum TensOR Contraction Handler), intended for general quantum circuit simulation. For a nontrivial subset of these quantum circuits, 50 to 100 qubits can easily be simulated on a single compute node.
... This exponential resource requirement hardens the simulation of large quantum circuit on classical computer. A set of interesting methods of simulating general quantum circuit and related tools can be found in Viamontes et al. (2009); Garcia and Markov (2013); Viamontes et al. (2003); Gottesman (1998a); Aaronson and Gottesman (2004). In contrast the performance simulation described above, evade this problem by only keeping track of variables of interest. ...
Research
Full-text available
A study which explores optimized novel designs for large scale quantum computer based on realistic noisy hardware
... Efficient simulation offerings could be extended to include methods in Refs. [10,27]. ...
Article
Languages, compilers, and computer-aided design tools will be essential for scalable quantum computing, which promises an exponential leap in our ability to execute complex tasks. LIQUi|> is a modular software architecture designed to control quantum hardware. It enables easy programming, compilation, and simulation of quantum algorithms and circuits, and is independent of a specific quantum architecture. LIQUi|> contains an embedded, domain-specific language designed for programming quantum algorithms, with F# as the host language. It also allows the extraction of a circuit data structure that can be used for optimization, rendering, or translation. The circuit can also be exported to external hardware and software environments. Two different simulation environments are available to the user which allow a trade-off between number of qubits and class of operations. LIQUi|> has been implemented on a wide range of runtimes as back-ends with a single user front-end. We describe the significant components of the design architecture and how to express any given quantum algorithm.
Article
We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
Article
Reversible circuits for modular multiplication $Cx \bmod M$ with $x < M$ arise as components of modular exponentiation in Shor's quantum number-factoring algorithm. However, existing generic constructions focus on asymptotic gate count and circuit depth rather than actual values, producing fairly large circuits not optimized for specific C and M values. In this work, we develop such optimizations in a bottom-up fashion, starting with most convenient C values. When zero-initialized ancilla registers are available, we reduce the search for compact circuits to a shortest-path problem. Some of our modular-multiplication circuits are asymptotically smaller than previous constructions, but worst-case bounds and average sizes remain $\Theta(n^2)$. In the context of modular exponentiation, we offer several constant-factor improvements, as well as an improvement by a constant additive term that is significant for few-qubit circuits arising in ongoing laboratory experiments with Shor's algorithm.
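As a small illustration of why multiplication by a constant modulo M can be implemented reversibly, the sketch below checks a standard number-theoretic fact: when gcd(C, M) = 1, the map x → Cx mod M permutes {0, ..., M−1}, and multiplying by the modular inverse of C undoes it. The function names and the sample values C = 7, M = 15 are illustrative assumptions, not taken from the cited work; this is a classical check, not a circuit construction.

```python
from math import gcd


def modmul_table(c: int, m: int) -> list[int]:
    """Hypothetical helper: list of (c * x) % m for x = 0 .. m-1."""
    return [(c * x) % m for x in range(m)]


def is_reversible(c: int, m: int) -> bool:
    """True when x -> (c * x) % m is a permutation of 0 .. m-1."""
    return sorted(modmul_table(c, m)) == list(range(m))


def inverse_constant(c: int, m: int) -> int:
    """Constant that undoes multiplication by c modulo m (needs gcd(c, m) == 1)."""
    if gcd(c, m) != 1:
        raise ValueError("c must be coprime to m for the map to be invertible")
    return pow(c, -1, m)  # modular inverse, available in Python 3.8+


if __name__ == "__main__":
    c, m = 7, 15  # assumed sample values
    inv = inverse_constant(c, m)
    print(is_reversible(c, m), inv)
    # Applying c and then inv returns every x unchanged.
    print(all((inv * ((c * x) % m)) % m == x for x in range(m)))
```

A constant that shares a factor with M (for example C = 6 with M = 15) fails the permutation check, which is why such values are not used as multipliers in this setting.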
Article
The pressure of fundamental limits on classical computation and the promise of exponential speedups from quantum effects have recently brought quantum circuits (Proc. R. Soc. Lond. A, Math. Phys. Sci., vol. 425, p. 73, 1989) to the attention of the electronic design automation community (Proc. 40th ACM/IEEE Design Automation Conf., 2003), (Phys. Rev. A, At. Mol. Opt. Phy., vol. 68, p. 012318, 2003), (Proc. 41st Design Automation Conf., 2004), (Proc. 39th Design Automation Conf., 2002), (Proc. Design, Automation, and Test Eur., 2004), (Phys. Rev. A, At. Mol. Opt. Phy., vol. 69, p. 062321, 2004), (IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 22, p. 710, 2003). Efficient quantum-logic circuits that perform two tasks are discussed: 1) implementing generic quantum computations, and 2) initializing quantum registers. In contrast to conventional computing, the latter task is nontrivial because the state space of an n-qubit register is not finite and contains exponential superpositions of classical bitstrings. The proposed circuits are asymptotically optimal for respective tasks and improve earlier published results by at least a factor of 2. The circuits for generic quantum computation constructed by the algorithms are the most efficient known today in terms of the number of most expensive gates [quantum controlled-NOTs (CNOTs)]. They are based on an analog of the Shannon decomposition of Boolean functions and a new circuit block, called quantum multiplexor (QMUX), which generalizes several known constructions. A theoretical lower bound implies that the circuits cannot be improved by more than a factor of 2. It is additionally shown how to accommodate the severe architectural limitation of using only nearest neighbor gates, which is representative of current implementation technologies. This increases the number of gates by almost an order of magnitude, but preserves the asymptotic optimality of gate counts
Article
A digital computer is generally believed to be an efficient universal computing device; that is, it is believed able to simulate any physical computing device with an increase in computation time by at most a polynomial factor. This may not be true when quantum mechanics is taken into consideration. This paper considers factoring integers and finding discrete logarithms, two problems which are generally thought to be hard on a classical computer and which have been used as the basis of several proposed cryptosystems. Efficient randomized algorithms are given for these two problems on a hypothetical quantum computer. These algorithms take a number of steps polynomial in the input size, e.g., the number of digits of the integer to be factored.
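For orientation, the classical side of the factoring algorithm can be sketched as a reduction from factoring to order finding. In the sketch below, the order of a modulo N is found by brute force, which takes exponential time in general and only stands in for the quantum step; the function names and the sample values N = 15, a = 7 are illustrative assumptions.

```python
from math import gcd


def multiplicative_order(a: int, n: int) -> int:
    """Smallest r >= 1 with a**r % n == 1, found by brute force.

    This loop is a classical stand-in for the quantum order-finding step.
    """
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime to n")
    r, value = 1, a % n
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def factor_from_order(a: int, n: int):
    """Try to split n using the order of a; return None if this a fails."""
    r = multiplicative_order(a, n)
    if r % 2 == 1:
        return None  # odd order: pick another a
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # a**(r/2) is -1 mod n: pick another a
    p = gcd(half - 1, n)  # nontrivial because half is neither 1 nor -1 mod n
    return (p, n // p)


if __name__ == "__main__":
    print(factor_from_order(7, 15))
```

When the chosen a fails (odd order, or a^(r/2) ≡ −1 mod N), the procedure is simply repeated with a different random a, which is where the randomization enters.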
Conference Paper
This paper addresses the problem of evaluating the fault tolerance algorithms and methodologies (FTAMs) designed for quantum systems, by adopting the simulated fault injection methodology from classical computation. Due to their wide spectrum of applications (including quantum circuit simulation) and hierarchical features, hardware description languages (HDLs) were employed for performing fault injection, as prescribed by the guidelines of the QUERIST project. At the same time, the injection techniques taken from classical circuit simulation had to be adapted to quantum computation requirements, including the specific quantum error models. The experimental simulated fault injection campaigns are thoroughly described along with the experimental results, which confirm the analytical expectations.
Book
Recent progress in atomic physics, semiconductors, and optical technologies has led to the need to control matter at an unprecedented scale. However, atoms, electrons and photons do not obey laws of classical physics and instead are governed by quantum mechanics. The formalism of quantum circuits promises to transform engineering disciplines the way digital circuits transformed computing, communications, control and measurement. A quantum circuit simulator implemented in software acts as a replacement of an actual quantum system and seeks to calculate the output from the inputs. This is a very difficult task, but researchers have achieved significant progress in many important special cases. This self-contained book discusses both theoretical and practical aspects of simulating quantum circuits on conventional computers. Engineers can sanity-check and evaluate their designs through simulation before building hardware. Computer scientists can use simulation to compare quantum algorithms to conventional ones. Quantum Circuit Simulation covers the fundamentals of linear algebra and introduces basic concepts of quantum physics needed to understand quantum circuits and algorithms. It requires only basic familiarity with algebra, graph algorithms and computer engineering. After introducing necessary background, the authors describe key simulation techniques that have so far been scattered throughout the research literature in physics, computer science, and computer engineering. Quantum Circuit Simulation also illustrates the development of software for quantum simulation, using as an example the QuIDDPro package, which is freely available and can be used by students of quantum information as a "quantum calculator."
Book
Part I. Fundamental Concepts: 1. Introduction and overview; 2. Introduction to quantum mechanics; 3. Introduction to computer science; Part II. Quantum Computation: 4. Quantum circuits; 5. The quantum Fourier transform and its application; 6. Quantum search algorithms; 7. Quantum computers: physical realization; Part III. Quantum Information: 8. Quantum noise and quantum operations; 9. Distance measures for quantum information; 10. Quantum error-correction; 11. Entropy and information; 12. Quantum information theory; Appendices; References; Index.
Article
We exhibit a simple, systematic procedure for detecting and correcting errors using any of the recently reported quantum error-correcting codes. The procedure is shown explicitly for a code in which one qubit is mapped into five. The quantum networks obtained are fault tolerant, that is, they can function successfully even if errors occur during the error correction. Our construction is derived using a recently introduced group-theoretic framework for unifying all known quantum codes.
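To make the detection step concrete, the sketch below computes error syndromes for the five-qubit code from a commonly used set of stabilizer generators (XZZXI and its cyclic shifts). It checks only the commutation relations that determine each syndrome bit and does not model the fault-tolerant networks of the cited work; the helper names are illustrative assumptions.

```python
# A commonly used generator set for the five-qubit code (assumed here).
GENERATORS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]


def anticommute(p: str, q: str) -> bool:
    """Two Pauli strings anticommute iff they clash on an odd number of qubits.

    A clash is a position where both are non-identity and differ.
    """
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def syndrome(error: str) -> tuple:
    """One bit per generator: 1 if the error anticommutes with it."""
    return tuple(int(anticommute(error, g)) for g in GENERATORS)


def single_qubit_errors(n: int = 5):
    """Yield all 3 * n Pauli strings with a single X, Y or Z."""
    for pos in range(n):
        for pauli in "XYZ":
            yield "I" * pos + pauli + "I" * (n - pos - 1)


def build_lookup() -> dict:
    """Map each syndrome to the single-qubit error that produces it."""
    return {syndrome(e): e for e in single_qubit_errors()}


if __name__ == "__main__":
    lookup = build_lookup()
    print(len(lookup))            # number of distinct syndromes
    print(lookup[syndrome("IIYII")])
```

Because the 15 single-qubit errors map to 15 distinct nonzero syndromes, measuring the four generators identifies which correction to apply.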