All content in this area was uploaded by Igor L. Markov on Sep 10, 2014


Quipu: High-performance Simulation of Quantum Circuits using Stabilizer Frames

Héctor J. García, Igor L. Markov

University of Michigan, EECS, Ann Arbor, MI 48109-2121

{hjgarcia, imarkov}@eecs.umich.edu

Abstract—As quantum information processing gains traction, its simulation becomes increasingly significant for engineering purposes (evaluation, testing and optimization) as well as for theoretical research. Generic quantum-circuit simulation appears intractable for conventional computers. However, Gottesman and Knill identified an important subclass, called stabilizer circuits, which can be simulated efficiently using group-theory techniques. Practical circuits enriched with quantum error-correcting codes and fault-tolerant procedures are dominated by stabilizer subcircuits and contain a relatively small number of non-stabilizer components. Therefore, we develop new group-theory data structures and algorithms to simulate such circuits. Stabilizer frames offer more compact storage than previous approaches but require more sophisticated bookkeeping. Our implementation, called Quipu, simulates certain quantum arithmetic circuits (e.g., ripple-carry adders) in polynomial time and space for equal superpositions of n qubits. On such instances, known linear-algebraic simulation techniques, such as the (state-of-the-art) BDD-based simulator QuIDDPro, take exponential time. We simulate various quantum Fourier transform and quantum fault-tolerant circuits with Quipu, and the results demonstrate that our stabilizer-based technique outperforms QuIDDPro in all cases.

I. INTRODUCTION

Quantum information processing manipulates quantum states rather than conventional 0-1 bits. It has been demonstrated with a variety of physical technologies (NMR, ion traps, Josephson junctions in superconductors, linear and non-linear optics) and used in recently developed commercial products. Furthermore, it offers a unique opportunity for EDA research to assist in scientific research. Shor's factoring algorithm [17] and Grover's search algorithm [8] apply the principles of quantum information to carry out computation asymptotically more efficiently than conventional computers. These developments fueled research efforts to design, build and program scalable quantum computers. Due to the high volatility of quantum information, quantum error-correcting codes (QECC) and effective fault-tolerant (FT) architectures are necessary to build reliable quantum computers. Most quantum algorithms are described in terms of quantum circuits and, just like conventional digital circuits, require functional simulation to determine the best FT design choices given limited resources. Simulating quantum circuits on a conventional computer is a difficult problem. The matrices representing quantum gates and the vectors that model quantum states grow exponentially with the number of qubits (the qubit being the quantum analogue of the classical bit). Several software packages have been developed for quantum-circuit simulation, including Oemer's Quantum Computation Language (QCL) [13] and Viamontes' Quantum Information Decision Diagrams (QuIDD) implemented in the QuIDDPro package [18]. While QCL simulates circuits directly using state vectors, QuIDDPro uses a variant of binary decision diagrams to store state vectors more compactly in some cases. Since the state-vector representation requires excessive computational resources in general, simulation-based reliability studies (e.g., simulated fault-injection analysis) of quantum FT architectures using general-purpose simulators have been limited to small quantum circuits [3]. Therefore, designing fast simulation techniques that target quantum FT circuits facilitates more robust reliability analysis of larger quantum circuits.

This work was sponsored in part by the Air Force Research Laboratory under

agreement FA8750-11-2-0043.

Stabilizer circuits and states. Gottesman [7] and Knill identified an important subclass of quantum circuits, called stabilizer circuits, which can be simulated efficiently on classical computers. Stabilizer circuits are exclusively composed of stabilizer gates (controlled-NOT, Hadamard and Phase gates; see Figure 1) followed by one-qubit measurements in the computational basis. Such circuits are applied to a computational basis state (usually $|00\ldots0\rangle$) and produce output states known as stabilizer states. Because of their extensive applications in QECC and FT architectures, stabilizer circuits have been studied heavily [1], [7]. Stabilizer circuits can be simulated in polynomial time by keeping track of the Pauli operators that stabilize¹ the quantum state. Such stabilizer operators are maintained during simulation and uniquely represent stabilizer states up to an unobservable global phase.² Therefore, this technique offers an exponential improvement over the computational resources needed to simulate stabilizer circuits using vector-based representations.

Aaronson and Gottesman [1] proposed an improved technique that uses a bit-vector representation to simulate stabilizer circuits. Aaronson implemented this simulation technique in his CHP software package. Compared to other vector-based simulators (QuIDDPro, QCL), the technique in [1] does not maintain the global phase of a state and simulates each stabilizer gate in $\Theta(n)$ time using $\Theta(n^2)$ space. The overall runtime of CHP is dominated by the number of measurement gates, which require $O(n^2)$ time to simulate.

Stabilizer-based simulation of generic circuits. We propose a generalization of the stabilizer formalism that admits simulation of non-stabilizer gates such as Toffoli³ gates. This line of research was first outlined in [1], where the authors describe a stabilizer-based representation that stores an arbitrary quantum state as a sum of density-matrix⁴ terms. In contrast, we store arbitrary states as superpositions⁵ of stabilizer states. Such superpositions are stored more compactly than in the approach from [1], although we do not handle density matrices. Another key difference is that our approach explicitly maintains the global phase of each stabilizer state because in a superposition such phases become relative. We store stabilizer-state superpositions compactly using our proposed stabilizer frame data structure. To speed up relevant algorithms, we store generator sets for each stabilizer frame in row-echelon form to avoid expensive Gaussian elimination during simulation. The main advantages of using stabilizer-state superpositions to simulate quantum circuits are:

¹ An operator $U$ is said to stabilize a state $|\psi\rangle$ iff $U|\psi\rangle = |\psi\rangle$.

² According to quantum physics, the global phase $\exp(i\theta)$ of a quantum state is unobservable and does not need to be simulated.

³ The Toffoli gate is a 3-bit gate that maps $(a, b, c)$ to $(a, b, c \oplus (ab))$.

⁴ Density matrices are self-adjoint positive-semidefinite matrices of trace 1.0 that describe the statistical state of a quantum system [11].

⁵ A superposition is a norm-1 linear combination of terms.

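$$H = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad P = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}, \qquad \mathrm{CNOT} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

Fig. 1. Stabilizer gates: Hadamard (H), Phase (P), controlled-NOT (CNOT).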

(i) Stabilizer subcircuits are simulated with high efficiency.

(ii) Superpositions can be restructured and compressed on the fly during simulation to reduce resource requirements.

Our stabilizer-based technique simulates certain quantum arithmetic

circuits in polynomial time and space for input states consisting

of unbiased superpositions of computational-basis states. On such

instances, known generic simulation techniques take exponential

time. We simulate various quantum Fourier transform and quantum

FT circuits, and the results demonstrate that our data structure leads

to orders-of-magnitude improvement in runtime and memory as

compared to state-of-the-art simulators.

In the remaining part of this document, we assume a superficial familiarity with quantum computing, as outlined in [11] and EDA publications such as [16]. Section II describes key concepts related to quantum-circuit simulation and the stabilizer formalism. In Section III, we introduce stabilizer frames and describe in detail our simulation flow implemented in Quipu. In Section IV, we discuss our empirical validation of Quipu and comparisons with state-of-the-art simulators. Section V closes with concluding remarks.

II. BACKGROUND AND PREVIOUS WORK

Quantum information processes, including quantum algorithms, are often modeled using quantum circuits and are represented by diagrams, just like conventional digital circuits [11], [18]. Quantum circuits are sequences of gate operations that act on some register of qubits, the basic unit of information in a quantum system. A single qubit is described by a quantum state $|\psi\rangle$, which is a two-dimensional vector over the complex numbers. In contrast to classical bits, qubits can be in a superposition of both the 0 and 1 states. Formally, $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$, where $|0\rangle = (1, 0)^\top$ and $|1\rangle = (0, 1)^\top$ are the two-dimensional computational basis states and $\alpha_i$ are probability amplitudes that satisfy $|\alpha_0|^2 + |\alpha_1|^2 = 1$. An n-qubit register is the tensor product of n single qubits and thus is modeled by a complex vector $|\psi^n\rangle = |\psi_1\rangle \otimes \cdots \otimes |\psi_n\rangle = \sum_{i=0}^{2^n-1} \alpha_i |b_i\rangle$, where each $b_i$ is a binary string representing the value $i$ of each basis state. Furthermore, $|\psi^n\rangle$ satisfies $\sum_{i=0}^{2^n-1} |\alpha_i|^2 = 1$. Each gate operation or quantum gate is a unitary matrix that operates on a small subset of the qubits in a register. For example, the quantum analogue of a NOT gate is the operator $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$,

$$\alpha_0|00\rangle + \alpha_1|10\rangle \xrightarrow{X \otimes I} \alpha_0|10\rangle + \alpha_1|00\rangle$$

Similarly, the two-qubit CNOT operator flips the second qubit (target) iff the first qubit (control) is set to 1, e.g.,

$$\alpha_0|00\rangle + \alpha_1|10\rangle \xrightarrow{\mathrm{CNOT}} \alpha_0|00\rangle + \alpha_1|11\rangle$$

Another operator of particular importance is the Hadamard (H) gate. This gate is frequently used to put a qubit in a superposition of computational-basis states, e.g.,

$$\alpha_0|00\rangle + \alpha_1|10\rangle \xrightarrow{I \otimes H} (\alpha_0|00\rangle + \alpha_0|01\rangle + \alpha_1|10\rangle + \alpha_1|11\rangle)/\sqrt{2}$$

Note that the H gate generates unbiased superpositions in the sense that the squares of the absolute values of the amplitudes are equal. The dynamics involved in observing or measuring a quantum state are described by non-unitary projection operators. There are different types of quantum measurements, but the ones most pertinent to our discussion are measurements in the computational basis, i.e., measurements with respect to the $|0\rangle$ or $|1\rangle$ basis states. The projection operators for such measurements are $P_0 = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$ and $P_1 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$, respectively. The probability $p(x)$ of obtaining outcome $x \in \{0, 1\}$ on the jth qubit of state $|\psi\rangle$ is given by the inner product $\langle\psi|P_x^j|\psi\rangle$, where $\langle\psi|$ is the conjugate transpose of $|\psi\rangle$. For example, suppose we want to measure $|\psi\rangle = \alpha_0|0\rangle + \alpha_1|1\rangle$ with respect to the $|1\rangle$ basis state:

$$p(1) = (\alpha_0^*, \alpha_1^*)\, P_1\, (\alpha_0, \alpha_1)^\top = (0, \alpha_1^*)(\alpha_0, \alpha_1)^\top = |\alpha_1|^2$$
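As a rough illustration of the measurement rule above, the following Python sketch computes $p(x)$ directly from a state vector stored as a list of amplitudes. The function name `outcome_probability` and the convention that qubit 0 is the leftmost bit of each basis label are assumptions made for this example; they are not taken from the paper.

```python
def outcome_probability(amps, j, x):
    """Probability of reading x (0 or 1) on qubit j of a state vector.

    amps[i] is the amplitude of basis state i. Qubit 0 is taken to be the
    leftmost bit of the n-bit label of i (an assumed convention).
    """
    n = len(amps).bit_length() - 1  # number of qubits, since len(amps) == 2**n
    total = 0.0
    for i, a in enumerate(amps):
        label = format(i, "b").zfill(n)  # e.g. i = 2, n = 2 gives "10"
        if int(label[j]) == x:
            total += abs(a) ** 2  # sum of squared magnitudes over matching states
    return total


# One-qubit example from the text: p(1) = |alpha_1|^2.
alpha = [0.6, 0.8j]
p1 = outcome_probability(alpha, 0, 1)

# Bell state (|00> + |11>)/sqrt(2): each outcome on qubit 0 has probability 1/2.
bell = [2 ** -0.5, 0.0, 0.0, 2 ** -0.5]
p_bell = outcome_probability(bell, 0, 0)
```

This direct computation touches all $2^n$ amplitudes, which is the cost that the stabilizer formalism discussed next is designed to avoid.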

Cofactors of quantum states. The output states obtained after performing computational-basis measurements are called cofactors, and are orthogonal states of the form $|0\rangle|\psi_0\rangle$ and $|1\rangle|\psi_1\rangle$. We denote the $|0\rangle$- and $|1\rangle$-cofactor by $|\psi_{j=0}\rangle$ and $|\psi_{j=1}\rangle$, respectively, where $j$ is the index of the measured qubit. One can also consider iterated cofactors, such as double cofactors $|\psi_{qr=00}\rangle$, $|\psi_{qr=01}\rangle$, $|\psi_{qr=10}\rangle$ and $|\psi_{qr=11}\rangle$. Cofactoring with respect to all qubits produces amplitudes of individual basis vectors.

A. Quantum circuits and simulation

To simulate a quantum circuit $\mathcal{C}$, we first initialize the quantum system to some desired state $|\psi\rangle$ (usually a basis state). $|\psi\rangle$ can be represented using a fixed-size data structure (e.g., an array of $2^n$ complex numbers) or a variable-size data structure (e.g., algebraic decision diagram). We then track the evolution of $|\psi\rangle$ via its internal representation as the gates in $\mathcal{C}$ are applied until one obtains the output state $\mathcal{C}|\psi\rangle$ [1], [11], [18]. Most quantum-circuit simulators [5], [12], [13], [18] support some form of the linear-algebraic operations described earlier. The drawback of such simulators is that their runtime grows exponentially in the number of qubits. This holds true not only in the worst case but also in many practical applications involving arithmetic and FT circuits.

Gottesman developed a simulation method involving the Heisenberg model [7] often used by physicists to describe atomic phenomena. In this model, one keeps track of the symmetries of an object rather than representing the object explicitly. In the context of quantum-circuit simulation, this model represents quantum states by their symmetries, rather than complex vectors. The symmetries are operators for which these states are 1-eigenvectors. Algebraically, symmetries form group structures, which can be specified compactly by group generators.

B. The stabilizer formalism

A unitary operator $U$ stabilizes a state $|\psi\rangle$ iff $|\psi\rangle$ is a 1-eigenvector of $U$, i.e., $U|\psi\rangle = |\psi\rangle$. We are interested in operators $U$ derived from the Pauli matrices: $X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$, $Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, and the identity $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. The one-qubit states stabilized by the Pauli matrices are:

$$\begin{array}{ll} X: (|0\rangle + |1\rangle)/\sqrt{2} & -X: (|0\rangle - |1\rangle)/\sqrt{2} \\ Y: (|0\rangle + i|1\rangle)/\sqrt{2} & -Y: (|0\rangle - i|1\rangle)/\sqrt{2} \\ Z: |0\rangle & -Z: |1\rangle \end{array}$$

Observe that $I$ stabilizes all states and $-I$ does not stabilize any state. Thus, the entangled state $(|00\rangle + |11\rangle)/\sqrt{2}$ is stabilized by the Pauli operators $X \otimes X$, $-Y \otimes Y$, $Z \otimes Z$ and $I \otimes I$. As shown in Table I, it turns out that the Pauli matrices along with $I$ and the multiplicative factors $\pm 1$, $\pm i$ form a closed group under matrix multiplication [11]. Formally, the Pauli group $\mathcal{G}_n$ on n qubits consists of the n-fold tensor product of Pauli matrices, $P = i^k P_1 \otimes \cdots \otimes P_n$ such that $P_j \in \{I, X, Y, Z\}$ and $k \in \{0, 1, 2, 3\}$. For brevity, the tensor-product symbol is often omitted so that $P$ is denoted by a string of $I$, $X$, $Y$ and $Z$ characters or Pauli literals and a separate integer value $k$ for the phase $i^k$. This string-integer pair representation allows us to compute the product of Pauli operators without explicitly computing the tensor products,⁶ e.g., $(-IIXI)(iIYII) = -iIYXI$. Since $|\mathcal{G}_n| = 4^{n+1}$, $\mathcal{G}_n$ can have at most $\log_2 |\mathcal{G}_n| = \log_2 4^{n+1} = 2(n+1)$ irredundant generators [11].

⁶ This holds true due to the identity $(A \otimes B)(C \otimes D) = (AC \otimes BD)$.

TABLE I. MULTIPLICATION TABLE FOR PAULI MATRICES. SHADED CELLS INDICATE ANTICOMMUTING PRODUCTS.

      I      X      Y      Z
I     I      X      Y      Z
X     X      I      iZ     -iY
Y     Y      -iZ    I      iX
Z     Z      iY     -iX    I

The key idea behind the stabilizer formalism is to represent an n-qubit quantum state $|\psi\rangle$ by its stabilizer group $S(|\psi\rangle)$, the subgroup of $\mathcal{G}_n$ that stabilizes $|\psi\rangle$. One can show that, if $|S(|\psi\rangle)| = 2^n$, the group uniquely specifies $|\psi\rangle$. In this case, $|\psi\rangle$ belongs to an important class of quantum states called stabilizer states. Furthermore, $S(|\psi\rangle)$ itself is specified by only $\log_2 2^n = n$ irredundant stabilizer generators. Therefore, an arbitrary n-qubit stabilizer state can be represented by a stabilizer matrix $M$ whose rows represent a set of generators $Q_1, \ldots, Q_n$ for $S(|\psi\rangle)$. (Hence we use the terms generator set and stabilizer matrix interchangeably.) Since each $Q_i$ is a string of n Pauli literals, the size of the matrix is $n \times n$. The phases of each $Q_i$ are stored separately using a vector of n integers. For example, one can show that $|\psi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$ is uniquely specified by any of the following matrices:

$$M_1 = \begin{pmatrix} +XX \\ +ZZ \end{pmatrix}, \qquad M_2 = \begin{pmatrix} +XX \\ -YY \end{pmatrix}, \qquad M_3 = \begin{pmatrix} -YY \\ +ZZ \end{pmatrix}$$

One obtains $M_2$ from $M_1$ by left-multiplying the second row by the first. Similarly, $M_3$ is obtained from $M_1$ or $M_2$ via row multiplication. Observe that multiplying any row by itself yields $II$, which stabilizes $|\psi\rangle$. However, $II$ cannot be used as a generator because it is redundant and carries no information about the structure of $|\psi\rangle$. The storage cost for $M$ is $\Theta(n^2)$, which is an exponential improvement over the $O(2^n)$ cost often encountered in vector-based representations.
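The string-integer pair representation and the row multiplications described above can be sketched in a few lines of Python. This is only an illustrative sketch: the name `pauli_mul` and the encoding of a Pauli operator as a pair `(k, string)` standing for $i^k$ times the literal string are assumptions of this example, not Quipu's actual data layout.

```python
# Products of distinct non-identity Pauli literals, per Table I:
# (a, b) maps to (power of i, resulting literal).
_PROD = {
    ("X", "Y"): (1, "Z"), ("Y", "Z"): (1, "X"), ("Z", "X"): (1, "Y"),
    ("Y", "X"): (3, "Z"), ("Z", "Y"): (3, "X"), ("X", "Z"): (3, "Y"),
}


def _mul_literal(a, b):
    """Product of two single-qubit Pauli literals as (power of i, literal)."""
    if a == "I":
        return 0, b
    if b == "I":
        return 0, a
    if a == b:
        return 0, "I"
    return _PROD[(a, b)]


def pauli_mul(k1, s1, k2, s2):
    """Multiply i**k1 * s1 by i**k2 * s2 literal by literal (no tensor products)."""
    k = (k1 + k2) % 4
    out = []
    for a, b in zip(s1, s2):
        dk, c = _mul_literal(a, b)
        k = (k + dk) % 4
        out.append(c)
    return k, "".join(out)


# Example from the text: (-IIXI)(iIYII) = -i IYXI, i.e. phase exponent k = 3.
example = pauli_mul(2, "IIXI", 1, "IYII")

# Row multiplication turning M1 into M2: (XX)(ZZ) = -YY, i.e. k = 2.
second_row_of_m2 = pauli_mul(0, "XX", 0, "ZZ")
```

Each product costs time linear in the number of qubits, which is why row multiplications on an n-row stabilizer matrix are counted as $\Theta(n)$ operations in the runtime analysis that follows.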

Stabilizer-circuit simulation. The computational basis states are stabilizer states that can be represented by the stabilizer-matrix structure depicted in Figure 2-a. In this matrix form, the ± sign of each row along with its corresponding $Z_j$-literal designates whether the state of the jth qubit is $|0\rangle$ (+) or $|1\rangle$ (−). Suppose we want to simulate circuit $\mathcal{C}$. Stabilizer-based simulation first initializes $M$ to specify some basis state. Then, to simulate the action of each gate $U \in \mathcal{C}$, we conjugate each row $Q_i$ of $M$ by $U$.⁷ We require that $UQ_iU^\dagger$ maps to another string of Pauli literals so that the resulting matrix $M'$ is well-formed. It turns out that the H, P and CNOT gates have such mappings, i.e., these gates conjugate the Pauli group onto itself [7], [11]. Table II lists the mapping for each of these gates. For example, suppose we simulate a CNOT operation on $|\psi\rangle = (|00\rangle + |11\rangle)/\sqrt{2}$. Using the stabilizer representation, we have

$$M_\psi = \begin{pmatrix} +XX \\ +ZZ \end{pmatrix} \xrightarrow{\mathrm{CNOT}} M'_\psi = \begin{pmatrix} +XI \\ +IZ \end{pmatrix}$$

One can verify that the rows of $M'_\psi$ stabilize $|\psi\rangle \xrightarrow{\mathrm{CNOT}} (|00\rangle + |10\rangle)/\sqrt{2}$ as required. Since H, P and CNOT gates are directly simulated using stabilizers, these gates are commonly called stabilizer gates and any circuit composed exclusively of such gates is called a unitary stabilizer circuit. Table II shows that at most two columns of $M$ are updated when a stabilizer gate is simulated. Thus, such gates are simulated in $\Theta(n)$ time.
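The column updates described above can be sketched as follows. The sketch encodes each literal as a pair of bits and uses the standard bitwise update rules from the CHP approach of [1] rather than a lookup in Table II; the function `apply_gate` and the `(sign, string)` row encoding are assumptions made for this example.

```python
# Each literal as (x, z) bits: Y is the literal with both bits set.
TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
FROM_BITS = {bits: lit for lit, bits in TO_BITS.items()}


def apply_gate(rows, gate, *qubits):
    """Conjugate every row (sign, literals) by H, P or CNOT; returns new rows."""
    out = []
    for sign, s in rows:
        lits = list(s)
        if gate == "H":
            (q,) = qubits
            x, z = TO_BITS[lits[q]]
            if x and z:               # Y maps to -Y
                sign = -sign
            lits[q] = FROM_BITS[(z, x)]          # swaps X and Z
        elif gate == "P":
            (q,) = qubits
            x, z = TO_BITS[lits[q]]
            if x and z:               # Y maps to -X
                sign = -sign
            lits[q] = FROM_BITS[(x, z ^ x)]      # X maps to Y, Z is unchanged
        elif gate == "CNOT":
            c, t = qubits
            xc, zc = TO_BITS[lits[c]]
            xt, zt = TO_BITS[lits[t]]
            if xc and zt and (xt ^ zc ^ 1):      # sign rule from [1]
                sign = -sign
            lits[c] = FROM_BITS[(xc, zc ^ zt)]   # Z propagates target to control
            lits[t] = FROM_BITS[(xt ^ xc, zt)]   # X propagates control to target
        else:
            raise ValueError("unsupported gate: " + gate)
        out.append((sign, "".join(lits)))
    return out


# The CNOT example from the text: {+XX, +ZZ} becomes {+XI, +IZ}.
bell = [(+1, "XX"), (+1, "ZZ")]
after_cnot = apply_gate(bell, "CNOT", 0, 1)
```

Only the one or two columns named by the gate are read and written in each row, so a gate costs $\Theta(n)$ work over the n rows.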

The stabilizer formalism also admits measurements in the computational basis [7]. Conveniently, the formalism avoids the direct computation of projection operators and inner products (Section II). Note that any qubit $j$ in a stabilizer state is either in a $|0\rangle$ ($|1\rangle$) state or in an unbiased superposition of both. The former case is called a deterministic outcome and the latter a random outcome.

⁷ Since $Q_i|\psi\rangle = |\psi\rangle$, the resulting state $U|\psi\rangle$ is stabilized by $UQ_iU^\dagger$ because $(UQ_iU^\dagger)U|\psi\rangle = UQ_i|\psi\rangle = U|\psi\rangle$.

TABLE II. CONJUGATION OF PAULI-GROUP ELEMENTS BY STABILIZER GATES [11]. FOR CNOT, SUBSCRIPT 1 INDICATES THE CONTROL AND 2 THE TARGET.

GATE    INPUT    OUTPUT
H       X        Z
        Y        -Y
        Z        X
P       X        Y
        Y        -X
        Z        Z

GATE    INPUT    OUTPUT
CNOT    I1 X2    I1 X2
        X1 I2    X1 X2
        I1 Y2    Z1 Y2
        Y1 I2    Y1 X2
        I1 Z2    Z1 Z2
        Z1 I2    Z1 I2

Fig. 2. (a) Stabilizer-matrix structure for basis states. (b) Row-echelon form for stabilizer matrices. The X-block contains a minimal set of generators with X/Y literals. Generators with Z and I literals only appear in the Z-block.

We can tell these cases apart in $\Theta(n)$ time by searching for $X$ or $Y$ literals in the jth column of $M$. If such literals are found, the qubit must be in a superposition and the outcome is random with equal probability ($p(0) = p(1) = 0.5$); otherwise the outcome is deterministic ($p(0) = 1$ or $p(1) = 1$).
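The column scan just described amounts to a single pass over the rows. In the sketch below, the name `outcome_type` and the `(sign, string)` row encoding are hypothetical choices for illustration.

```python
def outcome_type(rows, j):
    """Classify a computational-basis measurement of qubit j.

    rows is a list of (sign, literals) generators. Any X or Y literal in
    column j means the outcome is random with p(0) = p(1) = 0.5.
    """
    for _sign, literals in rows:
        if literals[j] in "XY":
            return "random"
    return "deterministic"


# Bell state generators: both qubits give random outcomes.
bell = [(+1, "XX"), (+1, "ZZ")]
kind_bell = outcome_type(bell, 0)

# Basis state |01>: generators +ZI and -IZ, so every outcome is deterministic.
basis_01 = [(+1, "ZI"), (-1, "IZ")]
kind_basis = outcome_type(basis_01, 1)
```

The scan inspects one literal per row, which matches the $\Theta(n)$ bound stated above; deciding the actual value in the deterministic case needs the additional work described next.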

Randomized-outcome case: one flips an unbiased coin to decide the outcome and then updates $M$ to make it consistent with the outcome obtained. Since we might have to examine $M$ in its entirety, the runtime is $O(n^2)$.

Deterministic-outcome case: no updates to $M$ are necessary but we need to figure out whether the qubit is in the $|0\rangle$ or $|1\rangle$ state, i.e., whether the qubit is stabilized by $Z$ or $-Z$. One approach is to perform Gaussian elimination (GE) to put $M$ in row-echelon form. This removes redundant literals from $M$ and makes it possible to identify the row containing a $Z$ in its jth position and $I$'s everywhere else. The ± phase of such a row decides the outcome of the measurement. Since this is a GE-based approach, it takes $O(n^3)$ time in practice.

The work in [1] improved the runtime of deterministic measurements by doubling the size of $M$ to include n destabilizer generators. Such destabilizer generators help identify exactly which row multiplications to compute in order to decide the measurement outcome. This approach avoids GE and thus deterministic measurements are computed in $O(n^2)$ time.

III. SIMULATION OF QUANTUM CIRCUITS USING STABILIZER FRAMES

The stabilizer gates by themselves do not form a universal set for quantum computation [1], [11]. However, the Hadamard and Toffoli (TOF) gates do [2]. Thus, it suffices to show how to simulate the Toffoli gate using the stabilizer formalism in order to make our gate set universal. To accomplish this, we represent arbitrary quantum states as superpositions of stabilizer states. For example, recall from Section II-B that the computational basis states are stabilizer states. Thus, any one-qubit state $|\psi\rangle = \alpha_1|0\rangle + \alpha_2|1\rangle$ is a superposition of the two stabilizer states $|0\rangle$ and $|1\rangle$. Observe that, if $|\psi\rangle$ is unbiased, i.e., $|\alpha_1|^2 = |\alpha_2|^2$, and the two amplitudes differ by a factor $i^d$ with $d \in \{0, 1, 2, 3\}$ (the condition imposed on candidate pairs later in this section), it can be represented using a single stabilizer state instead of two (up to a global phase). The key idea behind our technique is to identify and compress large unbiased superpositions on the fly during simulation to reduce resource requirements.

Stabilizer frames. Suppose $|\psi\rangle$ is an n-qubit stabilizer state and we want to simulate the action of $TOF_{c_1 c_2 t}$, where $c_1$ and $c_2$ are the control qubits, and $t$ is the target. First, we decompose $|\psi\rangle$ into all four of its double cofactors (Section II) over the control qubits,

$$|\psi\rangle = \left(|\psi_{c_1 c_2 = 00}\rangle + |\psi_{c_1 c_2 = 01}\rangle + |\psi_{c_1 c_2 = 10}\rangle + |\psi_{c_1 c_2 = 11}\rangle\right)/2$$

which is an unbiased superposition of orthogonal states. Since $|\psi\rangle$ is a stabilizer state and the cofactors are obtained by performing measurements on $|\psi\rangle$, each $|\psi_{c_1 c_2}\rangle$ is computed in $O(n^2)$ time (Section II-B). We compute the action of the Toffoli as

$$TOF_{c_1 c_2 t}|\psi\rangle = \left(|\psi_{c_1 c_2 = 00}\rangle + |\psi_{c_1 c_2 = 01}\rangle + |\psi_{c_1 c_2 = 10}\rangle + X_t|\psi_{c_1 c_2 = 11}\rangle\right)/2$$

where $X_t$ is the Pauli gate (NOT) acting on target $t$. Each $|\psi_{c_1 c_2}\rangle$ is represented by the same $M$, but with a different permutation of leading row phases as shown in Figure 3. Thus, one can represent the orthogonal stabilizer-state superpositions that arise when simulating Toffoli gates by a stabilizer frame $\mathcal{F}$ consisting of (i) a stabilizer matrix $M$ and (ii) a set of $k$ distinct leading (±)-phase vectors. Each phase vector in the frame represents a distinct state in the superposition. Additionally, one maintains a vector $a = (a_1, \ldots, a_k)$ of the amplitudes associated with the states (phase vectors) in the superposition, e.g., $a = (0.5, 0.5, 0.5, 0.5)$ in Figure 3.

Fig. 3. Simulation of the Toffoli gate using a superposition of stabilizer states. Amplitudes are omitted for clarity. The $X$ gate is applied to the third qubit of the $|\psi_{c_1 c_2 = 11}\rangle$ cofactor. The (±)-phase vectors are shown as prepended columns to the corresponding stabilizer matrices.

Controlled-phase gates $R(\alpha)_{ct}$ can also be simulated using stabilizer frames. This gate applies a phase-shift factor of $e^{i\alpha}$ if both the control qubit $c$ and target qubit $t$ are set. Thus, we compute the action of $R(\alpha)_{ct}$ as

$$R(\alpha)_{ct}|\psi\rangle = \left(|\psi_{ct = 00}\rangle + |\psi_{ct = 01}\rangle + |\psi_{ct = 10}\rangle + e^{i\alpha}|\psi_{ct = 11}\rangle\right)/2$$

Observe that, in contrast to TOF gates, controlled-$R(\alpha)$ gates produce biased superpositions. The Hadamard and controlled-$R(\alpha)$ gates are used to implement the quantum Fourier transform circuit, which plays a key role in Shor's factoring algorithm.
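The cofactor identity behind the Toffoli simulation can be checked on explicit state vectors. The sketch below is not frame-based: it stores a state as a dictionary from bit-string labels to amplitudes, and it uses unnormalized cofactors (plain projections), so the factor 1/2 in the formulas above is absorbed into the stored amplitudes. All function names are hypothetical.

```python
def cofactor(state, c1, c2, v1, v2):
    """Unnormalized double cofactor: keep basis states with bits c1, c2 = v1, v2."""
    return {b: a for b, a in state.items() if b[c1] == v1 and b[c2] == v2}


def apply_x(state, t):
    """Pauli X (NOT) on qubit t of every basis label."""
    flip = {"0": "1", "1": "0"}
    return {b[:t] + flip[b[t]] + b[t + 1:]: a for b, a in state.items()}


def toffoli_via_cofactors(state, c1, c2, t):
    """Apply X_t to the 11-cofactor only and recombine the four cofactors."""
    out = {}
    for v1 in "01":
        for v2 in "01":
            part = cofactor(state, c1, c2, v1, v2)
            if v1 == "1" and v2 == "1":
                part = apply_x(part, t)
            out.update(part)  # cofactors have disjoint labels, so no collisions
    return out


def toffoli_direct(state, c1, c2, t):
    """Reference implementation: flip the target where both controls are 1."""
    flip = {"0": "1", "1": "0"}
    out = {}
    for b, a in state.items():
        if b[c1] == "1" and b[c2] == "1":
            b = b[:t] + flip[b[t]] + b[t + 1:]
        out[b] = a
    return out


# Controls in an equal superposition, target in |0>.
psi = {"000": 0.5, "010": 0.5, "100": 0.5, "110": 0.5}
result = toffoli_via_cofactors(psi, 0, 1, 2)
```

In the frame representation, the four dictionary slices correspond to four phase vectors over one shared stabilizer matrix, which is what keeps the storage compact.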

A. Frame-based Simulation

We now discuss how to manipulate a stabilizer frame $\mathcal{F}$ in order to simulate generic quantum circuits with both stabilizer and non-stabilizer gates. To simulate stabilizer gates, we first update the stabilizer matrix $M$ associated with $\mathcal{F}$ as per Section II-B. Then, we iterate over the phase vectors in $\mathcal{F}$ and update each accordingly (Table II). Thus, this operation takes $O(nk)$ time for a superposition with $k$ states. To simulate a non-stabilizer gate, we first update $M$ (i.e., apply measurements to obtain relevant cofactors). We then iterate over each phase vector in $\mathcal{F}$ and permute the corresponding phases in order to generate additional phase vectors corresponding to the cofactor states. As in the case of stabilizer gates, this operation is linear in the number of phase vectors. However, by the end of the operation, the number of phase vectors (states) in $\mathcal{F}$ will have grown by a (worst case) factor of four in the case of both TOF and controlled-$R(\alpha)$. For an arbitrary n-qubit stabilizer frame $\mathcal{F}$, the number of phase vectors is upper bounded by $2^n$, the number of possible ± permutations.
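A minimal sketch of a frame as a data structure, assuming a layout with one shared list of Pauli strings, k phase vectors and k amplitudes, is given below. The class `Frame` and its method `apply_h` are hypothetical; the sketch handles only the Hadamard gate and deliberately omits the global-phase update of the amplitudes and the row-echelon maintenance discussed later in this section.

```python
from dataclasses import dataclass

BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
LIT = {bits: lit for lit, bits in BITS.items()}


@dataclass
class Frame:
    matrix: list   # n Pauli strings shared by all states in the frame
    phases: list   # k phase vectors, each a list of n signs (+1 or -1)
    amps: list     # k complex amplitudes, one per phase vector

    def apply_h(self, q):
        """Conjugate column q by H: X to Z, Y to -Y, Z to X (Table II)."""
        flips = []
        for r, row in enumerate(self.matrix):
            x, z = BITS[row[q]]
            flips.append(-1 if (x and z) else 1)   # only Y picks up a sign
            self.matrix[r] = row[:q] + LIT[(z, x)] + row[q + 1:]
        # The same sign flips apply to every phase vector: O(nk) work.
        self.phases = [[s * f for s, f in zip(p, flips)] for p in self.phases]


# Four basis states |000>, |010>, |100>, |111> with equal amplitudes,
# stored as one matrix plus four phase vectors.
frame = Frame(
    matrix=["ZII", "IZI", "IIZ"],
    phases=[[1, 1, 1], [1, -1, 1], [-1, 1, 1], [-1, -1, -1]],
    amps=[0.5, 0.5, 0.5, 0.5],
)
frame.apply_h(2)
```

The matrix is updated once, independently of k, while the phase vectors are updated in a single pass; this separation is what yields the $O(nk)$ cost quoted above.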

Prior work on simulation of non-stabilizer gates using the stabilizer formalism can be found in [1], where the authors propose an approach that represents a quantum state as a sum of $O(4^{2d})$ density-matrix terms, where $d$ is the number of distinct qubits involved in non-stabilizer operations.

Global phases of states in $\mathcal{F}$. In quantum mechanics, the states $e^{i\theta}|\psi\rangle$ and $|\psi\rangle$ are considered phase-equivalent because $e^{i\theta}$ does not affect the statistics of measurement. During stabilizer-based simulation, such global phases are not maintained. Since these phases are unobservable, this is not a problem when simulating a single stabilizer state. However, since we manipulate superpositions of states, such global phases become relative and cannot be ignored. In frame-based simulation, we maintain the global phases of the states in $\mathcal{F}$ using the amplitude vector $a$. Let $p_i$ be the phase vector associated with $a_i \in a$. When simulating gate $U$, we update each $a_i$ as follows:

1) Set the leading phases of the rows in $M$ to $p_i$.

2) Obtain a basis state $|b\rangle$ from $M$ and store its amplitude $\beta$. If $U$ is the Hadamard gate, it may be necessary to sample a sum of two non-zero basis amplitudes (one real, one imaginary).

3) Compute $U(\beta|b\rangle) = \beta'|b'\rangle$ via the state-vector representation.

4) Obtain $|b'\rangle$ from $UMU^\dagger$ and store its non-zero amplitude $\gamma$.

5) Compute the global-phase factor generated as $a_i = (a_i \cdot \beta')/\gamma$.

To sample the computational-basis amplitudes of $|b\rangle$ and $|b'\rangle$ from the stabilizer, $M$ needs to be in row-echelon form (Figure 2-b). Thus, each global-phase computation takes $O(n^3)$ time for an n-qubit $M$. To improve this, we introduce a simulation invariant.

Invariant 1: The stabilizer matrix $M$ associated with $\mathcal{F}$ remains in row-echelon form (Figure 2-b) during simulation.

Since stabilizer gates affect at most two columns of $M$, Invariant 1 can be repaired with $O(n)$ row multiplications. Since each row multiplication takes $\Theta(n)$ time, the runtime required to update $M$ during global-phase maintenance is $O(n^2)$. Therefore, for an arbitrary n-qubit stabilizer frame with $k$ states, the overall runtime for simulating a single gate is $O(n^2 + nk)$ since one can memoize the updates to $M$ required to compute each $a_i$.

Measuring $\mathcal{F}$. Since the states in $\mathcal{F}$ are orthogonal, the outcome probability when measuring $\mathcal{F}$ is calculated as the sum of the normalized outcome probabilities of each state. The normalization is with respect to the amplitudes stored in $a$ and thus the overall measurement outcome may have a non-uniform distribution. Formally, let $\Psi = \sum_i a_i|\psi_i\rangle$ be the superposition of states represented by $\mathcal{F}$. The probability of observing outcome $x \in \{0, 1\}$ upon measuring qubit $m$ is

$$p(x)_\Psi = \sum_{i=1}^{k} |a_i|^2 \langle\psi_i|P_x^m|\psi_i\rangle = \sum_{i=1}^{k} |a_i|^2\, p(x)_{\psi_i}$$

where $P_x^m$ denotes the projection operator in the computational basis $x$ as discussed in Section II. The outcome probability for each stabilizer state $p(x)_{\psi_i}$ is computed as outlined in Section II-B. Once we compute $p(x)_\Psi$, we flip a (possibly biased) coin to decide the outcome and update the stabilizer matrix associated with $\mathcal{F}$ (Section II-B). In the worst case, the outcomes of all the states in $\Psi$ are random and each requires an $O(n^2)$-time update to $M$. (Deterministic measurements do not require updates to $M$ and, since we maintain Invariant 1, such measurements can be decided in linear time.) Thus, measuring a frame with $k$ states takes $O(n^2 + nk)$ time.
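The weighted sum for $p(x)_\Psi$ and the subsequent biased coin flip can be sketched as follows. The per-state probabilities are passed in directly (each is 0, 0.5 or 1 for a stabilizer state); the function names and the use of a seeded `random.Random` generator are assumptions of this example, and the update of $M$ after the outcome is chosen is not shown.

```python
import random


def frame_outcome_probability(amps, state_probs):
    """p(x) for the whole frame: sum over states of |a_i|^2 * p(x) for state i."""
    return sum(abs(a) ** 2 * p for a, p in zip(amps, state_probs))


def sample_outcome(p_one, rng):
    """Flip a (possibly biased) coin that returns 1 with probability p_one."""
    return rng.choices([0, 1], weights=[1.0 - p_one, p_one])[0]


# Frame from the Toffoli example: states |000>, |010>, |100>, |111> with
# amplitude 0.5 each. Measuring the target qubit yields 1 only for |111>.
amps = [0.5, 0.5, 0.5, 0.5]
p_one_per_state = [0.0, 0.0, 0.0, 1.0]
p_one = frame_outcome_probability(amps, p_one_per_state)

rng = random.Random(2014)  # fixed seed so that the run is reproducible
outcome = sample_outcome(p_one, rng)
```

Here the target reads 1 with probability 0.25, a non-uniform distribution that no single stabilizer state could produce.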

Multiframe simulation. Although a single frame is sufficient to represent a stabilizer-state superposition $\Psi$, one can tame the exponential growth of states in $\Psi$ by admitting a multiframe representation. Such a representation cuts down the total number of states required to represent $\Psi$ by at least a half, thus improving the scalability of our technique. Our experiments in Section IV show that, when simulating ripple-carry adders, the number of states in $\Psi$ grows linearly when multiframes are used but exponentially when a single frame is used.

One derives a multiframe representation directly from a single frame $\mathcal{F}$ by examining the set of phase vectors and identifying candidate pairs that can be coalesced into a single phase vector associated with a different stabilizer matrix. Since we maintain the stabilizer matrix $M$ of a frame in row-echelon form (Invariant 1), examining the phases corresponding to $Z_j$ rows ($Z$-literal in jth column and $I$'s in all other columns) allows us to identify the columns in $M$ that need to be modified in order to coalesce candidate pairs. Figure 4 shows an example of this process. To obtain $M_1$ in the Figure 4 example, we conjugate the first column of $M$ by an H gate. Similarly, to obtain $M_2$ we conjugate the first column by H and then conjugate the first and third columns by CNOT. Thus, the output of this coalescing process is a list of frames $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{F}_l$ that together represent the same superposition as the original input frame. We introduce the following invariant to facilitate simulation of quantum measurements on multiple frames.

Fig. 4. Example of how a multiframe representation is derived from a single-frame representation. Each frame $\mathcal{F}_i$ consists of a stabilizer matrix $M_i$, a set of (±)-phase vectors and a vector of amplitudes $a_i$.

Invariant 2: The stabilizer frames that represent a superposition of stabilizer states remain mutually orthogonal during simulation, i.e., every pair of (basis) vectors from any two frames is orthogonal.

To maintain Invariant 2 we define a specific type of candidate pair such that the new frames generated from the set of coalesced phase vectors are mutually orthogonal. Suppose $\langle p_r, p_j\rangle$ is a pair of phase vectors from the same n-qubit frame. Then $\langle p_r, p_j\rangle$ is considered a candidate iff it has the following properties: (i) $p_r$ and $p_j$ are equal up to $m \le n$ entries corresponding to $Z_k$-rows (where $k$ is the qubit the row stabilizes), and (ii) $a_r = i^d a_j$ for some $d \in \{0, 1, 2, 3\}$ (where $a_r$ and $a_j$ are the frame amplitudes paired with $p_r$ and $p_j$). The stabilizer circuit needed to coalesce a candidate pair is defined as

$$C = \mathrm{CNOT}_{v_1, v_2}\, \mathrm{CNOT}_{v_1, v_3} \cdots \mathrm{CNOT}_{v_1, v_m}\, P^d_{v_1}\, H_{v_1},$$

where the $v_k$ designate the qubits stabilized by the $m$ differing entries in the candidate pair. The steps in our coalescing procedure are:

1) Sort phase vectors according to differing entries such that candidate pairs are next to each other.

2) Coalesce candidate pairs into a new set of phase vectors.

3) Create a new frame $\mathcal{F}_i$ consisting of the set of coalesced phase vectors and the new stabilizer matrix $CMC^\dagger$.

4) Repeat steps 2–3 until no candidate pairs remain.

The runtime of this procedure is dominated by Step 1. Each phase-vector comparison takes $\Theta(n)$ time, where $n$ is the size of the phase vectors. Therefore, the runtime of Step 1 and our overall coalescing procedure is $O(nk \log k)$ for a single frame with $k$ phase vectors.
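The candidate-pair test and the construction of the coalescing circuit $C$ can be sketched as below. This covers only properties (i) and (ii) and the gate list; it does not perform the sorting, the phase-vector merge, or the matrix update $CMC^\dagger$. The function names, the tuple encoding of gates, and the choice to list gates in application order (reading the operator product for $C$ from right to left) are assumptions of this example.

```python
def find_candidate(matrix, p_r, p_j, a_r, a_j):
    """Return (d, qubits) if the pair is a candidate, otherwise None.

    matrix is a list of Pauli strings; p_r and p_j are lists of +1/-1 signs;
    a_r and a_j are the amplitudes paired with them.
    """
    differing = [idx for idx, (s, t) in enumerate(zip(p_r, p_j)) if s != t]
    if not differing:
        return None
    qubits = []
    for idx in differing:
        row = matrix[idx]
        # Property (i): the row must be a Z_k row (one Z, identities elsewhere).
        if not set(row).issubset("IZ") or row.count("Z") != 1:
            return None
        qubits.append(row.index("Z"))
    # Property (ii): a_r = i**d * a_j for some d in {0, 1, 2, 3}.
    ratio = a_r / a_j
    for d in range(4):
        if round(abs(ratio - 1j ** d), 9) == 0:
            return d, qubits
    return None


def coalescing_circuit(d, qubits):
    """Gates of C in application order: H, then P applied d times, then CNOTs."""
    v1 = qubits[0]
    gates = [("H", v1)] + [("P", v1)] * d
    gates += [("CNOT", v1, vk) for vk in qubits[1:]]
    return gates


# |00> and |11> with equal amplitudes coalesce into one stabilizer state.
matrix = ["ZI", "IZ"]
found = find_candidate(matrix, [1, 1], [-1, -1], 0.5, 0.5)
circuit = coalescing_circuit(*found)
```

For this example the circuit is H on qubit 0 followed by CNOT from qubit 0 to qubit 1, and conjugating the rows $ZI$ and $IZ$ by it gives $XX$ and $ZZ$, the generators of $(|00\rangle + |11\rangle)/\sqrt{2}$ from Section II-B.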

To simulate stabilizer, TOF, controlled-$R(\alpha)$ and measurement gates using multiple frames, one applies our single-frame algorithms to each frame in the list independently. In the case of TOF and controlled-$R(\alpha)$ gates, additional steps are required:

1) Apply the coalescing procedure to each frame and insert the new "coalesced" frames in the list.

2) Merge frames with equivalent stabilizer matrices.

3) Repeat Steps 1 and 2 until no new frames are generated.

The simulation flow of our technique is shown in Figure 5 and implemented in our software package Quipu.

IV. EMPIRICAL VALIDATION

We tested a single-threaded version of Quipu on a conventional

Linux server using several benchmark sets consisting of stabilizer

circuits, quantum ripple-carry adders, quantum Fourier transform

circuits and quantum fault-tolerant (FT) circuits.

Fig. 5. Simulation flow for Quipu.

Stabilizer circuits. We compared the runtime performance of Quipu against that of CHP using a benchmark set similar to the one used in [1]. We generated random stabilizer circuits on n qubits, for n ∈ {100, 200, ..., 1500}. The use of randomly generated benchmarks is justified for our experiments because (i) our algorithms are not explicitly sensitive to circuit topology and (ii) random stabilizer circuits have been considered representative [9]. For each n, we generated the circuits as follows: fix a parameter β > 0; then choose $\beta \lceil n \log_2 n \rceil$ random unitary gates (CNOT, P or H), each with probability 1/3. Then measure each qubit a ∈ {0, ..., n − 1} in sequence. We measured the number of seconds needed to simulate the entire circuit. The entire procedure was repeated for β ranging from 0.6 to 1.2 in increments of 0.1. Figure 6 shows the average time needed by Quipu and CHP to simulate this benchmark set. The purpose of this comparison is to evaluate the overhead of supporting generic circuit simulation in Quipu. Since CHP is specialized to stabilizer circuits, we do not expect Quipu to be faster. When β = 0.6, the simulation time appears to grow roughly linearly in n for both simulators. However, when the number of unitary gates is doubled (β = 1.2), the runtime of both simulators grows roughly quadratically. Thus, the performance of both CHP and Quipu depends strongly on the circuit being simulated. Although Quipu is 5× slower than CHP, we note that Quipu maintains global phases whereas CHP does not. Figure 6 shows that Quipu is asymptotically as fast as CHP when simulating stabilizer circuits that contain a linear number of measurements.
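The benchmark-generation recipe described above can be sketched in Python as follows. The function name, the tuple encoding of gates, the uniform choice of qubits, and the rounding of $\beta \lceil n \log_2 n \rceil$ to an integer are assumptions of this sketch; the text does not specify them.

```python
import math
import random

def random_stabilizer_circuit(n, beta, seed=None):
    """Return a random stabilizer circuit on n >= 2 qubits as a gate list."""
    rng = random.Random(seed)
    # beta * ceil(n log2 n) unitary gates, rounded to an integer (assumption).
    num_gates = round(beta * math.ceil(n * math.log2(n)))
    circuit = []
    for _ in range(num_gates):
        gate = rng.choice(["CNOT", "P", "H"])  # each with probability 1/3
        if gate == "CNOT":
            control, target = rng.sample(range(n), 2)  # two distinct qubits
            circuit.append((gate, control, target))
        else:
            circuit.append((gate, rng.randrange(n)))
    # Finally measure every qubit 0, ..., n-1 in sequence.
    circuit.extend(("M", q) for q in range(n))
    return circuit

circuit = random_stabilizer_circuit(100, 0.6, seed=1)
```

For n = 100 and β = 0.6 this yields 399 unitary gates followed by 100 measurements.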

[Figure 6: two panels, CHP (runtime axis 0 to 200 secs) and Quipu (runtime axis 0 to 1000 secs), each plotting runtime against the number of qubits (200 to 1600) for β = 0.6, 0.7, ..., 1.2.]

Fig. 6. Average time needed by Quipu and CHP to simulate an n-qubit stabilizer circuit with βn log n gates and n measurements. Quipu is asymptotically as fast as CHP but is not limited to stabilizer circuits.

Ripple-carry adders. Our second benchmark set consists of n-bit ripple-carry (Cuccaro) adder [4] circuits, which often appear as components in arithmetic circuits [10]. The Cuccaro circuit for n = 3 is shown in Figure 7. Such circuits act on two n-qubit input registers, one ancilla qubit and one carry qubit for a total of 2(n + 1) qubits. We applied H gates to all 2n input qubits in order to simulate addition on a superposition of $2^{2n}$ computational-basis states. Figure 8 shows the average runtime needed to simulate this benchmark set using Quipu. For comparison, we ran the same benchmarks on an optimized version of QuIDDPro, called QPLite⁸, specific to circuit simulation [18]. When n < 15, QPLite is faster than Quipu because the QuIDD representing the state vector remains compact during simulation. However, for n > 15, the compactness of the QuIDD is considerably reduced, and the majority of QPLite's runtime is spent in non-local pointer-chasing and memory (de)allocation. Thus, QPLite fails to scale on such benchmarks and one observes an exponential increase in runtime. Furthermore, Quipu consumed 62% less memory than QPLite in each of these benchmarks.

⁸ QPLite is up to 4× faster since it removes overhead related to QuIDDPro's interpreted front-end for extended quantum programming [15].

[Figure 7: circuit diagram. Wires from top to bottom: |b0⟩, |a0⟩, |0⟩, |b1⟩, |a1⟩, |b2⟩, |a2⟩, |z⟩, with an H gate on each a and b input. Outputs from top to bottom: |s0⟩, |a0⟩, |0⟩, |s1⟩, |a1⟩, |s2⟩, |a2⟩, |z ⊕ s3⟩.]

Fig. 7. Ripple-carry (Cuccaro) adder for 3-bit numbers $a = a_0 a_1 a_2$ and $b = b_0 b_1 b_2$. The third qubit from the top is an ancilla and the z qubit is the carry. The b-register is overwritten with the result $s_0 s_1 s_2$.
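To make the register layout of the Cuccaro adder concrete, the following Python sketch traces the adder classically on a single computational-basis input. It follows the MAJ/UMA ladder construction usually attributed to [4]; the wire numbering and the function name `cuccaro_add` are assumptions of this sketch, and the gate ordering may differ in detail from Figure 7. It only checks the reversible logic on basis states and is not a stabilizer-frame simulation.

```python
def cuccaro_add(a, b, n, z=0):
    """Trace an n-bit Cuccaro adder on one basis state (2n + 2 wires).

    Assumed layout: wire 0 is the ancilla, wires 1..n hold a,
    wires n+1..2n hold b, and wire 2n+1 is the carry z.
    Returns (a_out, sum_out, ancilla_out, z_out).
    """
    w = [0] * (2 * n + 2)
    A = [1 + i for i in range(n)]
    B = [1 + n + i for i in range(n)]
    Z = 2 * n + 1
    for i in range(n):
        w[A[i]] = (a >> i) & 1
        w[B[i]] = (b >> i) & 1
    w[Z] = z
    # The carry into bit i sits on the ancilla (i = 0) or on a_{i-1}.
    C = [0] + A[:-1]

    for i in range(n):                 # MAJ ladder
        w[B[i]] ^= w[A[i]]             # CNOT a_i -> b_i
        w[C[i]] ^= w[A[i]]             # CNOT a_i -> c_i
        w[A[i]] ^= w[C[i]] & w[B[i]]   # Toffoli (c_i, b_i) -> a_i
    w[Z] ^= w[A[n - 1]]                # copy the final carry onto z
    for i in reversed(range(n)):       # UMA ladder
        w[A[i]] ^= w[C[i]] & w[B[i]]   # Toffoli (c_i, b_i) -> a_i
        w[C[i]] ^= w[A[i]]             # CNOT a_i -> c_i
        w[B[i]] ^= w[C[i]]             # CNOT c_i -> b_i

    a_out = sum(w[A[i]] << i for i in range(n))
    s_out = sum(w[B[i]] << i for i in range(n))
    return a_out, s_out, w[0], w[Z]
```

Looping over all 64 pairs of 3-bit inputs confirms that the b-register ends up holding (a + b) mod 8, the carry wire holds the overflow bit, and the a-register and ancilla are restored.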

We ran the same benchmarks using both the single-frame and multiframe approaches. In the case of a single frame, the number of states in a superposition grows exponentially in n. However, in the multiframe approach, the number of states grows linearly in n. This is because TOF gates produce large equal superpositions that are effectively compressed by our coalescing technique. Since our frame-based algorithms require poly(k) time for k states in a superposition, Quipu simulates Cuccaro circuits in polynomial time and space for input states consisting of large superpositions of basis states. On such instances, known linear-algebraic simulation techniques (e.g., QuIDDPro) take exponential time.

The work in [10] describes additional quantum arithmetic circuits

that are based on Cuccaro adders (e.g., subtractors, conditional

adders, comparators). We used Quipu to simulate such circuits and observed runtime performance similar to that shown in Figure 8.

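Quantum Fourier transform (QFT) circuits. Our third benchmark set consists of circuits for implementing the n-qubit QFT, which computes the discrete Fourier transform of the amplitudes in the input quantum state. Let $|x_1 x_2 \ldots x_n\rangle$, $x_i \in \{0,1\}$, be a computational basis state and $x_{1,2,\ldots,m} = \sum_{k=1}^{m} x_k 2^{-k}$, with the same binary-fraction convention applied to other index ranges (e.g., $x_{n-1,n} = x_{n-1}/2 + x_n/4$). The action of the QFT on this input state can be expressed as:

$$|x_1 \ldots x_n\rangle \;\mapsto\; \frac{1}{\sqrt{2^n}} \left(|0\rangle + e^{2i\pi \cdot x_n}|1\rangle\right) \otimes \left(|0\rangle + e^{2i\pi \cdot x_{n-1,n}}|1\rangle\right) \otimes \cdots \otimes \left(|0\rangle + e^{2i\pi \cdot x_{1,2,\ldots,n}}|1\rangle\right) \qquad (1)$$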
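As a sanity check of the product form in Equation (1), the Python sketch below compares it numerically with the textbook definition of the QFT, $|x\rangle \mapsto 2^{-n/2} \sum_y e^{2\pi i\, xy/2^n} |y\rangle$. The function names are hypothetical, and the sketch assumes that $x_1$ is the most significant bit and that the first tensor factor is the most significant output qubit.

```python
import cmath

def qft_amplitudes_dft(bits):
    """Amplitudes of QFT|x> from the discrete-Fourier-transform definition."""
    n = len(bits)
    N = 2 ** n
    x = sum(bit << (n - 1 - i) for i, bit in enumerate(bits))  # x_1 is the MSB
    return [cmath.exp(2j * cmath.pi * x * y / N) / N ** 0.5 for y in range(N)]

def qft_amplitudes_product(bits):
    """Amplitudes of QFT|x> from the tensor-product form of Equation (1)."""
    n = len(bits)
    amps = [1.0 + 0j]
    for j in range(1, n + 1):
        tail = bits[n - j:]  # x_{n-j+1}, ..., x_n
        # Binary fraction 0.x_{n-j+1}...x_n
        frac = sum(bit * 2.0 ** -(k + 1) for k, bit in enumerate(tail))
        factor = [1.0, cmath.exp(2j * cmath.pi * frac)]
        # Tensor the new single-qubit factor onto the right.
        amps = [a * f for a in amps for f in factor]
    return [a / 2 ** (n / 2) for a in amps]

def max_deviation(bits):
    """Largest amplitude difference between the two forms."""
    d = qft_amplitudes_dft(bits)
    p = qft_amplitudes_product(bits)
    return max(abs(u - v) for u, v in zip(d, p))
```

For every 3-bit input the two forms agree up to floating-point error, which supports the reading of the subscripted quantities as binary fractions.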

[Figure 8: average runtime in seconds (0 to 60) of Quipu and QPLite plotted against n (5 to 25) for the n-bit Cuccaro adder on 2n + 2 qubits, with a zoomed inset covering runtimes of 0 to 5 seconds.]

Fig. 8. Average runtime needed by Quipu and QuIDDPro to simulate n-bit Cuccaro adders after an equal superposition of all computational basis states is obtained using a block of Hadamard gates (Figure 7). Quipu consumed 62% less memory than QPLite for each of these benchmarks.

[Figure 9: circuit diagram mapping inputs |x2⟩, |x1⟩, |x0⟩ to outputs |y0⟩, |y1⟩, |y2⟩ using H, R(π/2) and R(π/4) gates.]

Fig. 9. The three-qubit QFT circuit. In general, the first qubit requires one Hadamard gate, the next qubit requires a Hadamard and a controlled-R(α) gate, and each following qubit requires an additional controlled-R(α) gate. Summing up the number of gates gives $O(n^2)$ for an n-qubit QFT circuit.

The QFT is used in many quantum algorithms, notably Shor's factoring and discrete logarithm algorithms. Such circuits are composed of a network of Hadamard and controlled-R(α) gates, where $\alpha = \pi/2^k$ and k is the distance over which the gate acts. The three-qubit QFT circuit is shown in Figure 9. Figure 10 shows average runtime and memory usage for both Quipu and QPLite on QFT instances for n = {10, 12, ..., 20}. Quipu runs approximately 4× faster than QPLite on average and consumes about 90% less memory. For these benchmarks, we observed that the number of states in our multiframe data structure was $2^{n-1}$. This is because controlled-R(α) gates produce biased superpositions (Section III-A) that cannot be effectively compressed using our coalescing procedure. Therefore, as Figure 10 shows, the runtime and memory requirements of both Quipu and QPLite grow exponentially in n for QFT instances. However, Quipu scales to 22-qubit instances whereas QPLite scales to only 18 qubits.
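The gate structure described above (one Hadamard per qubit plus controlled-R(α) gates with $\alpha = \pi/2^k$ for distance k) can be written out as a short Python sketch. The function name, the tuple encoding, and the qubit ordering are assumptions, and the sketch omits any final qubit-reordering swaps.

```python
import math

def qft_gates(n):
    """List the gates of an n-qubit QFT circuit (no final swaps)."""
    gates = []
    for target in range(n):
        gates.append(("H", target))
        for control in range(target + 1, n):
            k = control - target  # distance over which the gate acts
            gates.append(("CR", control, target, math.pi / 2 ** k))
    return gates

three_qubit = qft_gates(3)
```

For n = 3 the list contains three Hadamards, two R(π/2) gates and one R(π/4) gate, matching the gate labels of Figure 9, and in general the count is n(n + 1)/2.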

[Figure 10: two panels plotting average runtime in seconds (0 to 500) and peak memory in MB (0 to 700) of Quipu and QPLite against n (12 to 22) for n-qubit QFT circuits.]

Fig. 10. Average runtime and memory needed by Quipu and QuIDDPro to simulate n-qubit QFT circuits, which contain n(n + 1)/2 gates.

Fault-tolerant (FT) circuits. Our last benchmark set consists of circuits that, in addition to preparing encoded quantum states, implement procedures for performing FT quantum operations [6], [11], [14]. FT operations limit the propagation of errors from one qubit in a QECC-register (the block of qubits that encodes a logical qubit) to another qubit in the same register, and a single faulty gate damages at most one qubit in each register. One constructs FT stabilizer circuits by executing each stabilizer gate transversally⁹ across QECC-registers [7], [11], [14]. Non-stabilizer gates need to be implemented using an FT architecture that often requires additional ancilla qubits, measurements and correction procedures conditioned on measurement outcomes. Figure 11 shows a circuit that implements an FT-Toffoli operation [14]. Each line in Figure 11 represents a 5-qubit register implementing the DiVincenzo/Shor code.

We implemented FT benchmarks for the half-adder and full-adder circuits as well as for computing $f(x) = b^x \bmod 15$. Each circuit from Figure 12 implements f(x) with a particular co-prime base value b as a (2,4) look-up table (LUT).¹⁰ The Toffoli gates in all our FT benchmarks are implemented using the FT architecture from Figure 11. Since FT-Toffoli operations require 6 ancilla registers, a circuit that implements t FT-Toffolis using a k-qubit QECC requires 6tk ancilla qubits. Therefore, to compare with QPLite, we used the 3-qubit bit-flip code [11, Ch. 10] instead of the more robust 5-qubit code in our larger benchmarks. Our results in Table III show that Quipu is typically faster than QPLite by several orders of magnitude and consumes 8× less memory for the toffoli, half-adder and full-adder benchmarks. Table III also shows that our coalescing technique is effective, as the maximum size of the stabilizer-state superposition is orders of magnitude smaller when multiple frames are used.

TABLE III. Average time and memory needed by Quipu and QPLite to simulate our benchmark set of quantum FT circuits. The second column indicates the QECC used to encode k logical qubits into n physical qubits. We used the 3-qubit bit-flip code for larger benchmarks and the 5-qubit DiVincenzo/Shor code [6] for smaller ones (∗).

| Fault-tolerant circuit | QECC [n,k] | Total qubits (inc. ancilla) | Stab. gates | Toff. gates | Runtime QPLite (secs) | Runtime Quipu (secs) | Memory QPLite (MB) | Memory Quipu (MB) | Max size(Ψ), single F | Max size(Ψ), multi F |
|---|---|---|---|---|---|---|---|---|---|---|
| toffoli∗ | [15,3] | 45 | 155 | 15 | 43.68 | 0.20 | 98.45 | 12.76 | 2816 | 32 |
| halfadd∗ | [15,3] | 45 | 160 | 15 | 43.80 | 0.20 | 94.82 | 12.76 | 2816 | 32 |
| fulladd∗ | [20,4] | 80 | 320 | 30 | 84.96 | 0.88 | 91.86 | 12.94 | 2816 | 32 |
| 2^x mod 15 | [18,6] | 81 | 396 | 36 | 4.81 hrs | 1.48 | 11.85 | 12.96 | 22528 | 64 |
| 4^x mod 15∗ | [30,6] | 30 | 30 | 0 | 0.01 | <0.01 | 6.14 | 12.01 | 1 | 1 |
| 7^x mod 15 | [18,6] | 81 | 402 | 36 | 11.25 hrs | 1.52 | 12.41 | 13.29 | 22528 | 64 |
| 8^x mod 15 | [18,6] | 81 | 399 | 36 | 11.37 hrs | 1.52 | 12.48 | 13.29 | 22528 | 64 |
| 11^x mod 15∗ | [30,6] | 30 | 25 | 0 | 0.02 | <0.01 | 6.14 | 12.01 | 1 | 1 |
| 13^x mod 15 | [18,6] | 81 | 399 | 36 | 11.28 hrs | 1.56 | 11.85 | 12.25 | 22528 | 64 |
| 14^x mod 15∗ | [30,6] | 30 | 40 | 0 | 0.02 | <0.01 | 6.14 | 12.01 | 1 | 1 |

⁹ In a transversal operation, the i-th qubit in each QECC-register interacts only with the i-th qubit of other QECC-registers.

¹⁰ A (k, m)-LUT takes k read-only input bits and m > log₂ k ancilla bits. For each of the $2^k$ input combinations, an LUT produces a pre-determined m-bit value, e.g., a (2,4)-LUT is defined by values (1,2,4,8) or (1,4,1,4).
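The look-up-table values mentioned in footnote 10 follow directly from $f(x) = b^x \bmod 15$ evaluated on the four 2-bit inputs. The Python sketch below tabulates them for every base co-prime to 15; the function name `lut_values` is hypothetical, and the sketch covers only the classical truth tables, not the fault-tolerant circuits.

```python
from math import gcd

M = 15  # modulus used in the benchmarks

def lut_values(b, k=2, modulus=M):
    """Values b^x mod modulus for all 2^k inputs x = 0, ..., 2^k - 1."""
    return tuple(pow(b, x, modulus) for x in range(2 ** k))

# Bases between 2 and M - 1 that are co-prime to M.
bases = [b for b in range(2, M) if gcd(b, M) == 1]
tables = {b: lut_values(b) for b in bases}
```

This reproduces the two examples from footnote 10: b = 2 gives (1, 2, 4, 8) and b = 4 gives (1, 4, 1, 4). The bases 4, 11 and 14 all produce tables of period 2, and these are the rows of Table III that list zero Toffoli gates.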

V. CONCLUSIONS AND FUTURE WORK

In this work, we developed new techniques for quantum-circuit

simulation based on superpositions of stabilizer states, and managed

to circumvent shortcomings in prior work [1]. We implemented our algorithms in our software package Quipu. Current simulators based

on the stabilizer formalism, such as CHP, are limited to simulation

of stabilizer circuits. Our results show that Quipu performs asymp-

totically as fast as CHP on stabilizer circuits with a linear number of

measurement gates. Our stabilizer-based technique simulates certain

quantum arithmetic circuits in polynomial time and space for input

states consisting of unbiased superpositions of computational-basis

states. QuIDDPro takes exponential time on such instances. We

simulated various quantum Fourier transform and quantum fault-

tolerant circuits with Quipu, and the results demonstrate that our

stabilizer-based technique leads to orders-of-magnitude improvement

in runtime and memory as compared to QuIDDPro. While our

technique uses more sophisticated mathematics and quantum-state

modeling, it is signiﬁcantly easier to implement and optimize.

[Figure 11: circuit diagram. Input registers: three |0⟩ registers, three |cat⟩ registers, and the data registers |x⟩, |y⟩, |z⟩. The circuit contains H gates, controlled gates, Z gates and six measurements. Output registers: |x⟩, |y⟩, |z ⊕ xy⟩.]

Fig. 11. Fault-tolerant implementation of a Toffoli gate. Each line represents a 5-qubit register and each gate is applied transversally. The state $|\mathrm{cat}\rangle = (|0^{\otimes 5}\rangle + |1^{\otimes 5}\rangle)/\sqrt{2}$ is obtained using a stabilizer subcircuit (not shown). The arrows point to the set of gates that is applied if the measurement outcome is 1; no action is taken otherwise. Controlled-Z gates are implemented as $H_j\,\mathrm{CNOT}_{i,j}\,H_j$ with control i and target j. Z gates are implemented as $P^2$.

[Figure 12: circuit diagrams in four panels, b = 2, b = 4, b = 7 and b = 8. Input wires |x0⟩ and |x1⟩, each with H gates, act as controls and are returned unchanged; four |0⟩ ancilla wires produce the outputs |y0⟩, |y1⟩, |y2⟩, |y3⟩.]

Fig. 12. Mod-exp with M = 15 implemented as (2,4)-LUTs [10] for several co-prime base values. Negative controls are shown with hollow circles. We apply Hadamards to each x-qubit to generate a superposition of all the input values for x. Our benchmarks implement these computations using the 3-qubit bit-flip code [11, Ch. 10] and the FT-Toffoli architecture from Figure 11.

REFERENCES

[1] S. Aaronson, D. Gottesman, “Improved Simulation of Stabilizer Circuits,” Phys. Rev. A, vol. 70, no. 052328 (2004).

[2] D. Aharonov, “A Simple Proof that Toffoli and Hadamard are Quantum Universal,” arXiv:quant-ph/0301040 (2003).

[3] O. Boncalo et al., “Using Simulated Fault Injection for Fault Tolerance Assessment of Quantum Circuits,” Proc. Sim. Symp., pp. 213-220 (2007).

[4] S. A. Cuccaro et al., “A New Quantum Ripple-carry Addition Circuit,” arXiv:quant-ph/0410184v1 (2004).

[5] K. De Raedt et al., “Massively Parallel Quantum Computer Simulator,” Comp. Phys. Comm., vol. 176, no. 2, pp. 121–136 (2007).

[6] D. P. DiVincenzo, P. W. Shor, “Fault-Tolerant Error Correction with Efficient Quantum Codes,” Phys. Rev. Lett., vol. 77, no. 3260 (1996).

[7] D. Gottesman, “The Heisenberg Representation of Quantum Computers,” arXiv:quant-ph/9807006v1 (1998).

[8] L. Grover, “A Fast Quantum Mechanical Algorithm for Database Search,” Symp. on Theory of Comp., pp. 212-219 (1996).

[9] E. Knill et al., “Randomized Benchmarking of Quantum Gates,” Phys. Rev. A, vol. 77, no. 1 (2007).

[10] I. L. Markov, M. Saeedi, “Constant-optimized Quantum Circuits for Modular Multiplication and Exponentiation,” Quant. Info. and Comp., vol. 12, no. 5 (2012).

[11] M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press (2000).

[12] K. M. Obenland, A. M. Despain, “A Parallel Quantum Computer Simulator,” arXiv:quant-ph/9804039 (1998).

[13] B. Oemer (2003), http://tph.tuwien.ac.at/~oemer/qcl.html.

[14] J. Preskill, “Fault Tolerant Quantum Computation,” Introduction to Quantum Computation, World Scientific (1998). quant-ph/9712048.

[15] http://vlsicad.eecs.umich.edu/Quantum/qp/

[16] V. V. Shende, S. S. Bullock, I. L. Markov, “Synthesis of Quantum Logic Circuits,” IEEE Trans. on CAD, vol. 25, no. 6 (2006).

[17] P. Shor, “Polynomial-time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer,” SIAM J. Comput., vol. 26, no. 5 (1997).

[18] G. F. Viamontes, I. L. Markov, J. P. Hayes, Quantum Circuit Simulation, Springer (2009).