
# Asymptotic Improvements to Quantum Circuits via Qutrits

Authors:
• Super.tech

## Abstract

Quantum computation is traditionally expressed in terms of quantum bits, or qubits. In this work, we instead consider three-level qutrits. Past work with qutrits has demonstrated only constant factor improvements, owing to the $\log_2(3)$ binary-to-ternary compression factor. We present a novel technique using qutrits to achieve a logarithmic depth (runtime) decomposition of the Generalized Toffoli gate using no ancilla, a significant improvement over linear depth for the best qubit-only equivalent. Our circuit construction also features a 70x improvement in two-qudit gate count over the qubit-only equivalent decomposition. This results in circuit cost reductions for important algorithms like quantum neurons and Grover search. We develop an open-source circuit simulator for qutrits, along with realistic near-term noise models which account for the cost of operating qutrits. Simulation results for these noise models indicate over 90% mean reliability (fidelity) for our circuit construction, versus under 30% for the qubit-only baseline. These results suggest that qutrits offer a promising path towards scaling quantum computation.
Pranav Gokhale
pranavgokhale@uchicago.edu
University of Chicago
Jonathan M. Baker
jmbaker@uchicago.edu
University of Chicago
Casey Duckering
cduck@uchicago.edu
University of Chicago
Natalie C. Brown
natalie.c.brown@duke.edu
Georgia Institute of Technology
Kenneth R. Brown
kenneth.r.brown@duke.edu
Duke University
Frederic T. Chong
chong@cs.uchicago.edu
University of Chicago
CCS CONCEPTS
• Computer systems organization → Quantum computing.
KEYWORDS
quantum computing, quantum information, qutrits
ACM Reference Format:
Pranav Gokhale, Jonathan M. Baker, Casey Duckering, Natalie C. Brown,
Kenneth R. Brown, and Frederic T. Chong. 2019. Asymptotic Improvements
to Quantum Circuits via Qutrits. In ISCA ’19: 46th International Symposium
on Computer Architecture, June 22–26, 2019, PHOENIX, AZ, USA. ACM, New
York, NY, USA, 13 pages. https://doi.org/10.1145/3307650.3322253
1 INTRODUCTION
Recent advances in both hardware and software for quantum computation have demonstrated significant progress towards practical outcomes. In the coming years, we expect quantum computing will have important applications in fields ranging from machine learning and optimization [1] to drug discovery [2]. While early research efforts focused on longer-term systems employing full error correction to execute large instances of algorithms like Shor factoring [3] and Grover search [4], recent work has focused on NISQ (Noisy Intermediate Scale Quantum) computation [5]. The NISQ regime considers near-term machines with just tens to hundreds of quantum bits (qubits) and moderate errors.

Given the severe constraints on quantum resources, it is critical to fully optimize the compilation of a quantum algorithm in order to have successful computation. Prior architectural research has explored techniques such as mapping, scheduling, and parallelism [6–8] to extend the amount of useful computation possible. In this work, we consider another technique: quantum trits (qutrits).

While quantum computation is typically expressed as a two-level binary abstraction of qubits, the underlying physics of quantum systems is not intrinsically binary. Whereas classical computers operate in binary states at the physical level (e.g. clipping above and below a threshold voltage), quantum computers have natural access to an infinite spectrum of discrete energy levels. In fact, hardware must actively suppress higher level states in order to achieve the two-level qubit approximation. Hence, using three-level qutrits is simply a choice of including an additional discrete energy level, albeit at the cost of more opportunities for error.

Prior work on qutrits (or more generally, $d$-level qudits) identified only constant factor gains from extending beyond qubits. In general, this prior work [9] has emphasized the information compression advantages of qutrits. For example, $N$ qubits can be expressed as $N / \log_2(3)$ qutrits, which leads to $\log_2(3) \approx 1.6$-constant factor improvements in runtimes.

Our approach utilizes qutrits in a novel fashion, essentially using the third state as temporary storage, but at the cost of higher per-operation error rates. Under this treatment, the runtime (i.e. circuit depth or critical path) is asymptotically faster, and the reliability of computations is also improved. Moreover, our approach only applies qutrit operations in an intermediary stage: the input and output are still qubits, which is important for initialization and measurement on real devices [10, 11].

The net result of our work is to extend the frontier of what quantum computers can compute. In particular, the frontier is defined by the zone in which every machine qubit is a data qubit, for example a 100-qubit algorithm running on a 100-qubit machine. This is indicated by the yellow region in Figure 1. In this frontier zone, we do not have room for non-data workspace qubits known as ancilla. The lack of ancilla in the frontier zone is a costly constraint that generally leads to inefficient circuits. For this reason, typical circuits instead operate below the frontier zone, with many machine qubits used as ancilla. Our work demonstrates that ancilla can be substituted with qutrits, enabling us to operate efficiently within the ancilla-free frontier zone.

Figure 1: The frontier of what quantum hardware can execute is the yellow region adjacent to the 45° line. In this region, each machine qubit is a data qubit. Typical circuits rely on non-data ancilla qubits for workspace and therefore operate below the frontier. (Axes: number of qubits on machine vs. number of data qubits. Regions: infeasible, not enough qubits; frontier, no space for ancilla; feasible, can use ancilla.)

arXiv:1905.10481v1 [quant-ph] 24 May 2019

We highlight the three primary contributions of our work:
(1) A circuit construction based on qutrits that leads to asymptotically faster circuits ($633N \to 38 \log_2 N$) than equivalent qubit-only constructions.
We also reduce total gate counts from $397N$ to $6N$.
(2) An open-source simulator, based on Google's Cirq [12], which supports realistic noise simulation for qutrit (and qudit) circuits.
(3) Simulation results, under realistic noise models, which demonstrate our circuit construction outperforms equivalent qubit circuits in terms of error. For our benchmarked circuits, our reliability advantage ranges from 2x for trapped ion noise models up to more than 10,000x for superconducting noise models. For completeness, we also benchmark our circuit against a qubit-only construction augmented by an ancilla and find our construction is still more reliable.

The rest of this paper is organized as follows: Section 2 presents relevant background about quantum computation and Section 3 outlines related prior work that we benchmark our work against. Section 4 demonstrates our key circuit construction, and Section 5 surveys applications of this construction toward important quantum algorithms. Section 6 introduces our open-source qudit circuit simulator. Section 7 explains our noise modeling methodology (with full details in Appendix A), and Section 8 presents simulation results for these noise models. Finally, we discuss our results at a higher level in Section 9.

2 BACKGROUND
A qubit is the fundamental unit of quantum computation. Compared to their classical counterparts, which take values of either 0 or 1, qubits may exist in a superposition of the two states. We designate these two basis states as $|0\rangle$ and $|1\rangle$ and can represent any qubit as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ with $|\alpha|^2 + |\beta|^2 = 1$. $|\alpha|^2$ and $|\beta|^2$ correspond to the probabilities of measuring $|0\rangle$ and $|1\rangle$ respectively.

Quantum states can be acted on by quantum gates which (a) preserve valid probability distributions that sum to 1 and (b) guarantee reversibility. For example, the X gate transforms a state $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$ to $X|\psi\rangle = \beta|0\rangle + \alpha|1\rangle$. The X gate is also an example of a classical reversible operation, equivalent to the NOT operation.
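As a small illustration of the qubit definitions above, the following sketch represents a qubit as a pair of amplitudes and applies the X gate by swapping them. The helper names and the example amplitudes are illustrative assumptions, not part of the paper's software.

```python
import math

def is_normalized(state, tol=1e-12):
    """True when the squared magnitudes of the amplitudes sum to 1."""
    return abs(sum(abs(a) ** 2 for a in state) - 1.0) < tol

def apply_x(state):
    """X gate: alpha|0> + beta|1>  ->  beta|0> + alpha|1>."""
    alpha, beta = state
    return (beta, alpha)

def measurement_probabilities(state):
    """Probabilities of measuring |0> and |1>: |alpha|^2 and |beta|^2."""
    return tuple(abs(a) ** 2 for a in state)

# Illustrative (hypothetical) amplitudes with |alpha|^2 = 0.2, |beta|^2 = 0.8.
psi = (math.sqrt(0.2), math.sqrt(0.8) * 1j)
flipped = apply_x(psi)
```

Applying `apply_x` twice returns the original pair, which reflects the reversibility requirement on gates.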
In quantum computation, we have a single irreversible operation called measurement that transforms a quantum state into one of the two basis states with a given probability based on $\alpha$ and $\beta$.

In order to interact different qubits, two-qubit operations are used. The CNOT gate appears both in classical reversible computation and in quantum computation. It has a control qubit and a target qubit. When the control qubit is in the $|1\rangle$ state, the CNOT performs a NOT operation on the target. The CNOT gate serves a special role in quantum computation, allowing quantum states to become entangled so that a pair of qubits cannot be described as two individual qubit states. Any operation may be conditioned on one or more controls.

Many classical operations, such as AND and OR gates, are irreversible and therefore cannot directly be executed as quantum gates. For example, consider the output of 1 from an OR gate with two inputs. With only this information about the output, the value of the inputs cannot be uniquely determined. These operations can be made reversible by the addition of extra, temporary workspace bits initialized to 0. Using a single additional ancilla, the AND operation can be computed reversibly as in Figure 2.

Figure 2: Reversible AND circuit using a single ancilla bit. The inputs are on the left, and time flows rightward to the outputs. This AND gate is implemented using a Toffoli (CCNOT) gate with inputs $q_0$, $q_1$ and a single ancilla initialized to 0. At the end of the circuit, $q_0$ and $q_1$ are preserved, and the ancilla bit is set to 1 if and only if both other inputs are 1. (Circuit: $|q_0\rangle \to |q_0\rangle$, $|q_1\rangle \to |q_1\rangle$, $|0\rangle \to |q_0 \text{ AND } q_1\rangle$.)

Physical systems in classical hardware are typically binary. However, in common quantum hardware, such as in superconducting and trapped ion computers, there is an infinite spectrum of discrete energy levels. The qubit abstraction is an artificial approximation achieved by suppressing all but the lowest two energy levels.
Instead, the hardware may be configured to manipulate the lowest three energy levels by operating on qutrits. In general, such a computer could be configured to operate on any number of $d$ levels, except that as $d$ increases, the number of opportunities for error, termed error channels, increases. Here, we focus on $d = 3$, with which we achieve the desired improvements to the Generalized Toffoli gate.

In a three-level system, we consider the computational basis states $|0\rangle$, $|1\rangle$, and $|2\rangle$ for qutrits. A qutrit state $|\psi\rangle$ may be represented analogously to a qubit as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle + \gamma|2\rangle$, where $|\alpha|^2 + |\beta|^2 + |\gamma|^2 = 1$. Qutrits are manipulated in a similar manner to qubits; however, there are additional gates which may be performed on qutrits.

For instance, in quantum binary logic, there is only a single X gate. In ternary, there are three X gates denoted $X_{01}$, $X_{02}$, and $X_{12}$. Each of these $X_{ij}$ for $i \neq j$ can be viewed as swapping $|i\rangle$ with $|j\rangle$ and leaving the third basis element unchanged. For example, for a qutrit $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle + \gamma|2\rangle$, applying $X_{02}$ produces $X_{02}|\psi\rangle = \gamma|0\rangle + \beta|1\rangle + \alpha|2\rangle$. Each of these operations' actions can be found in the left state diagram in Figure 3.

There are two additional non-trivial operations on a single trit. They are the $+1$ and $-1$ (sometimes referred to as a $+2$) operations (with $+$ meaning addition modulo 3). These operations can be written as $X_{01}X_{12}$ and $X_{12}X_{01}$, respectively; however, for simplicity, we will refer to them as $X_{+1}$ and $X_{-1}$ operations. A summary of these gates' actions can be found in the right state diagram in Figure 3.

Figure 3: The five nontrivial permutations on the basis elements for a qutrit. (Left) Each operation here ($X_{01}$, $X_{12}$, $X_{02}$) switches two basis elements while leaving the third unchanged. These operations are self-inverses. (Right) These two operations ($X_{+1}$, $X_{-1}$) permute the three basis elements by performing a $+1 \bmod 3$ and $-1 \bmod 3$ operation. They are each other's inverses.
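The five permutation gates of Figure 3 can be checked with a few lines of code. The sketch below stores each gate as a permutation of the basis labels; the tuple encoding and helper names are illustrative assumptions, and the operator product is read right to left (the rightmost gate acts first).

```python
# Each classical qutrit gate is a permutation of the basis labels (0, 1, 2),
# stored as a tuple p where p[i] is the image of basis state |i>.
IDENTITY = (0, 1, 2)
X01 = (1, 0, 2)        # swaps |0> and |1>
X02 = (2, 1, 0)        # swaps |0> and |2>
X12 = (0, 2, 1)        # swaps |1> and |2>
X_PLUS_1 = (1, 2, 0)   # |i> -> |i + 1 mod 3>
X_MINUS_1 = (2, 0, 1)  # |i> -> |i - 1 mod 3>

def compose(a, b):
    """Permutation of the operator product a b (b acts first, then a)."""
    return tuple(a[b[i]] for i in range(3))

def apply_gate(gate, amplitudes):
    """Apply a permutation gate to the amplitudes of |0>, |1>, |2>."""
    out = [0, 0, 0]
    for i, amp in enumerate(amplitudes):
        out[gate[i]] = amp  # the amplitude of |i> moves to |gate[i]>
    return tuple(out)

# X02 maps alpha|0> + beta|1> + gamma|2> to gamma|0> + beta|1> + alpha|2>.
swapped = apply_gate(X02, ("alpha", "beta", "gamma"))
```

With this convention, `compose(X01, X12)` equals `X_PLUS_1` and `compose(X12, X01)` equals `X_MINUS_1`, matching the products stated in the text.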
Other, non-classical, operations may be performed on a single qutrit. For example, the Hadamard gate [13] can be extended to work on qutrits in a similar fashion as the X gate was extended. In fact, all single qubit gates, like rotations, may be extended to operate on qutrits. In order to distinguish qubit and qutrit gates, all qutrit gates will appear with an appropriate subscript.

Just as single qubit gates have qutrit analogs, the same holds for two-qutrit gates. For example, consider the CNOT operation, where an X gate is performed conditioned on the control being in the $|1\rangle$ state. For qutrits, any of the X gates presented above may be performed, conditioned on the control being in any of the three possible basis states. Just as qubit gates are extended to take multiple controls, qutrit gates are extended similarly. The set of single qutrit gates, augmented by any entangling two-qutrit gate, is sufficient for universality in ternary quantum computation [14].

One question concerning the feasibility of using higher states beyond the standard two is whether these gates can be implemented and perform the desired manipulations. Qutrit gates have been successfully implemented [15–17], indicating it is possible to consider higher level systems apart from qubit-only systems.

In order to evaluate a decomposition of a quantum circuit, we consider quantum circuit costs. The space cost of a circuit, i.e. the number of qubits (or qutrits), is referred to as circuit width. Requiring ancilla increases the circuit width and therefore the space cost of a circuit. The time cost for a circuit is the depth of a circuit. The depth is given as the length of the critical path (in terms of gates) from input to output.

3 PRIOR WORK
3.1 Qudits
Qutrits, and more generally qudits, have been studied in past work both experimentally and theoretically.
Experimentally, $d$ as large as 10 has been achieved (including with two-qudit operations) [18], and $d = 3$ qutrits are commonly used internally in many quantum systems [19, 20].

However, in past work, qudits have conferred only an information compression advantage. For example, $N$ qubits can be compressed to $N / \log_2(d)$ qudits, giving only a constant-factor advantage [9] at the cost of greater errors from operating qudits instead of qubits. Under the assumption of linear cost scaling with respect to $d$, it has been demonstrated that $d = 3$ is optimal [21, 22], although as we show in Section 7 the cost is generally superlinear in $d$.

The information compression advantage of qudits has been applied specifically to Grover's search algorithm [23–26] and to Shor's factoring algorithm [27]. Ultimately, the tradeoff between information compression and higher per-qudit errors has not been favorable in past work. As such, the past research towards building practical quantum computers has focused on qubits.

Our work introduces qutrit-based circuits which are asymptotically better than equivalent qubit-only circuits. Unlike prior work, we demonstrate a compelling advantage in both runtime and reliability, thus justifying the use of qutrits.

3.2 Generalized Toffoli Gate
We focus on the Generalized Toffoli gate, which simply adds more controls to the Toffoli circuit in Figure 2. The Generalized Toffoli gate is an important primitive used across a wide range of quantum algorithms, and it has been the focus of extensive past optimization work. Table 1 compares past circuit constructions for the Generalized Toffoli gate to our construction, which is presented in full in Section 4.2.

Among prior work, the Gidney [28], He [29], and Barenco [30] designs are all qubit-only. The three circuits have varying tradeoffs.
While Gidney and Barenco operate at the ancilla-free frontier, they have large circuit depths: linear with a large constant for Gidney and quadratic for Barenco. The Gidney design also requires rotation gates for very small angles, which poses an experimental challenge. While the He circuit achieves logarithmic depth, it requires an ancilla for each data qubit, effectively halving the potential of any given quantum hardware. Nonetheless, in practice, most circuit implementations use these linear-ancilla constructions due to their small depths and gate counts.

| | This Work | Gidney [28] | He [29] | Barenco [30] | Wang [25] | Lanyon [31], Ralph [32] |
|---|---|---|---|---|---|---|
| Depth | $\log N$ | $N$ | $\log N$ | $N^2$ | $N$ | $N$ |
| Ancilla | 0 | 0 | $N$ | 0 | 0 | 0 |
| Qudit Types | Controls are qutrits | Qubits | Qubits | Qubits | Controls are qutrits | Target is $d = N$-level qudit |
| Constants | Small | Large | Small | Small | Small | Small |

Table 1: Asymptotic comparison of $N$-controlled gate decompositions. The total gate count for all circuits scales linearly (except for Barenco [30], which scales quadratically). Our construction uses qutrits to achieve logarithmic depth without ancilla. We benchmark our circuit construction against Gidney [28], which is the asymptotically best ancilla-free qubit circuit.

As in our approach, circuit constructions from Lanyon [31], Ralph [32], and Wang [25] have attempted to improve the ancilla-free Generalized Toffoli gate by using qudits. Both the Lanyon [31] and Ralph [32] constructions, which have been demonstrated experimentally, achieve linear circuit depths by operating the target as a $d = N$-level qudit. Wang [25] also achieves a linear circuit depth but by operating each control as a qutrit.

Our circuit construction, presented in Section 4.2, has similar structure to the He design, which can be represented as a binary tree of gates. However, instead of storing temporary results with a linear number of ancilla qubits, our circuit temporarily stores information directly in the qutrit $|2\rangle$ state of the controls.
Thus, no ancilla are needed.

In our simulations, we benchmark our circuit construction against the Gidney construction [28] because it is the asymptotically best qubit circuit in the ancilla-free frontier zone. We label these two benchmarks as QUTRIT and QUBIT. The QUBIT circuit handles the lack of ancilla by using dirty ancilla, which unlike clean (initialized to $|0\rangle$) ancilla, can have an unknown initial state. Dirty ancilla can therefore be bootstrapped internally from a quantum circuit. However, this technique requires a large number of Toffoli gates, which makes the decomposition particularly expensive in gate count.

Augmenting the base Gidney construction with a single ancilla (which can also be dirty) does reduce the constants for the decomposition significantly, although the asymptotic depth and gate counts are maintained. For completeness, we also benchmark our circuit against this augmented construction, QUBIT+ANCILLA. However, the augmented circuit does not operate at the ancilla-free frontier, and it conflicts with parallelism, as discussed in Section 9.

4 CIRCUIT CONSTRUCTION
In order for quantum circuits to be executable on hardware, they are typically decomposed into single- and two-qudit gates. Performing efficient low depth and low gate count decompositions is important in both the NISQ regime and beyond. Our circuits assume all-to-all connectivity; we discuss this assumption in Section 9.

4.1 Key Intuition
To develop intuition for our technique, we first present a Toffoli gate decomposition which lays the foundation for our generalization to multiple controls. In each of the following constructions, all inputs and outputs are qubits, but we may occupy the $|2\rangle$ state temporarily during computation. Maintaining binary input and output allows these circuit constructions to be inserted into any preexisting qubit-only circuits.

Figure 4: A Toffoli decomposition via qutrits. Each input and output is a qubit. The red controls activate on $|1\rangle$ and the blue controls activate on $|2\rangle$. The first gate temporarily elevates $q_1$ to $|2\rangle$ if both $q_0$ and $q_1$ were $|1\rangle$. We then perform the X operation only if $q_1$ is $|2\rangle$. The final gate restores $q_0$ and $q_1$ to their original state. (Circuit: $q_0$ is a $|1\rangle$-control of the first and last gates; $q_1$ receives $X_{+1}$, acts as the $|2\rangle$-control of the middle gate, then receives $X_{-1}$; $q_2$ receives X.)

In Figure 4, a Toffoli decomposition using qutrits is given. A similar construction for the Toffoli gate is known from past work [31, 32]. The goal is to perform an X operation on the last (target) input qubit $q_2$ if and only if the two control qubits, $q_0$ and $q_1$, are both $|1\rangle$. First, a $|1\rangle$-controlled $X_{+1}$ is performed on $q_0$ and $q_1$. This elevates $q_1$ to $|2\rangle$ iff $q_0$ and $q_1$ were both $|1\rangle$. Then a $|2\rangle$-controlled X gate is applied to $q_2$. Therefore, X is performed only when both $q_0$ and $q_1$ were $|1\rangle$, as desired. The controls are restored to their original states by a $|1\rangle$-controlled $X_{-1}$ gate, which undoes the effect of the first gate. The key intuition in this decomposition is that the qutrit $|2\rangle$ state can be used instead of ancilla to store temporary information.

4.2 Generalized Toffoli Gate
We now present our circuit decomposition for the Generalized Toffoli gate in Figure 5. The decomposition is expressed in terms of three-qutrit gates (two controls, one target) instead of single- and two-qutrit gates, because the circuit can be understood purely classically at this granularity. In actual implementation and in our simulation, we used a decomposition [15] that requires 6 two-qutrit and 7 single-qutrit physically implementable quantum gates.

Our circuit decomposition is most intuitively understood by treating the left half of the circuit as a tree. The desired property is that the root of the tree, $q_7$, is $|2\rangle$ if and only if each of the 15 controls was originally in the $|1\rangle$ state. To verify this property, we observe that the root $q_7$ can become $|2\rangle$ iff $q_7$ was originally $|1\rangle$ and $q_3$ and $q_{11}$ were both previously $|2\rangle$.
At the next level of the tree, we see $q_3$ could have only been $|2\rangle$ if $q_3$ was originally $|1\rangle$ and both $q_1$ and $q_5$ were previously $|2\rangle$, and similarly for the other triplets. At the bottom level of the tree, the triplets are controlled on the $|1\rangle$ state, so they are only activated when the even-index controls are all $|1\rangle$. Thus, if any of the controls were not $|1\rangle$, the $|2\rangle$ states would fail to propagate to the root of the tree. The right half of the circuit performs uncomputation to restore the controls to their original state.

Figure 5: Our circuit decomposition for the Generalized Toffoli gate is shown for 15 controls and 1 target. The inputs and outputs are both qubits, but we allow occupation of the $|2\rangle$ qutrit state in between. The circuit has a tree structure and maintains the property that the root of each subtree can only be elevated to $|2\rangle$ if all of its control leaves were $|1\rangle$. Thus, the $U$ gate is only executed if all controls are $|1\rangle$. The right half of the circuit performs uncomputation to restore the controls to their original state. This construction applies more generally to any multiply-controlled $U$ gate. Note that the three-input gates are decomposed into 6 two-input and 7 single-input gates in our actual simulation, as based on the decomposition in [15]. (Circuit: $X_{+1}$ targets at $q_1, q_5, q_9, q_{13}$ with $|1\rangle$-controls on their even-index neighbors; then at $q_3, q_{11}$ and finally at $q_7$ with $|2\rangle$-controls; $U$ on $q_{15}$ controlled on $q_7 = |2\rangle$; mirrored $X_{-1}$ gates follow.)

After each subsequent level of the tree structure, the number of qubits under consideration is reduced by a factor of 2. Thus, the circuit depth is logarithmic in $N$. Moreover, each qutrit is operated on by a constant number of gates, so the total number of gates is linear in $N$.

Our circuit decomposition still works in a straightforward fashion when the control type of the top qubit, $q_0$, activates on $|2\rangle$ or $|0\rangle$ instead of activating on $|1\rangle$. These two constructions are necessary for the Incrementer circuit in Section 5.3.
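Because the tree can be understood classically at the granularity of three-qutrit gates, its logic can be checked by brute force over all binary inputs. The following is a minimal sketch of such a check, not the authors' verification scripts: the function names are hypothetical, $U$ is taken to be X, and the sketch assumes $N = 2^k - 1$ controls so that the tree is perfectly balanced as in Figure 5.

```python
from itertools import product

def build_tree(lo, hi, gates):
    """Append gates (left, left_val, right, right_val, root) for controls lo..hi-1.

    Returns (root index, value on which the root acts as a control).
    """
    if hi - lo == 1:
        return lo, 1                      # a leaf control activates on |1>
    mid = (lo + hi) // 2
    left, left_val = build_tree(lo, mid, gates)
    right, right_val = build_tree(mid + 1, hi, gates)
    gates.append((left, left_val, right, right_val, mid))
    return mid, 2                         # an elevated subtree root activates on |2>

def generalized_toffoli(bits):
    """Classically simulate the tree on N control bits followed by 1 target bit."""
    state = list(bits)
    n = len(state) - 1
    assert n >= 1 and (n + 1) & n == 0, "sketch assumes N = 2^k - 1 controls"
    gates = []
    root, root_val = build_tree(0, n, gates)
    for left, lv, right, rv, target in gates:            # left half: compute
        if state[left] == lv and state[right] == rv:
            state[target] = (state[target] + 1) % 3      # X_{+1}
    if state[root] == root_val:
        state[n] ^= 1                                    # U = X on the target
    for left, lv, right, rv, target in reversed(gates):  # right half: uncompute
        if state[left] == lv and state[right] == rv:
            state[target] = (state[target] - 1) % 3      # X_{-1}
    return state

def verify(n):
    """Check every binary input: controls restored, target flipped iff all controls are 1."""
    for bits in product((0, 1), repeat=n + 1):
        expected = list(bits)
        if all(bits[:n]):
            expected[n] ^= 1
        if generalized_toffoli(list(bits)) != expected:
            return False
    return True
```

For 15 controls, `build_tree` produces the same gate placement as Figure 5: first-level targets $q_1, q_5, q_9, q_{13}$, then $q_3$ and $q_{11}$, then the root $q_7$.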
We verified our circuits, both formally and via simulation. Our verification scripts are available on our GitHub [33].

5 APPLICATION TO ALGORITHMS
The Generalized Toffoli gate is an important primitive in a broad range of quantum algorithms. In this section, we survey some of the applications of our circuit decomposition.

5.1 Artificial Quantum Neuron
The artificial quantum neuron [34] is a promising target application for our circuit construction, because the algorithm's circuit implementation is dominated by large Generalized Toffoli gates. The algorithm may exhibit an exponential advantage over classical perceptron encoding and it has already been executed on current quantum hardware. Moreover, the threshold behavior of perceptrons has inherent noise resilience, which makes the artificial quantum neuron particularly promising as a near-term application on noisy systems. The current implementation of the neuron on IBM quantum computers relies on ancilla qubits [35], which constrains the circuit width to $N = 4$ data qubits. Our circuit construction offers a path to larger circuit sizes without waiting for larger hardware.

5.2 Grover's Algorithm
Figure 6: Each iteration of Grover Search has a multiply-controlled Z gate. Our logarithmic depth decomposition reduces a $\log M$ factor in Grover's algorithm to $\log \log M$. (Circuit: the oracle, followed by H and X gates on every qubit around the multiply-controlled Z.)

Grover's Algorithm for search over $M$ unordered items requires just $O(\sqrt{M})$ oracle queries. However, each oracle query is followed by a post-processing step which requires a multiply-controlled gate with $N = \log_2 M$ controls [13]. The explicit circuit diagram is shown in Figure 6.

Our log-depth circuit construction directly applies to the multiply-controlled gate. Thus, we reduce a $\log M$ factor in Grover search's time complexity to $\log \log M$ via our ancilla-free qutrit decomposition.

5.3 Incrementer
The Incrementer circuit performs the $+1 \bmod 2^N$ operation to a register of $N$ qubits.
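To see why multiply-controlled gates appear in an incrementer, the classical action can be sketched as a cascade of multiply-controlled NOT gates: bit $i$ flips exactly when all lower bits are 1. This is the textbook linear cascade, shown only to illustrate the $+1 \bmod 2^N$ operation; it is not the construction of Figure 7, and the function name and little-endian bit order are assumptions of the sketch.

```python
def increment(bits):
    """Return (value + 1) mod 2^N, with bits[0] the least significant bit."""
    out = list(bits)
    # Go from the most significant bit down, so each gate's controls
    # (all lower bits) still hold their original values.
    for i in reversed(range(len(out))):
        if all(out[j] == 1 for j in range(i)):  # multiply-controlled X, i controls
            out[i] ^= 1
    return out

def to_int(bits):
    """Interpret a little-endian bit list as an integer."""
    return sum(b << i for i, b in enumerate(bits))
```

For example, `increment([1, 1, 0])` maps 3 to 4, and `increment([1, 1, 1])` wraps 7 around to 0.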
While logarithmic circuit depth can be achieved with linear ancilla qubits [36], the best ancilla-free incrementers require either linear depth with large linearity constants [37] or quadratic depth [30]. Using alternate control activations for our Generalized Toffoli gate decomposition, the incrementer circuit is reduced to $O(\log^2 N)$ depth with no ancilla, a significant improvement over past work.

Our incrementer circuit construction is shown in Figure 7 for an $N = 8$ wide register. The multiple-controlled $X_{+1}$ gates perform the job of computing carries: a carry is performed iff the least significant bit generates (represented by the $|2\rangle$ control) and all subsequent bits propagate (represented by the consecutive $|1\rangle$ controls). We present an $N = 8$ incrementer here and have verified the general construction, both by formal proof and by explicit circuit simulation for larger $N$.

The critical path of this circuit is the chain of $\log N$ multiply-controlled gates (of width $N/2$, $N/4$, $N/8$, ...) which act on $|a_0\rangle$. Since our multiply-controlled gate decomposition has log-depth, we arrive at a total circuit depth scaling of $\log^2 N$.

Figure 7: Our circuit decomposition for the Incrementer. At each subcircuit in the recursive design, multiply-controlled gates are used to efficiently propagate carries over half of the subcircuit. The $|2\rangle$ control checks for carry generation and the chain of $|1\rangle$ controls checks for carry propagation. The circuit depth is $\log^2 N$, which is only possible because of our log depth multiply-controlled gate primitive. (Circuit: inputs $|a_0\rangle, \ldots, |a_7\rangle$; outputs $|(a+1)_0\rangle, \ldots, |(a+1)_7\rangle$.)

5.4 Arithmetic Circuits and Shor's Algorithm
The Incrementer circuit is a key subcircuit in many other arithmetic circuits such as constant addition, modular multiplication, and modular exponentiation.
Further, the modular exponentiation circuit is the bottleneck in the runtime for executing Shor's algorithm for factorization [37, 38]. While a shallower Incrementer circuit alone is not sufficient to reduce the asymptotic cost of modular exponentiation (and therefore Shor's algorithm), it does reduce constants relative to qubit-only circuits.

5.5 Error Correction and Fault Tolerance
The Generalized Toffoli gate has applications to circuits for both error correction [39] and fault tolerance [40]. We foresee two paths of applying these circuits. First, our circuit construction can be used to construct error-resilient logical qubits more efficiently. This is critical for quantum algorithms like Grover's and Shor's which are expected to require such logical qubits. In the nearer-term, NISQ algorithms are likely to make use of limited error correction. For instance, recent results have demonstrated that error correcting a single qubit at a time for the Variational Quantum Eigensolver algorithm can significantly reduce total error [41]. Thus, our circuit construction is also relevant for NISQ-era error correction.

6 SIMULATOR
To simulate our circuit constructions, we developed a qudit simulation library, built on Google's Cirq Python library [12]. Cirq is a qubit-based quantum circuit library and includes a number of useful abstractions for quantum states, gates, circuits, and scheduling.

Our work extends Cirq by discarding the assumption of two-level qubit states. Instead, all state vectors and gate matrices are expanded to apply to $d$-level qudits, where $d$ is a circuit parameter. We include a library of common gates for $d = 3$ qutrits. Our software adds a comprehensive noise simulator, detailed below in Section 6.1.

In order to verify our circuits are logically correct, we first simulated them with noise disabled. We extended Cirq to allow gates to specify their action on classical non-superposition input states without considering full state vectors.
Therefore, each classical input state can be verified in space and time proportional to the circuit width. By contrast, Cirq's default simulation procedure relies on a dense state vector representation requiring space and time exponential in the circuit width. Reducing this scaling from exponential to linear dramatically improved our verification procedure, allowing us to verify circuit constructions for all possible classical inputs across circuit sizes up to widths of 14. Our software is fully open source [33].

6.1 Noise Simulation
Figure 8 depicts a schematic view of our noise simulation procedure, which accounts for both gate errors and idle errors, described below. To determine when to apply each gate and idle error, we use Cirq's scheduler, which schedules each gate as early as possible, creating a sequence of Moments of simultaneous gates. During each Moment, our noise simulator applies a gate error to every qudit acted on. Finally, the simulator applies an idle error to every qudit. This noise simulation methodology is consistent with previous simulation techniques which have accounted for either gate errors [42] or idle errors [43].

Figure 8: This Moment comprises three gates executed in parallel. To simulate with noise, we first apply the ideal gates, followed by a gate error noise channel on each affected qudit. This gate error noise channel depends on whether the corresponding gate was single- or two-qudit. Finally, we apply an idle error to every qudit. The idle error noise channel depends on the duration of the Moment.

Gate errors arise from the imperfect application of quantum gates. Two-qudit gates are noisier than single-qudit gates [44], so we apply different noise channels for the two. Our specific gate error probabilities are given in Section 7.
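The scheduling step can be illustrated with a small greedy scheduler. The sketch below is a standalone illustration of as-early-as-possible scheduling and of where the two kinds of error would be applied in each Moment; it does not call Cirq, and the function names and the representation of gates as tuples of qudit indices are assumptions made for this example.

```python
def schedule_asap(gates, num_qudits):
    """Greedy as-early-as-possible scheduling.

    gates: tuples of qudit indices, in program order.
    Returns a list of moments; each moment is a list of gates on disjoint qudits.
    """
    next_free = [0] * num_qudits  # earliest moment at which each qudit is free
    moments = []
    for gate in gates:
        t = max(next_free[q] for q in gate)
        if t == len(moments):
            moments.append([])
        moments[t].append(gate)
        for q in gate:
            next_free[q] = t + 1
    return moments

def noise_plan(moments, num_qudits):
    """List, per moment, which qudits get a gate error and how long the moment lasts."""
    plan = []
    for moment in moments:
        has_two_qudit_gate = any(len(gate) == 2 for gate in moment)
        plan.append({
            # a gate error on every qudit acted on in this moment
            "gate_errors": sorted(q for gate in moment for q in gate),
            # an idle error on every qudit, acted on or not
            "idle_errors": list(range(num_qudits)),
            # idle-error strength depends on the moment's duration
            "duration": "two-qudit" if has_two_qudit_gate else "single-qudit",
        })
    return plan

example_moments = schedule_asap([(0,), (1, 2), (0, 1), (2,)], num_qudits=3)
example_plan = noise_plan(example_moments, num_qudits=3)
```

In the example, the four gates pack into two Moments, and both are treated as long-duration Moments because each contains a two-qudit gate.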
Idle errors arise from the continuous decoherence of a quantum system due to energy relaxation and interaction with the environment. Idle errors differ from gate errors in two ways that require special treatment:
(1) Idle errors depend on duration, which in turn depends on the schedule of simultaneous gates (Moments). In particular, two-qudit gates take longer to apply than single-qudit gates. Thus, if a Moment contains a two-qudit gate, the idling errors must be scaled appropriately. Our specific scaling factors are given in Section 7.
(2) For the generic model of gate errors, the error channel is applied with probability independent of the quantum state. This is not true for idle errors such as $T_1$ amplitude damping, which only applies when the qudit is in an excited state. This is treated in the simulator by computing idle error probabilities during each Moment, for each qutrit.

Gate errors are reduced by performing fewer total gates, and idle errors are reduced by decreasing the circuit depth. Since our circuit constructions asymptotically decrease the depth, they incur asymptotically fewer idle errors.

Our full noise simulation procedure is summarized in Algorithm 1. The ultimate metric of interest is the mean fidelity, which is defined as the squared overlap between the ideal (noise-free) and actual output state vectors. Fidelity expresses the probability of overall successful execution.

We do not consider initialization errors and readout errors, because our circuit constructions maintain binary input and output, only occupying the qutrit $|2\rangle$ states during intermediate computation. Therefore, the initialization and readout errors for our circuits are identical to those for conventional qubit circuits.

We also do not consider crosstalk errors, which occur when gates are executed in parallel. The effect of crosstalk is very device-dependent and difficult to generalize.
Moreover, crosstalk can be mitigated by breaking each Moment into a small number of sub-moments and then scheduling two-qutrit operations to reduce crosstalk, as demonstrated in prior work [45, 46].

6.2 Simulator Efficiency
Simulating a quantum circuit with a classical computer is, in general, exponentially difficult in the size of the input because the state of $N$ qudits is represented by a state vector of $d^N$ complex numbers. For 14 qutrits, with complex numbers stored as two 8-byte floats (complex128 in NumPy), a state vector occupies 77 megabytes.

A naive circuit simulation implementation would treat every quantum gate or Moment as a $d^N \times d^N$ matrix. For 14 qutrits, a single such matrix would occupy 366 terabytes, which is out of range of simulability. While the exponential nature of simulating our circuits is unavoidable, we mitigate the cost by using a variety of techniques which rely only on state vectors, rather than full square matrices. For example, we maintain Cirq’s approach of applying gates by Einstein Summation [47], which obviates computation of the $d^N \times d^N$ matrix corresponding to every gate or Moment.

    |Ψ⟩ ← random initial state vector
    |Ψ_ideal⟩ = circuit applied to |Ψ⟩ without noise
    foreach Moment do
        foreach Gate ∈ Moment do
            |Ψ⟩ ← Gate applied to |Ψ⟩
            GateError ← DrawRand(GateError Prob.)
            |Ψ⟩ ← GateError applied to |Ψ⟩
        end
        foreach Qutrit do
            if Moment has 2-qudit gate then
                IdleErrors ← long-duration idle errors
            else
                IdleErrors ← short-duration idle errors
            end
            Prob. ← [‖M|Ψ⟩‖² for M ∈ IdleErrors]
            IdleError ← DrawRand(Prob.)
            |Ψ⟩ ← IdleError applied to |Ψ⟩
            Renormalize(|Ψ⟩)
        end
    end
    return |⟨Ψ_ideal|Ψ⟩|², fidelity between ideal & actual output

Algorithm 1: Pseudocode for each simulation trial, given a particular circuit and noise model.

Our noise simulator only relies on state vectors, by adopting the quantum trajectory methodology [48, 49], which is also used by the Rigetti PyQuil noise simulator [50]. At a high level, the effect of
noise channels like gate and idle errors is to turn a coherent quantum state into an incoherent mix of classical probability-weighted quantum states (for example, $|0\rangle$ and $|1\rangle$ with 50% probability each). The most complete description of such an incoherent quantum state is called the density matrix and has dimension $d^N \times d^N$. The quantum trajectory methodology is a stochastic approach: instead of maintaining a density matrix, only a single state is propagated and the error term is drawn randomly at each timestep. Over repeated trials, the quantum trajectory methodology converges to the same results as from full density matrix simulation [50]. Our simulator employs this technique: each simulation in Algorithm 1 constitutes a single quantum trajectory trial. At every step, a specific GateError or IdleError term is picked, based on a weighted random draw.

Finally, our random state vector generation function was also implemented in $O(d^N)$ space and time. This is an improvement over other open source libraries [51, 52], which perform random state vector generation by generating full $d^N \times d^N$ unitary matrices from a Haar-random distribution and then truncating to a single column. Our simulator directly computes the first column and circumvents the full matrix computation.

With optimizations, our simulator is able to simulate circuits up to 14 qutrits in width. This is in the same range as other state-of-the-art noisy quantum circuit simulations [53] (since 14 qutrits ≈ 22 qubits). While each simulation trial took several minutes (depending on the particular circuit and noise model), we were able to run trials in parallel over multiple processes and multiple machines, as described in Section 8.

7 NOISE MODELS
In this section, we describe our noise models at a high level, with mathematical details described in Appendix A. We chose noise models which represent realistic near-term machines.
We rst present a generic, parametrized noise model roughly applicable to all quantum systems. We then present specic parameters, under the generic noise model, which apply to near-term superconducting quantum computers. Finally, we present a specic noise model for trapped ion quantum computers. 7.1 Generic Noise Model 7.1.1 Gate Errors. The scaling of gate errors for a d -level qudit can be roughly summarized as increasing as d4 for two-qudit gates and d2 for single-qudit gates. For d= 2, there are 4 single-qubit gate error channels and 16 two-qubit gate error channels. For d= 3there are 9 and 81 single- and two- qutrit gate error channels respectively. Consistent with other simulators [ 43 , 50 ], we use the symmetric depolarizing gate error model, which assumes equal probabilities between each error channel. Under these noise models, two-qutrit gates are ( 1 80 p2)/( 1 15 p2) times less reliable than two-qubit gates, where p2 is the probability of each two-qubit gate error channel. Similarly, single-qutrit gates are ( 1 8 p1)/( 1 3 p1) times less reliable than single-qubit gates, where p1 is the probability of each single-qubit gate error channel. 7.1.2 Idle Errors. Our treatment of idle errors focuses on the re- laxation from higher to lower energy states in quantum devices. This is called amplitude damping or T1 relaxation. This noise chan- nel irreversibly takes qudits to lower states. For qubits, the only amplitude damping channel is from |1 to |0 , and we denote this damping probability as λ1 . For qutrits, we also model damping from |2to |0, which occurs with probability λ2. 7.2 Superconducting QC We chose four noise models based on superconducting quantum computers expected in the next few years. These noise models com- ply with the generic noise model above and are thus parametrized by p1 , p2 , λ1 and λ2 . 
The $\lambda_i$ probabilities are derived from two other experimental parameters: the gate time $t$ and $T_1$, a timescale that captures how long a qudit persists coherently.

As a starting point for representative near-term noise models, we consider parameters for current superconducting quantum computers. For IBM’s public cloud-accessible superconducting quantum computers, we have $3p_1 \approx 10^{-3}$ and $15p_2 \approx 10^{-2}$. The duration of single- and two-qubit gates is $t \approx 100$ ns and $t \approx 300$ ns respectively, and the IBM devices have $T_1 \approx 100$ µs [44, 54].

However, simulation for these current parameters indicates an error is almost certain to occur during execution of a modest size 14-input Generalized Toffoli circuit. This motivates us to instead consider noise models for better devices which are a few years away. Accordingly, we adopt a baseline superconducting noise model, labeled as SC, corresponding to a superconducting device which has 10x lower gate errors and 10x longer $T_1$ duration than the current IBM hardware. This range of parameters has already been achieved experimentally in superconducting devices for gate errors [55, 56] and for $T_1$ duration [57, 58] independently. Faster gates (shorter $t$) are yet another path towards greater noise resilience. We do not vary gate speeds, because idle errors depend only on the $t/T_1$ ratio, and we already vary $T_1$. In practice however, faster gates could also improve noise-resilience.

| Noise Model | $3p_1$ | $15p_2$ | $T_1$ |
| --- | --- | --- | --- |
| SC | $10^{-4}$ | $10^{-3}$ | 1 ms |
| SC+T1 | $10^{-4}$ | $10^{-3}$ | 10 ms |
| SC+GATES | $10^{-5}$ | $10^{-4}$ | 1 ms |
| SC+T1+GATES | $10^{-5}$ | $10^{-4}$ | 10 ms |

Table 2: Noise models simulated for superconducting devices. Current publicly accessible IBM superconducting quantum computers have single- and two-qubit gate errors of $3p_1 \approx 10^{-3}$ and $15p_2 \approx 10^{-2}$, as well as $T_1$ lifetimes of 0.1 ms [44, 54]. Our baseline benchmark, SC, assumes 10x better gate errors and $T_1$. The other three benchmarks add a further 10x improvement to $T_1$, gate errors, or both.
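To make the dependence on the $t/T_1$ ratio concrete, the following sketch evaluates the damping probabilities $\lambda_m = 1 - e^{-m t / T_1}$ (Equation 9 in Appendix A.2) for the four superconducting noise models. It assumes that the gate durations stay at the IBM values quoted above (100 ns and 300 ns); the function and variable names are illustrative and are not taken from our simulator.

```python
import math


def damping_probability(m, t, t1):
    """Probability lambda_m = 1 - exp(-m * t / T1) of damping from level |m> to |0>."""
    return 1.0 - math.exp(-m * t / t1)


# T1 lifetimes in seconds for each superconducting noise model (Table 2).
T1_BY_MODEL = {
    "SC": 1e-3,
    "SC+T1": 10e-3,
    "SC+GATES": 1e-3,
    "SC+T1+GATES": 10e-3,
}

# Assumed Moment durations in seconds: single-qudit vs. two-qudit gate present.
T_SHORT = 100e-9
T_LONG = 300e-9

for model, t1 in T1_BY_MODEL.items():
    for label, t in (("short", T_SHORT), ("long", T_LONG)):
        lam1 = damping_probability(1, t, t1)
        lam2 = damping_probability(2, t, t1)
        print(f"{model:12s} {label:5s} lambda_1={lam1:.2e} lambda_2={lam2:.2e}")
```

Because $t \ll T_1$ in all four models, $\lambda_m \approx m\,t/T_1$, so a 10x longer $T_1$ lowers each per-Moment damping probability by roughly 10x.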
We also consider three additional near-term device noise models, indexed to the SC noise model. These three models further improve gate errors, $T_1$, or both, by a 10x factor. The specific parameters are given in Table 2. Our 10x improvement projections are realistic extrapolations of progress in hardware. In particular, Schoelkopf’s Law, the quantum analogue of Moore’s Law, has observed that $T_1$ durations have increased by 10x every 3 years for the past 20 years [59]. Hence, 100x longer $T_1$ is a reasonable projection for devices that are about 6 years away.

7.3 Trapped Ion $^{171}$Yb$^{+}$ QC
We also simulated noise models for trapped ion quantum computing devices. Trapped ion devices are well matched to our qutrit-based circuit constructions because they feature all-to-all connectivity [60], and many ions that are ideal candidates for QC devices are naturally multi-level systems.

We focus on the $^{171}$Yb$^{+}$ ion, which has been experimentally demonstrated as both a qubit and qutrit [10, 11]. Trapped ions are often favored in QC schemes due to their long $T_1$ times. One of the main advantages of using a trapped ion is the ability to take advantage of magnetically insensitive states known as "clock states." By defining the computational subspace on these clock states, idle errors caused by fluctuations in the magnetic field are minimized. This is termed a DRESSED_QUTRIT, in contrast with a BARE_QUTRIT. However, compared to superconducting devices, gates are much slower. Thus, gate errors are the dominant error source for ion trap devices. We modelled a fundamental source of these errors: the spontaneous scattering of photons originating from the lasers used to drive the gates. The duration of single- and two-qubit gates used in this calculation was $t \approx 1$ µs and $t \approx 200$ µs respectively [61]. The single- and two-qudit gate error probabilities are given in Table 3.

8 RESULTS
Figure 9 plots the exact circuit depths for all three benchmarked circuits.
The qubit-based circuit constructions from past work are linear in depth and have a high linearity constant. Augmenting with a single borrowed ancilla reduces the circuit depth by a factor of 8. However, both circuit constructions are surpassed significantly by our qutrit construction, which scales logarithmically in $N$ and has a relatively small leading coefficient.

| Noise Model | $p_1$ | $p_2$ |
| --- | --- | --- |
| TI_QUBIT | $6.4 \times 10^{-4}$ | $1.3 \times 10^{-4}$ |
| BARE_QUTRIT | $2.2 \times 10^{-4}$ | $4.3 \times 10^{-4}$ |
| DRESSED_QUTRIT | $1.5 \times 10^{-4}$ | $3.1 \times 10^{-4}$ |

Table 3: Noise models simulated for trapped ion devices. The single- and two-qutrit gate error channel probabilities are based on calculations from experimental parameters. For all three models, we use single- and two-qudit gate times of $t \approx 1$ µs and $t \approx 200$ µs respectively.

[Figure 9 plot: Circuit Depth (logarithmic axis) versus Number of Qudits, with curves QUBIT ($633N$), QUBIT+ANCILLA ($76N$), and QUTRIT ($38 \log_2 N$).]
Figure 9: Exact circuit depths for all three benchmarked circuit constructions for the N-controlled Generalized Toffoli up to $N = 200$. Both QUBIT and QUBIT+ANCILLA scale linearly in depth and both are bested by QUTRIT’s logarithmic depth.

Figure 10 plots the total number of two-qudit gates for all three circuit constructions. As noted in Section 4, our circuit construction is not asymptotically better in total gate count, since all three plots have linear scaling. However, as emphasized by the logarithmic vertical axis, the linearity constant for our qutrit circuit is 70x smaller than for the equivalent ancilla-free qubit circuit and 8x smaller than for the borrowed-ancilla qubit circuit.

Our simulations under realistic noise models were run in parallel on over 100 n1-standard-4 Google Cloud instances. These simulations represent over 20,000 CPU hours, which was sufficient to estimate mean fidelity to an error of $2\sigma < 0.1\%$ for each circuit-noise model pair.

The full results of our circuit simulations are shown in Figure 11. All simulations are for the 14-input (13 controls, 1 target) Generalized Toffoli gate.
We simulated each of the three circuit benchmarks against each of our noise models (when applicable), yielding the 16 bars in the figure.

[Figure 10 plot: Two-Qudit Gate Count (logarithmic axis) versus Number of Qudits, with curves QUBIT ($397N$), QUBIT+ANCILLA ($48N$), and QUTRIT ($6N$).]
Figure 10: Exact two-qudit gate counts for the three benchmarked circuit constructions for the N-controlled Generalized Toffoli. All three plots scale linearly; however the QUTRIT construction has a substantially lower linearity constant.

9 DISCUSSION
Figure 11 demonstrates that our QUTRIT construction (orange bars) significantly outperforms the ancilla-free QUBIT benchmark (blue bars) in fidelity (success probability) by more than 10,000x.

For the SC, SC+T1, and SC+GATES noise models, our qutrit constructions achieve between 57-83% mean fidelity, whereas the ancilla-free qubit constructions all have almost 0% fidelity. Only the lowest-error model, SC+T1+GATES, achieves a modest fidelity of 26% for the QUBIT circuit, but in this regime, the qutrit circuit is close to 100% fidelity.

The trapped ion noise models achieve similar results: the DRESSED_QUTRIT and the BARE_QUTRIT achieve approximately 95% fidelity via the QUTRIT circuit, whereas the TI_QUBIT noise model has only 45% fidelity. Between the dressed and bare qutrits, the dressed qutrit exhibits higher fidelity than the bare qutrit, as expected. Moreover, as discussed in Appendix A.3, the dressed qutrit is resilient to leakage errors, so the simulation results should be viewed as a lower bound on its advantage over the qubit and bare qutrit.

Based on these results, trapped ion qutrits are a particularly strong match to our qutrit circuits. In addition to attaining the highest fidelities, trapped ions generally have all-to-all connectivity [60] within each ion chain, which is critical as our circuit construction requires operations between distant qutrits.

The superconducting noise models also achieve good fidelities.
They exhibit a particularly large advantage over ancilla-free qubit constructions because idle errors are significant for superconducting systems, and our qutrit construction significantly reduces idling (circuit depth). However, most superconducting quantum systems only feature nearest-neighbor or short-range connectivity. Accounting for data movement on a nearest-neighbor-connectivity 2D architecture would expand the qutrit circuit depth from $\log N$ to $\sqrt{N}$ (since the distance between any two qutrits would scale as $\sqrt{N}$). However, recent work has experimentally demonstrated fully-connected superconducting quantum systems via random access memory [62]. Such systems would also be well matched to our circuit construction.

[Figure 11 bar charts. Fidelity for Superconducting Models (SC, SC+T1, SC+GATES, SC+T1+GATES): QUBIT 0.01%, 0.56%, 0.01%, 26.1%; QUBIT+ANCILLA 18.5%, 52.3%, 30.2%, 84.1%; QUTRIT 56.8%, 65.9%, 83.1%, 94.7%. Fidelity for Trapped Ion Models: TI_QUBIT 44.7% (QUBIT) and 89.9% (QUBIT+ANCILLA); BARE_QUTRIT 94.9%; DRESSED_QUTRIT 96.1%.]
Figure 11: Circuit simulation results for all possible pairs of circuit constructions and noise models. Each bar represents 1000+ trials, so the error bars are all $2\sigma < 0.1\%$. Our QUTRIT construction significantly outperforms the QUBIT construction. The QUBIT+ANCILLA bars are drawn with dashed lines to emphasize that it has access to an extra ancilla bit, unlike our construction.

For completeness, Figure 11 also shows fidelities for the QUBIT+ANCILLA circuit benchmark, which augments the ancilla-free QUBIT circuit with a single dirty ancilla. Since QUBIT+ANCILLA has linearity constants roughly 8x better than the ancilla-free qubit circuit (Figures 9 and 10), it exhibits significantly better fidelities. While our QUTRIT circuit still outperforms the QUBIT+ANCILLA circuit, we expect a crossing point where augmenting a qubit-only Generalized Toffoli with enough ancilla would eventually outperform QUTRIT.
However, we emphasize that the gap between an ancilla-free and constant-ancilla construction for the Generalized Toffoli is actually a fundamental rather than an incremental gap, because:
• Constant-ancilla constructions prevent circuit parallelization. For example, consider the parallel execution of $N/k$ disjoint Generalized Toffoli gates, each of width $k$ for some constant $k$. An ancilla-free Generalized Toffoli would pose no issues, but an ancilla-augmented Generalized Toffoli would require $\Theta(N/k)$ ancilla. Thus, constant-ancilla constructions can impose a choice between serializing to linear depth or regressing to linear ancilla count. The Incrementer circuit in Figure 7 is a concrete example of this scenario: any multiply-controlled gate decomposition requiring a single clean ancilla or more than 1 dirty ancilla would defeat the parallelism and lengthen the runtime.
• Even if we only consider serial circuits, given the exponential advantage of certain quantum algorithms, there is a significant practical difference between operating at the ancilla-free frontier and operating just a few data qubits below the frontier.

While we only performed simulations up to 14 inputs in width, we would expect an even bigger advantage in larger circuits because our construction has asymptotically lower depth and therefore asymptotically lower idle errors. We also expect to see an advantage for the circuits in Section 5 that rely on the Generalized Toffoli, although we did not explicitly simulate these circuits.

Our circuit construction and simulation results point towards promising directions of future work that we highlight below:
• A number of useful quantum circuits, especially arithmetic circuits, make extensive use of multiply-controlled gates. However, these circuits are typically pre-compiled into single- and two-qubit gates using one of the decompositions from prior work, usually one that involves ancilla qubits.
Revisiting these arithmetic circuits from first principles, with our qutrit circuit as a new tool, could yield novel and improved circuits like our Incrementer circuit in Section 5.3.
• Relatedly, we see value in a logic synthesis tool that injects qutrit optimizations into qubit circuits, automated in a fashion inspired by classical reversible logic synthesis tools [63, 64].
• While $d = 3$ qutrits were sufficient to achieve the desired asymptotic speedups for our circuits of interest, there may be other circuits that are optimized by qudit information carriers for larger $d$. In particular, we note that increasing $d$ and thereby increasing information compression may be advantageous for hardware with limited connectivity.

Independent of these future directions, the results presented in this work are applicable to quantum computing in the near term, on machines that are expected within the next five years. The net result of this work is to extend the frontier of what is computable by quantum hardware, and hence to accelerate the timeline for practical quantum computing, rather than waiting for better hardware. Emphatically, our results are driven by the use of qutrits for asymptotically faster ancilla-free circuits. Moreover, we also improve linearity constants by two orders of magnitude. Finally, as verified by our open-source circuit simulator coupled with realistic noise models, our circuits are more reliable than qubit-only equivalents. Our results justify the use of qutrits as a path towards scaling quantum computers.

10 ACKNOWLEDGEMENTS
We would like to thank Michel Devoret and Steven Girvin for suggesting that we investigate qutrits. We also acknowledge David Schuster for helpful discussion on superconducting qutrits.

This work is funded in part by EPiQC, an NSF Expedition in Computing, under grants CCF-1730449/1832377, and in part by STAQ, under grant NSF Phy-1818914.

A DETAILED NOISE MODEL
We chose noise models that represent realistic near-term machines.
We rst present a generic, parametrized noise model in that is roughly applicable to all quantum systems. Next, we present specic parameters, under the generic noise model, that apply to near-term superconducting quantum computers. Finally, we present a specic noise model for 171Yb+trapped ions. A.1 Generic Noise Model The general form of a quantum noise model is expressed by the Kraus Operator formalism which species a set of matrices, {Ki} , each capturing an error channel. Under this formalism, the evo- lution of a system with initial state σ=|Ψ⟩ ⟨Ψ| is expressed as a function E(σ), where: E(σ)=E|Ψ⟩ ⟨Ψ|=Õ i KiσK i(1) where denotes the matrix conjugate-transpose. A.1.1 Gate Errors. For a single qubit, there are four possible error channels: no-error, bit ip, phase ip, and phase+bit ip. These channels can be expressed as products of the Pauli matrices: X= 0 1 1 0!and Z= 1 0 01! which correspond to bit and phase ips respectively. The no-error channel is X0Z0=I and the phase+bit ip channel is the product X1Z1. In the Kraus operator formalism, we express this single-qubit gate error model as E(σ)= 1 Õ j=0 1 Õ k=0 pjk (XjZk)σ(XjZk)(2) where pjk denotes the probability of the corresponding Kraus op- erator. This gate error model is called the Pauli or depolarizing channel. We assume all error terms have equal probabilities, i.e. pjk =p1 for j,k, 0. This assumption of symmetric depolarizing is standard and is used by most noise simulators [ 43 ]. Under this model, the error channel simplies to: E(σ)=(13p1)σ+Õ jk {0,1}2\00 p1(XjZk)σ(XjZk)(3) For two-qubit gate errors, the Kraus operators are the Cartesian product of the two single-qubit gate error Kraus operators, leading to the noise channel: E(σ)=(115p2)σ+Õ jk lm {0,1}4\0000 p2Kjk l mσK jk l m (4) where p2 is the probability of each error term and Kjk l m =XjZk XlZm. Next, for qutrits, we have a similar form, except that there are now more possible error channels. 
We now use the generalized Pauli matrices:

$$X_{+1} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \quad \text{and} \quad Z_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^{2\pi i/3} & 0 \\ 0 & 0 & e^{4\pi i/3} \end{pmatrix}$$

The Cartesian product of $\{I, X_{+1}, X_{+1}^2\}$ and $\{I, Z_3, Z_3^2\}$ constitutes a basis for all 3x3 matrices. Hence, this Cartesian product also constitutes the Kraus operators for the single-qutrit gate error [42, 65, 66]:

$$\mathcal{E}(\sigma) = (1 - 8p_1)\sigma + \sum_{jk \in \{0,1,2\}^2 \setminus 00} p_1 (X_{+1}^j Z_3^k) \sigma (X_{+1}^j Z_3^k)^\dagger \tag{5}$$

and similarly, the two-qutrit gate error channel is:

$$\mathcal{E}(\sigma) = (1 - 80p_2)\sigma + \sum_{jklm \in \{0,1,2\}^4 \setminus 0000} p_2 K_{jklm} \sigma K_{jklm}^\dagger \tag{6}$$

Note that in this model, the dominant effect of using qutrits instead of qubits is that the no-error probability for two-operand gates diminishes from $1 - 15p_2$ to $1 - 80p_2$, as expressed by equations 4 and 6 respectively.

A.1.2 Idle Errors. For qubits, the Kraus operators for amplitude damping are:

$$K_0 = \begin{pmatrix} 1 & 0 \\ 0 & \sqrt{1 - \lambda_1} \end{pmatrix} \quad \text{and} \quad K_1 = \begin{pmatrix} 0 & \sqrt{\lambda_1} \\ 0 & 0 \end{pmatrix} \tag{7}$$

For qutrits, the Kraus operators for amplitude damping can be modeled as [66, 67]:

$$K_0 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \sqrt{1 - \lambda_1} & 0 \\ 0 & 0 & \sqrt{1 - \lambda_2} \end{pmatrix}, \quad K_1 = \begin{pmatrix} 0 & \sqrt{\lambda_1} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \text{and} \quad K_2 = \begin{pmatrix} 0 & 0 & \sqrt{\lambda_2} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \tag{8}$$

As discussed in Section 6.1, these noise channels are incoherent (non-unitary), which means that the probability of each error occurring depends on the current state. Specifically, the probability of the $K_i$ channel affecting the state $|\Psi\rangle$ is $\| K_i |\Psi\rangle \|^2$ [13].

A.2 Superconducting QC
We picked four noise models based on superconducting quantum computers that are expected in the next few years. These noise models comply with the generic noise model above and are thus parametrized by $p_1$, $p_2$, $\lambda_1$, and $\lambda_2$. The $\lambda_m$ terms are given by [67]:

$$\lambda_m = 1 - e^{-m t / T_1} \tag{9}$$

where $t$ is the duration of the idling and $T_1$ is associated with the lifetime of each qubit.

A.3 Trapped Ion $^{171}$Yb$^{+}$ QC
Based on calculations from experimental parameters for the trapped ion qutrit, we know the specific Kraus operator types for the error terms, which deviate slightly from those in the generic error model. The specific Kraus operator matrices are provided at our GitHub repository [33].
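For comparison with the trapped-ion operators, the generic qutrit amplitude damping operators of Equation 8 (Appendix A.1.2) are simple enough to write out directly. The sketch below builds them and performs one state-dependent idle-error draw on a single qutrit, in the spirit of the idle-error step of Algorithm 1. It is an illustration under stated assumptions rather than the repository code [33]: the function names are hypothetical, and the state is restricted to one qutrit so that the operators can be applied as plain 3x3 matrices.

```python
import numpy as np


def qutrit_damping_kraus(lam1, lam2):
    """Kraus operators K0, K1, K2 of Equation 8 for damping probabilities lam1, lam2."""
    k0 = np.diag([1.0, np.sqrt(1.0 - lam1), np.sqrt(1.0 - lam2)]).astype(complex)
    k1 = np.zeros((3, 3), dtype=complex)
    k1[0, 1] = np.sqrt(lam1)   # |1> decays to |0>
    k2 = np.zeros((3, 3), dtype=complex)
    k2[0, 2] = np.sqrt(lam2)   # |2> decays to |0>
    return [k0, k1, k2]


def apply_idle_error(state, kraus_ops, rng):
    """Pick one Kraus operator with probability ||K_i |psi>||^2, apply it, renormalize."""
    candidates = [k @ state for k in kraus_ops]
    probs = np.array([np.vdot(c, c).real for c in candidates])
    probs = probs / probs.sum()            # guard against floating-point drift
    idx = rng.choice(len(kraus_ops), p=probs)
    chosen = candidates[idx]
    return chosen / np.linalg.norm(chosen), int(idx)


rng = np.random.default_rng(seed=0)
ops = qutrit_damping_kraus(lam1=1e-4, lam2=2e-4)
plus = np.array([1, 1, 1], dtype=complex) / np.sqrt(3)  # equal superposition
new_state, which = apply_idle_error(plus, ops, rng)
print(which, np.round(new_state, 4))
```

Since $K_0^\dagger K_0 + K_1^\dagger K_1 + K_2^\dagger K_2 = I$, the three probabilities sum to one for any normalized state, and a qutrit in $|0\rangle$ is never selected for a damping event.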
We chose three noise models: TI_QUBIT, BARE_QUTRIT, and DRESSED_QUTRIT. Both TI_QUBIT and DRESSED_QUTRIT take advantage of clock states and thus have very small idle errors. They both would be ideal candidates for a qudit. The BARE_QUTRIT will suffer more from idle errors as it is not strictly defined on clock states, but will require fewer experimental resources to prepare.

Idle errors are very small in magnitude and manifest as coherent phase errors rather than amplitude damping errors as modeled in Section 7.1.2.

We also do not consider leakage errors. These errors could be handled for Yb$^{+}$ by treating each ion as a $d = 4$ qudit, regardless of whether we use it as a qubit or a qutrit.

REFERENCES
[1] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, “Quantum machine learning,” Nature, vol. 549, p. 195, Sep 2016.
[2] I. Kassal, J. D. Whitfield, A. Perdomo-Ortiz, M.-H. Yung, and A. Aspuru-Guzik, “Simulating chemistry using quantum computers,” Annual review of physical chemistry, vol. 62, pp. 185–207, 2011.
[3] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer,” SIAM Journal on Computing, vol. 26, pp. 1484–1509, Oct. 1997.
[4] L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Annual ACM Symposium on Theory of Computing, pp. 212–219, ACM, 1996.
[5] J. Preskill, “Quantum Computing in the NISQ era and beyond,” Quantum, vol. 2, p. 79, Aug. 2018.
[6] Y. Ding, A. Holmes, A. Javadi-Abhari, D. Franklin, M. Martonosi, and F. Chong, “Magic-state functional units: Mapping and scheduling multi-level distillation circuits for fault-tolerant quantum architectures,” in 2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 828–840, IEEE, 2018.
[7] A. Javadi-Abhari, P. Gokhale, A. Holmes, D. Franklin, K. R. Brown, M. Martonosi, and F. T.
Chong, “Optimized surface code communication in superconducting quantum computers,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-50 ’17, (New York, NY, USA), pp. 692–705, ACM, 2017.
[8] G. G. Guerreschi and J. Park, “Two-step approach to scheduling quantum circuits,” Quantum Science and Technology, vol. 3, p. 045003, Jul 2018.
[9] A. Pavlidis and E. Floratos, “Arithmetic circuits for multilevel qudits based on quantum fourier transform,” arXiv preprint arXiv:1707.08834, 2017.
[10] J. Randall, S. Weidt, E. D. Standing, K. Lake, S. C. Webster, D. F. Murgia, T. Navickas, K. Roth, and W. K. Hensinger, “Efficient preparation and detection of microwave dressed-state qubits and qutrits with trapped ions,” Phys. Rev. A, vol. 91, p. 012322, Jan 2015.
[11] J. Randall, A. M. Lawrence, S. C. Webster, S. Weidt, N. V. Vitanov, and W. K. Hensinger, “Generation of high-fidelity quantum control methods for multilevel systems,” Phys. Rev. A, vol. 98, p. 043414, Oct 2018.
[12] “Cirq: A python framework for creating, editing, and invoking noisy intermediate scale quantum (NISQ) circuits.” https://github.com/quantumlib/Cirq, 2018.
[13] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition. New York, NY, USA: Cambridge University Press, 10th ed., 2011.
[14] J.-L. Brylinski and R. Brylinski, “Universal quantum gates,” in Mathematics of quantum computation, pp. 117–134, Chapman and Hall/CRC, 2002.
[15] Y.-M. Di and H.-R. Wei, “Elementary gates for ternary quantum logic circuit,” arXiv preprint arXiv:1105.5485, 2011.
[16] A. Muthukrishnan and C. R. Stroud, “Multivalued logic gates for quantum computation,” Phys. Rev. A, vol. 62, p. 052309, Oct 2000.
[17] A. B. Klimov, R. Guzmán, J. C. Retamal, and C. Saavedra, “Qutrit quantum computer with trapped ions,” Phys. Rev. A, vol. 67, p. 062313, Jun 2003.
[18] M. Kues, C. Reimer, P. Roztocki, L. R. Cortés, S. Sciara, B. Wetzel, Y. Zhang, A. Cino, S. T. Chu, B. E.
Little, D. J. Moss, L. Caspani, J. Azaña, and R. Morandotti, “On-chip generation of high-dimensional entangled quantum states and their coherent control,” Nature, vol. 546, p. 622, Jun 2017.
[19] T. Bækkegaard, L. B. Kristensen, N. J. S. Loft, C. K. Andersen, D. Petrosyan, and N. T. Zinner, “Superconducting qutrit-qubit circuit: A toolbox for efficient quantum gates,” arXiv preprint arXiv:1802.04299, 2018.
[20] A. Fedorov, L. Steffen, M. Baur, M. P. da Silva, and A. Wallraff, “Implementation of a toffoli gate with superconducting circuits,” Nature, vol. 481, p. 170, Dec 2011.
[21] A. D. Greentree, S. G. Schirmer, F. Green, L. C. L. Hollenberg, A. R. Hamilton, and R. G. Clark, “Maximizing the hilbert space for a finite number of distinguishable quantum states,” Phys. Rev. Lett., vol. 92, p. 097901, Mar 2004.
[22] M. H. A. Khan and M. A. Perkowski, “Quantum ternary parallel adder/subtractor with partially-look-ahead carry,” J. Syst. Archit., vol. 53, pp. 453–464, July 2007.
[23] Y. Fan, “Applications of multi-valued quantum algorithms,” arXiv preprint arXiv:0809.0932, 2008.
[24] H. Y. Li, C. W. Wu, W. T. Liu, P. X. Chen, and C. Z. Li, “Fast quantum search algorithm for databases of arbitrary size and its implementation in a cavity QED system,” Physics Letters A, vol. 375, pp. 4249–4254, Nov. 2011.
[25] Y. Wang and M. Perkowski, “Improved complexity of quantum oracles for ternary grover algorithm for graph coloring,” in 2011 41st IEEE International Symposium on Multiple-Valued Logic, pp. 294–301, May 2011.
[26] S. S. Ivanov, H. S. Tonchev, and N. V. Vitanov, “Time-efficient implementation of quantum search with qudits,” Phys. Rev. A, vol. 85, p. 062321, Jun 2012.
[27] A. Bocharov, M. Roetteler, and K. M. Svore, “Factoring with qutrits: Shor’s algorithm on ternary and metaplectic quantum architectures,” Phys. Rev. A, vol. 96, p. 012306, Jul 2017.
[28] C. Gidney, “Constructing large controlled nots,” 2015.
[29] Y. He, M.-X. Luo, E. Zhang, H.-K. Wang, and X.-F.
Wang, “Decompositions of n-qubit Toffoli Gates with Linear Circuit Complexity,” International Journal of Theoretical Physics, vol. 56, pp. 2350–2361, July 2017.
[30] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, “Elementary gates for quantum computation,” Phys. Rev. A, vol. 52, pp. 3457–3467, Nov 1995.
[31] B. P. Lanyon, M. Barbieri, M. P. Almeida, T. Jennewein, T. C. Ralph, K. J. Resch, G. J. Pryde, J. L. O’Brien, A. Gilchrist, and A. G. White, “Simplifying quantum logic using higher-dimensional hilbert spaces,” Nature Physics, vol. 5, p. 134, Dec 2008.
[32] T. C. Ralph, K. J. Resch, and A. Gilchrist, “Efficient toffoli gates using qudits,” Phys. Rev. A, vol. 75, p. 022313, Feb 2007.
[33] “Code for asymptotic improvements to quantum circuits via qutrits.” https://github.com/epiqc/qutrits, 2019.
[34] F. Tacchino, C. Macchiavello, D. Gerace, and D. Bajoni, “An artificial neuron implemented on an actual quantum processor,” npj Quantum Information, vol. 5, no. 1, p. 26, 2019.
[35] F. Tacchino. Personal Communication.
[36] T. G. Draper, “Addition on a quantum computer,” arXiv preprint quant-ph/0008033, 2000.
[37] C. Gidney, “Factoring with n+2 clean qubits and n-1 dirty qubits,” arXiv preprint arXiv:1706.07884, 2017.
[38] T. Häner, M. Roetteler, and K. M. Svore, “Factoring using 2n + 2 qubits with toffoli based modular multiplication,” Quantum Info. Comput., vol. 17, pp. 673–684, June 2017.
[39] D. G. Cory, M. D. Price, W. Maas, E. Knill, R. Laflamme, W. H. Zurek, T. F. Havel, and S. S. Somaroo, “Experimental quantum error correction,” Phys. Rev. Lett., vol. 81, pp. 2152–2155, Sep 1998.
[40] E. Dennis, “Toward fault-tolerant quantum computation without concatenation,” Phys. Rev. A, vol. 63, p. 052314, Apr 2001.
[41] M. Otten and S. K. Gray, “Accounting for errors in quantum algorithms via individual error reduction,” npj Quantum Information, vol. 5, no. 1, p. 11, 2019.
[42] D. Miller, T. Holz, H.
Kampermann, and D. Bruß, “Propagation of generalized pauli errors in qudit cliord circuits,Physical Review A, vol. 98, no. 5, p. 052316, 2018. [43] N. Khammassi, I. Ashraf, X. Fu, C. G. Almudever, and K. Bertels, “Qx: A high- performance quantum computer simulation platform,” in Proceedings of the Con- ference on Design, Automation & Test in Europe, DATE ’17, (3001 Leuven, Belgium, Belgium), pp. 464–469, European Design and Automation Association, 2017. [44] “Quantum devices and simulators.” https://www.research.ibm.com/ibm-q/ technology/devices/, 2018. [45] D. Venturelli, M. Do, E. Rieel, and J. Frank, “Compiling quantum circuits to realistic hardware architectures using temporal planners,Quantum Science and Technology, vol. 3, p. 025004, feb 2018. [46] K. E. Booth, M. Do, J. C. Beck, E. Rieel, D. Venturelli, and J. Frank, “Comparing and integrating constraint programming and temporal planning for quantum circuit compilation,” in Twenty-Eighth International Conference on Automated Planning and Scheduling, 2018. [47] J. Biamonte and V. Bergholm, “Tensor networks in a nutshell,arXiv preprint arXiv:1708.00006, 2017. [48] T. A. Brun, “A simple model of quantum trajectories,” American Journal of Physics, vol. 70, no. 7, pp. 719–737, 2002. [49] R. Schack and T. A. Brun, “A C++ library using quantum trajectories to solve quantum master equations,Computer Physics Communications, vol. 102, no. 1-3, pp. 210–228, 1997. 12 [50] R. S. Smith, M. J. Curtis, and W. J. Zeng, “A practical quantum instruction set architecture,arXiv preprint arXiv:1608.03355, 2016. [51] J. Johansson, P. Nation, and F. Nori, “Qutip: An open-source python framework for the dynamics of open quantum systems,Computer Physics Communications, vol. 183, pp. 1760–1772, 8 2012. [52] J. Johansson, P. Nation, and F. Nori, “Qutip 2: A python framework for the dynamics of open quantum systems,Computer Physics Communications, vol. 184, pp. 1234–1240, 4 2013. [53] A. Y. Chernyavskiy, V. V. Voevodin, and V. V. 
Voevodin, “Parallel computational structure of noisy quantum circuits simulation,Lobachevskii Journal of Mathe- matics, vol. 39, pp. 494–502, May 2018. [54] N. M. Linke, D. Maslov, M. Roetteler, S. Debnath, C. Figgatt, K. A. Landsman, K. Wright, and C. Monroe, “Experimental comparison of two quantum computing architectures,Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3305–3310, 2017. [55] R. Barends, J. Kelly, A. Megrant, A. Veitia, D. Sank, E. Jerey, T. C. White,J. Mutus, A. G. Fowler, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, C. Neill, P. O’Malley, P. Roushan, A. Vainsencher, J. Wenner, A. N. Korotkov, A. N. Cle- land, and J. M. Martinis, “Superconducting quantum circuits at the surface code threshold for fault tolerance,Nature, vol. 508, pp. 500 EP –, 04 2014. [56] E. Barnes, C. Arenz, A. Pitchford, and S. E. Economou, “Fast microwave-driven three-qubit gates for cavity-coupled superconducting qubits,Phys. Rev. B, vol. 96, p. 024504, Jul 2017. [57] M. Reagor, W. Pfa, C. Axline, R. W. Heeres, N. Ofek, K. Sliwa, E. Holland, C. Wang, J. Blumo, K. Chou, M. J. Hatridge, L. Frunzio, M. H. Devoret, L. Jiang, and R. J. Schoelkopf, “Quantum memory with millisecond coherence in circuit qed,” Phys. Rev. B, vol. 94, p. 014506, Jul 2016. [58] N. Earnest, S. Chakram, Y. Lu, N. Irons, R. K. Naik, N. Leung, L. Ocola, D. A. Czaplewski, B. Baker, J. Lawrence, J. Koch, and D. I. Schuster, “Realization of a Λ system with metastable states of a capacitively shunted uxonium,Phys. Rev. Lett., vol. 120, p. 150504, Apr 2018. [59] S. M. Girvin, “Circuit qed: superconducting qubits coupled to microwave photons, Quantum Machines: Measurement and Control of Engineered Quantum Systems, p. 113, 2011. [60] K. R. Brown, J. Kim, and C. Monroe, “Co-designing a scalable quantum computer with trapped atomic ions,npj Quantum Information, vol. 2, p. 16034, 2016. [61] N. C. Brown and K. R. 
Brown, “Comparing zeeman qubits to hyperne qubits in the context of the surface code: 174Yb+ and 171Yb+ ,Phys. Rev. A, vol. 97, p. 052301, May 2018. [62] R. K. Naik, N. Leung, S. Chakram, P. Groszkowski, Y. Lu, N. Earnest, D. C. McKay, J. Koch, and D. I. Schuster, “Random access quantum information processors using multimode circuit quantum electrodynamics,Nature Communications, vol. 8, no. 1, p. 1904, 2017. [63] M. Soeken, S. Frehse, R. Wille, and R. Drechsler, “Revkit: An open source toolkit for the design of reversible circuits,” in Reversible Computation (A. De Vos and R. Wille, eds.), pp. 64–76, Springer Berlin Heidelberg, 2012. [64] D. M. Miller, D. Maslov, and G. W. Dueck, “A transformation based algorithm for reversible logic synthesis,” in Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451), pp. 318–323, June 2003. [65] V. Karimipour, A. Mani, and L. Memarzadeh, “Characterization of qutrit channels in terms of their covariance and symmetry properties,Phys. Rev. A, vol. 84, p. 012321, Jul 2011. [66] M. Grassl, L. Kong, Z. Wei, Z.-Q. Yin, and B. Zeng, “Quantum error-correcting codes for qudit amplitude damping,IEEE Transactions on Information Theory, vol. 64, no. 6, pp. 4674–4685, 2018. [67] J. Ghosh, A. G. Fowler, J. M. Martinis, and M. R. Geller, “Understanding the eects of leakage in superconducting quantum-error-detection circuits,Phys. Rev. A, vol. 88, p. 062329, Dec 2013. 13 ResearchGate has not been able to resolve any citations for this publication. Article Full-text available Artificial neural networks are the heart of machine learning algorithms and artificial intelligence. Historically, the simplest implementation of an artificial neuron traces back to the classical Rosenblatt’s “perceptron”, but its long term practical applications may be hindered by the fast scaling up of computational complexity, especially relevant for the training of multilayered perceptron networks. 