Auxiliary Qubit Selection:
A Physical Synthesis Technique for Quantum Circuits
Naser Mohammadzadeh, Morteza Saheb Zamani, Mehdi Sedighi
Department of Computer Engineering and Information Technology
Amirkabir University of Technology
The quantum circuit design flow consists of two main tasks: synthesis and physical design. Because these two processes
share no information, the optimization of quantum circuit objectives is limited. To address this limitation, we previously
introduced the concept of “physical synthesis” for the quantum circuit design flow and proposed a technique for it.
Following that concept, in this paper we propose a new physical synthesis technique that uses auxiliary qubit selection to
improve the latency of quantum circuits. Moreover, we show that the auxiliary qubit selection technique can be
seamlessly integrated into the previously introduced physical synthesis flow. Our experimental results show that the
proposed technique decreases the average latency of quantum circuits by about 11% for the attempted benchmarks.
Keywords: Quantum Computing, Physical Design, Physical Synthesis, Auxiliary Qubit Selection
1. Introduction
For the last few decades, silicon chips have become twice as fast every two years while the dimensions of the structures
on them have halved. At the atomic scale, matter obeys the laws of quantum mechanics. Therefore, if
scaling continues at this rate, the behavior of computer circuits will have to be studied based on quantum mechanical
principles rather than classical physics. Although these quantum effects are great barriers to progress in classical CMOS,
they can be used to develop a radically different form of computation. Theoretically, quantum computers, i.e. computers
that exploit quantum effects, could outperform their classical counterparts when solving certain problems. Factorization,
unsorted database search, and the simulation of quantum mechanical systems are some problems thought to be
intractable on a classical machine that can benefit from quantum algorithms. For example, in quantum cryptography, the
no-cloning property of quantum states and the phenomenon of entanglement have been utilized to help in the exchange of
secret keys between various parties, thus ensuring the security of cryptosystems using public key. MagiQ Technologies
and IdQuantique have built such cryptographic systems based on single-photon communication.
A quantum algorithm requires a quantum circuit for a successful implementation. At a high level, the quantum circuit
design flow can be divided into two main processes: synthesis and physical design (Figure 1). The synthesis process takes a
description and generates a technology-dependent netlist. On the other hand, the physical design process creates a specific
layout of the circuit based on the target technology. Even though it might appear that taking the layout information into
consideration during the synthesis process or the integration of two processes into one monolithic process can potentially lead
to a better final layout, the synthesis and physical design processes have been traditionally done separately to avoid
increasing the complexity of the process to an unmanageable level .
The CMOS design had a similar flow until the concept of physical synthesis, the interaction between synthesis and
physical design processes, was introduced in the mid to late 1990s . Physical synthesis deals with the local manipulation
of netlist or layout considering the layout information to improve the objectives or meet the design constraints. Such an
approach can also be useful in quantum circuit design. However, the physical synthesis techniques proposed in the classical
CMOS design are not directly applicable to the quantum circuit design because of the fundamental differences between
CMOS and quantum technologies. Therefore, special techniques for the physical synthesis of quantum circuits should be
developed.
Focusing on this issue, in  we introduced the physical synthesis concept for quantum circuit design flow and proposed
the gate exchanging technique for it. The general idea of the gate exchanging heuristic is to determine the proper order of two
exchangeable gates based on layout information. The flow proposed in  is a general one that can host multiple physical
synthesis techniques; any new physical synthesis technique can be easily embedded in it. Following this approach, in this
paper we propose a new technique, called auxiliary qubit selection, for the
physical synthesis of quantum circuits. The proposed technique takes an initial netlist and layout and manipulates the netlist
locally considering the layout information to reach an improved netlist with a lower latency while preserving the overall
circuit functionality. This paper also shows how the auxiliary qubit selection technique should be embedded in the flow.
Ion trap technology  is used as the underlying technology to study the proposed flow. Ion trap technology has been
physically realized using universal elements for quantum computation with a clear scalable model .
The rest of this paper is organized as follows: an overview of the prior work is presented in Section 2, followed by an
introduction to the ion trap technology in Section 3. In Section 4, some basic concepts are defined that are needed to describe
the approach. Section 5 includes the details of the proposed auxiliary qubit selection technique. The details of the physical
synthesis flow, modified to include the proposed technique, are discussed in Section 6. Section 7 shows the experimental
results, and Section 8 concludes the paper.
Figure 1. Quantum circuit design flow
2. Related Work
Despite significant work done on quantum algorithms and their underlying physics, only a few studies have explored
quantum circuit design flow. Svore et al. proposed a design flow that starts with a quantum program and generates
its corresponding physical operations. Their work outlined various file formats and provided initial implementations of some
of the necessary tools. Their design flow, which has four phases, converts a high-level program specified in the mathematical
abstractions of quantum mechanics and linear algebra into a low-level set of machine instructions scheduled on a fixed
H-tree-based layout.
Similarly, Balensiefer et al. proposed a design flow which takes a quantum description in QCL¹ and
synthesizes it to a technology-dependent netlist. In the physical design phase, the generated netlist is scheduled on a fixed
layout by a list-scheduling algorithm.
Whitney et al.  also suggested a quantum design flow that takes a description and generates its layout in ion trap
technology. They proposed new heuristics for the layout generation and scheduling. Their physical design stage includes
laying out and scheduling a fixed netlist.
Additionally, hand-optimized layouts have been proposed in the literature. Metodi et al. proposed a uniform Quantum
Logic Array architecture , and extended it later . Since the focus of their work was on the architectural perspective,
the details of physical layout or scheduling were not explored. The same group later developed a tool to automatically
generate a scheduling for physical operations, given a quantum circuit and a fixed grid-based layout structure .
All of the above-mentioned approaches perform the synthesis and physical design processes separately. The algorithms
proposed for physical design in these papers take a fixed netlist and generate a corresponding layout.
In our earlier work , we introduced the physical synthesis concept for a quantum design flow and proposed a
technique for it that exchanges the gates in the design after layout generation to improve the latency of quantum circuit
execution. Following the principles of the physical synthesis concept, in this paper, a new physical synthesis technique is
proposed and physical synthesis flow is modified to embed the proposed technique.
3. Technology Abstraction
In ion trap technology, a physical qubit is an ion, and a gate is a location where a trapped ion may be operated upon by a
modulated laser. Pulse sequences applied to discrete electrodes on the edges of the ion traps cause the ion to be trapped or
ballistically moved between traps. Figure 2a shows a layout that was experimentally demonstrated for a three-way
intersection.
1 QCL (Quantum Computation Language) defined by B. Omer  utilizes a syntax derived from C and provides a quantum simulator for
code development and testing on a classical computing platform.
Figure 2. a) Physical layout demonstrated for a T-junction (three-way intersection). b) Abstraction of the circuit in (a), built
using the StraightChannel and ThreeWayIntersection macroblocks shown in Figure 3. c) MEMS mirrors placed above the ion
traps plane guide the laser beams to gate locations .
In this paper, the library of macroblocks defined in  is used for two reasons. First, by using the macroblocks, some
of the low-level details can be removed and the analyses do not have to consider the variations in the ion trap technology
implementation. Details such as ion species, electrode sizing and geometry, and exact voltage levels necessary for trapping
and moving ions are all summarized within the macroblocks. Secondly, a carefully timed application of pulse sequences to
electrodes in non-adjacent traps is required for ballistic movements along a channel. Using basic blocks consisting of a few
ion traps has the benefit that building an interface between the basic blocks requires communication only between the two
Figure 3 shows the library defined in . In this library, each macroblock consists of a 3x3 structure of trap regions and
electrodes with some ports to allow qubit movement between the macroblocks. The black squares are gate locations. Gates
may not be performed at intersections or turns in ion trap technology. Different orientations of each of these macroblocks
can be used in a layout. Figure 2 shows a possible mapping of a demonstrated layout (Figure 2a) to macroblock abstractions
(Figure 2b). As Figure 2c shows, the laser pulses are guided to the gate locations by an array of MEMS mirrors located above
the ion trap plane in order to apply quantum gates .
Figure 3. Library of basic macroblocks used in this paper. Ports (P0-P3) and electrodes of each macroblock make it possible for
ions to be moved and trapped. Some macroblocks contain a trap region where gates may be performed (black squares) .
Some key characteristics of ion trap technology can be summarized as follows:
• Rectangular channels lined with electrodes make “wires” in ion traps. Atomic ions can be suspended above the channel
regions and moved ballistically. The synchronized application of voltages on the channel electrodes causes qubits
to move ballistically. Therefore, movement control circuitry is required for each wire to handle any qubit movement.
• Any operation available in the ion trap technology can be performed at each gate location. This makes it possible to
reuse gate locations within a quantum circuit.
• Fabrication and control of ion traps in the third dimension is difficult. Thus, scalable ion trap systems are
two-dimensional. Therefore, routing channels should have T-junction(s) or cross-junction(s) to allow ions to move from
one channel to another.
• Multiple ions may use any routing channel as long as control circuits prevent one channel from holding more than one
ion at any instant of time.
• Aside from the Manhattan distance between the source and target location for an ion movement, the geometry of the wire
channel is also important in the calculation of movement latency. Experiments have shown that a right-angle turn takes
substantially longer than a straight channel over the same distance.
4. Basic Concepts
In the quantum design flow, the output of a synthesis tool is a quantum netlist that is composed of quantum bits (qubits)
and quantum gates. The quantum gates can be one-input gates (e.g. Hadamard), two-input gates (e.g. controlled-V), or multi-
input gates (e.g. CnNOT for n>2) . The following terms are defined to describe the main idea of the paper.
Definition 1: A quantum gate is called a macro gate if it has more than two control lines. For example, a C3NOT that
has three control lines and one target line is called a macro gate (Figure 4).
Definition 2: A qubit is called an auxiliary qubit for a macro gate if it is not in the set of primary inputs of that macro
gate but it is used to decompose the macro gate into primitive gates. Auxiliary qubits have an important property; the values
of auxiliary qubits before and after a macro gate are equal. Figure 4 shows an example. The primary inputs of the macro gate
shown in the left side of the figure are Q0, Q1, Q2, and Q3. The qubit Q4 that is used to decompose the macro gate is an
auxiliary qubit. As can be verified, the value of Q4 at point 1 is equal to its value at point 2.
Definition 3: Latency is the total time that it takes for a circuit to be executed on a particular layout.
5. Physical Synthesis of Quantum Circuits
Integrating the synthesis and physical design processes into one monolithic process may not be practical because of the
unmanageable complexity of the problem . On the other hand, doing them separately and without any information
sharing between the two processes can limit the optimization effectiveness. An intermediate solution is conceivable that
changes layout and/or netlist locally considering the layout information to improve the metrics or meet the design constraints.
In the prior flows, synthesis algorithms generate the netlist at a stage where no layout information is available.
Then a physical design process takes this fixed netlist and generates a layout without changing the input netlist. In our
previous work we have shown that adjusting the netlist after synthesis process by some local netlist and/or layout
manipulations considering the physical layout information could improve the circuit metrics . This idea is known as
physical synthesis in classical CMOS design . Gate sizing, buffer insertion, and wire sizing are some techniques proposed
for the physical synthesis in classical CMOS design . These techniques are not applicable to quantum circuits, but the
general idea can be used to improve characteristics of quantum circuits. To do this, we introduced the physical synthesis
concept in  for quantum circuits and proposed a technique for it. Going along with this idea, in this section a new
physical synthesis technique for quantum circuits is introduced that uses layout information to select proper auxiliary qubits
for macro gate decomposition to reduce the quantum circuit execution time. In other words, in the proposed technique,
layout information is used to properly select auxiliary qubits. It is important to note that the initial and the modified netlists
both have the same functionality and synthesis cost in terms of the number of gates or circuit depth. Therefore, the
existing synthesis algorithms cannot prefer one over the other.
5.1 Auxiliary Qubit Selection Technique
As shown in Figure 1, the synthesis process can be divided into two subprocesses. The first subprocess synthesizes the
initial description to a technology-independent netlist and the second one takes this netlist and generates a technology-
dependent netlist. The technology-independent netlist often contains macro gates that are converted into primitive gates
during technology mapping. Various methods have been proposed for decomposition of a macro gate into primitive gates.
Some of these methods use auxiliary qubits in decomposing macro gates. However, since no layout information is available
in the technology-mapping stage, it is not possible at that stage to find the qubit(s) that, when used as the auxiliary
qubit(s), give the best latency. The goal of the auxiliary qubit selection technique is to use layout information to find
the best auxiliary qubit(s) and minimize the latency.
Figure 4. A macro gate and its decomposition with one auxiliary qubit by Lemma 7.2 proposed in  (figure labels: Point 1, Point 2)
To illustrate the proposed technique, Figure 5a shows a QASM  instruction sequence operating on qubits Q1,…,Q6.
The netlist includes one macro gate (C3NOT). Lemma 7.2 proposed in  is used to decompose the macro gate. Figure 5b
shows the equivalent quantum circuit. If Q5 is used as auxiliary qubit to decompose the macro gate, the result will be the
netlist in Figure 5c. The gates G1 and G20 respectively are the highest and the lowest gates generated from decomposition of
the macro gate. Figure 5d shows the layout generated for the netlist by the dataflow-based algorithm described in Section
6.1.1. Each gate location is labeled by the gate number that is to be operated in it. The dataflow graph of the circuit is shown
in Figure 6a. The label of each edge shows the minimum delay between two nodes whereas the label of each node represents
its delay to the end of the tree and is used as the node’s priority. Physical latencies shown in Table 1 are used for the gates
and for the two types of move operations in ion trap technology. The latency of the circuit is therefore 763 µs.
Table 1. The latency values for various physical operations in ion trap technology 
Physical Operation Latency (μs)
(a) Circuit netlist including a macro gate:
P1: H Q1
P2: H Q2
P3: H Q3
P4: H Q4
P5: H Q5
P6: H Q6
P7: T4 Q4,Q3,Q2,Q1
P8: CX Q6,Q2
P9: CX Q5,Q3
(c) Circuit netlist after macro gate decomposition using Q5 as the auxiliary qubit:
P1: H Q1
P2: H Q2
P3: H Q3
P4: H Q4
P5: H Q5
P6: H Q6
G1: CV Q5,Q1
G2: CX Q2,Q5
G3: CVN Q5,Q1
G4: CX Q2,Q5
G5: CV Q2,Q1
G6: CV Q3,Q5
G7: CX Q4,Q3
G8: CVN Q3,Q5
G9: CX Q4,Q3
G10: CV Q4,Q5
G11: CV Q5,Q1
G12: CX Q2,Q5
G13: CVN Q5,Q1
G14: CX Q2,Q5
G15: CV Q2,Q1
G16: CV Q3,Q5
G17: CX Q4,Q3
G18: CVN Q3,Q5
G19: CX Q4,Q3
G20: CV Q4,Q5
P8: CX Q6,Q2
P9: CX Q5,Q3
Figure 5. (a) Circuit netlist including a macro gate, (b) the circuit representation, (c) the circuit netlist after macro gate
decomposition using Q5 as the auxiliary qubit, (d) the layout generated by the dataflow-based algorithm
On the other hand, if the circuit is modified to use Q6 as the auxiliary qubit, the netlist and the dataflow graph are
changed as shown in Figure 6b and 6c, respectively. In this example, the latency of the modified circuit is 757 µs.
Consequently, a proper auxiliary qubit selection by the physical synthesis technique can improve the latency of the circuit by
6 µs (from 763 µs to 757 µs).
(b) Modified netlist using Q6 as the auxiliary qubit:
P1: H Q1
P2: H Q2
P3: H Q3
P4: H Q4
P5: H Q5
P6: H Q6
G1: CV Q6,Q1
G2: CX Q2,Q6
G3: CVN Q6,Q1
G4: CX Q2,Q6
G5: CV Q2,Q1
G6: CV Q3,Q6
G7: CX Q4,Q3
G8: CVN Q3,Q6
G9: CX Q4,Q3
G10: CV Q4,Q6
G11: CV Q6,Q1
G12: CX Q2,Q6
G13: CVN Q6,Q1
G14: CX Q2,Q6
G15: CV Q2,Q1
G16: CV Q3,Q6
G17: CX Q4,Q3
G18: CVN Q3,Q6
G19: CX Q4,Q3
G20: CV Q4,Q6
P8: CX Q6,Q2
P9: CX Q5,Q3
Figure 6. (a) Dataflow graph of the initial circuit (latency = 763 µs) (b) Modified netlist using Q6 as the auxiliary qubit (c)
Dataflow graph of the modified netlist (latency = 757 µs)
6. The Proposed Physical Synthesis Flow
The modified flow including the proposed physical synthesis technique is presented in this section. This flow is shown in
Figure 7. It starts with an initial layout generation step, followed by an optimization loop implementing the auxiliary qubit
selection technique. After generating an initial layout, in the first step of the optimization loop, the netlist is parsed to find an
unprocessed macro gate. If such a macro gate is found, for each auxiliary qubit, the candidates that can be substituted for it
are checked and the best candidate (i.e. the one that decreases latency the most) is selected to be substituted for the initial
auxiliary qubit. Candidates are tentatively substituted for the initial auxiliary qubit and the new netlist is evaluated by
updating the routing and the scheduling of the circuit to reflect the effect of the auxiliary qubit substitution on the latency of
the circuit. If one candidate increases the latency, it is rejected and the optimization loop continues with other candidates;
otherwise, the substitution is accepted and the netlist is modified. The optimization loop continues until all candidates for all
macro gates are checked.
Figure 7. The modified physical synthesis flow (flowchart labels: Initial Scheduled Layout (6.1); Placement & Routing (6.1.1);
Auxiliary Qubit Selection technique; Does macro gate have any unprocessed auxiliary qubit?; Tentatively substitute candidate
qubit for auxiliary qubit; Update Routing (6.2); Update Scheduling (6.3); Is latency improved?; Accept this substitution and
change the netlist; Reject this substitution; Final Scheduled Layout; Classical Control Extraction)
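The optimization loop described above can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the function name `select_auxiliary_qubits` and the callbacks `candidates_of` and `latency_with` are hypothetical, and `latency_with` stands in for the whole "update routing, update scheduling, read off the latency" step.

```python
from typing import Callable, Dict, List, Sequence


def select_auxiliary_qubits(
    macro_gates: Sequence[str],
    aux_of: Dict[str, List[str]],
    candidates_of: Callable[[str, str], Sequence[str]],
    latency_with: Callable[[Dict[str, List[str]]], float],
) -> Dict[str, List[str]]:
    """Greedy auxiliary qubit selection over all macro gates (sketch).

    aux_of maps each macro gate to its current auxiliary qubits.
    candidates_of(gate, aux) lists qubits that may replace `aux` (assumed helper).
    latency_with(assignment) is an assumed stand-in for re-routing and
    re-scheduling the circuit and returning its latency.
    """
    assignment = {gate: list(aux) for gate, aux in aux_of.items()}
    best_latency = latency_with(assignment)
    for gate in macro_gates:
        for slot in range(len(assignment[gate])):
            original = assignment[gate][slot]
            best_choice = original
            for cand in candidates_of(gate, original):
                assignment[gate][slot] = cand       # tentative substitution
                latency = latency_with(assignment)
                if latency < best_latency:          # keep only improvements
                    best_latency, best_choice = latency, cand
            assignment[gate][slot] = best_choice    # commit best, or restore
    return assignment
```

With the latencies of the running example (763 µs with Q5, 757 µs with Q6) supplied through a lookup table, the sketch replaces Q5 by Q6 for the single macro gate.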
Once the optimization process is finished and the layout and the netlist are finalized, the classical control should be
extracted. The classical control system is responsible for executing the quantum circuit on the layout. This includes
determining where and when gate operations occur as well as managing and tracking every qubit in the system. In the rest of
this section, the details of main stages of the flow are discussed.
6.1 Scheduled Layout Generation
The first part of the flow in Figure 7, “scheduled layout”, takes a netlist and generates an initial layout through an iterative
process. This process has two subprocesses that are performed one after the other in a loop to generate a better scheduled
layout. In the first subsection below, the placement and routing heuristic is described; the second subsection explains the
instruction scheduling approach used in this paper.
6.1.1 Placement and Routing
In this paper, the dataflow-based layout generation algorithm proposed in  is used to place and route a circuit. This
algorithm claims to offer the best latency by taking a technology-dependent netlist and generating a layout comprised of the
macroblocks described in Section 3. The algorithm starts with creating dataflow graph of the circuit. In the next step, gate
locations are placed in topological order in the dataflow graph. As this style of placement may waste space due to the uneven
column sizes, a folding operation is performed. The folding operation joins a short column with the previous column in order
to fill out the rectangular bounding box of the layout as much as possible and decrease area. Then, the columns are sorted to
set the gate locations that need to be connected roughly horizontal to one another. After placing the gate locations, channels
are routed to reflect dataflow edges. Since the initial layout has too many gate locations, the dataflow graph is collapsed using
feedback from the scheduler. The algorithm identifies latencies of critical edges by using the scheduler feedback and merges
the two nodes connected by an edge with the longest latency on the critical path. All instructions within a merged group are
executed at a single gate location. This new group graph is then placed, routed and scheduled again to find the next pair of
node groups to merge and this merging, placing and routing procedure continues until a point is reached where congestion at
some heavily merged node group is actually hurting the latency with each further merge or no improvement is achieved.
6.1.2 Scheduling
The runtime execution order of the instructions is determined by the instruction issue logic. The instruction issue logic
involves both preprocessing and online scheduling. First, the instruction sequence is preprocessed to assign priorities that will
help during scheduling. The priority of an instruction is based on the length of its critical path to the end of dataflow graph.
Since the gate locations are known in advance, the movement latencies can be incorporated in the prioritization of the
instruction sequence. In other words, movement latencies can be considered as well as gate delays in the assignment of
priorities to instructions during preprocessing. This gives a better approximation of each qubit’s critical path. The scheduling
used in this paper is similar to the method used in , but it uses critical path with gate and movement latencies to set the
priority of a gate rather than the size of the dependent subtree to that gate. The instruction sequence is traversed from the
beginning to the end and instructions are scheduled as soon as the dependencies allow.
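As a rough illustration of the priority assignment described above, the sketch below computes, for every node of a dataflow graph, the length of its critical path to the end of the graph, counting both gate delays and movement latencies on the edges. The function name, the data layout, and the latency numbers in the usage note are assumptions made for this example; they are not the values of Table 1.

```python
from typing import Dict, List, Tuple


def critical_path_priorities(
    gate_delay: Dict[str, float],
    succs: Dict[str, List[Tuple[str, float]]],
) -> Dict[str, float]:
    """Priority of a node = its gate delay plus the longest
    (movement latency + successor priority) over its outgoing edges.

    succs[node] is a list of (successor, movement latency) pairs.
    The graph is assumed to be acyclic.
    """
    priority: Dict[str, float] = {}

    def visit(node: str) -> float:
        if node not in priority:
            tail = max(
                (move + visit(nxt) for nxt, move in succs.get(node, [])),
                default=0.0,
            )
            priority[node] = gate_delay[node] + tail
        return priority[node]

    for node in gate_delay:
        visit(node)
    return priority
```

For instance, with hypothetical delays `{"a": 10, "b": 10, "c": 100}` and edges a→c (movement 5) and b→c (movement 20), node b receives the higher priority because its path to the end is longer once movement is included.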
The scheduler implements a greedy scheduling technique. It maintains a list of instructions which have all their
dependencies fulfilled and therefore, are ready to be executed. Among the ready instructions, the instruction with the highest
priority will be run and is more likely to gain access to the resources it needs. These contested resources include both gates
and channels/intersections. Once all the possible instructions are scheduled, time advances until one or more resources are
freed and more instructions can be scheduled. This scheduling process continues until the complete instruction sequence is
scheduled.
It is worth noting that the proposed flow uses scheduling information to decide whether to accept or reject auxiliary qubit
substitutions. The proposed technique is not tied to a particular scheduling method and retains its advantage under different
scheduling schemes. In other words, even if we use a scheduling method resulting in the best latency, the technique can still
improve the latency. However, since exhaustive scheduling is impractical for large circuits, we use a greedy heuristic to
schedule the circuits.
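A much simplified sketch of such a greedy, priority-driven list scheduler is given below. It is an assumption-laden illustration rather than the scheduler used in the paper: only gate locations are treated as contested resources (channels, intersections and movement times are left out), and all names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def list_schedule(
    duration: Dict[str, float],
    deps: Dict[str, List[str]],
    location: Dict[str, str],
    priority: Dict[str, float],
) -> Tuple[Dict[str, float], float]:
    """Return (start time per instruction, total latency)."""
    remaining = {i: set(deps.get(i, [])) for i in duration}
    start: Dict[str, float] = {}
    running: List[Tuple[float, str]] = []  # min-heap of (finish time, instruction)
    busy = set()                           # gate locations currently in use
    now = 0.0
    while len(start) < len(duration):
        ready = [i for i in duration if i not in start and not remaining[i]]
        # Highest priority first; ties broken by name for determinism.
        for instr in sorted(ready, key=lambda i: (-priority[i], i)):
            if location[instr] not in busy:
                busy.add(location[instr])
                start[instr] = now
                heapq.heappush(running, (now + duration[instr], instr))
        if not running:
            raise ValueError("dependency cycle: nothing can be scheduled")
        # Advance time until the next resource is freed.
        now, done = heapq.heappop(running)
        busy.discard(location[done])
        for pending in remaining.values():
            pending.discard(done)
    latency = max((finish for finish, _ in running), default=now)
    return start, latency
```

When two ready instructions compete for the same gate location, the one with the higher priority starts first and the other waits until the location is freed.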
6.2 Update Routing
When a qubit is substituted for an auxiliary qubit, the routes traversed by the two qubits are modified. For example, in
Figure 5, after substitution of Q6 for Q5, the dataflow graph is changed as shown in Figure 6c. On one hand, the edges
(P5,G1), (G20,P9), (P6,P8) are deleted from the initial dataflow graph (Figure 6a). On the other hand, the edges (P5,P9),
(P6,G1), (G20,P8) are added to the dataflow graph. Therefore, routes should be found for the new edges. It should be noted
that the routes between the gate locations generated from decomposing of the macro gate do not change because the
technique does not modify those gate locations; it only replaces the auxiliary qubit.
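The edge changes listed above can be reproduced with a short script. The sketch rebuilds the netlists of Figures 5c and 6b from a template of the decomposition (with "A" marking the auxiliary qubit), derives dataflow edges by linking consecutive uses of each qubit, and compares the two edge sets. The helper names are illustrative assumptions; the gate list itself is taken from the figures.

```python
from typing import Dict, List, Set, Tuple

# G1..G10 of the decomposed macro gate; G11..G20 repeat the same pattern.
TEMPLATE = [
    ("CV", "A", "Q1"), ("CX", "Q2", "A"), ("CVN", "A", "Q1"), ("CX", "Q2", "A"),
    ("CV", "Q2", "Q1"), ("CV", "Q3", "A"), ("CX", "Q4", "Q3"), ("CVN", "Q3", "A"),
    ("CX", "Q4", "Q3"), ("CV", "Q4", "A"),
]

Netlist = List[Tuple[str, Tuple[str, ...]]]


def build_netlist(aux: str) -> Netlist:
    """Netlist of the example circuit with `aux` as the auxiliary qubit."""
    netlist: Netlist = [(f"P{i}", (f"Q{i}",)) for i in range(1, 7)]  # P1..P6: H
    for k, (_, a, b) in enumerate(TEMPLATE * 2, start=1):            # G1..G20
        netlist.append((f"G{k}", tuple(aux if q == "A" else q for q in (a, b))))
    netlist += [("P8", ("Q6", "Q2")), ("P9", ("Q5", "Q3"))]
    return netlist


def dataflow_edges(netlist: Netlist) -> Set[Tuple[str, str]]:
    """Connect each instruction to the previous user of each of its qubits."""
    last_use: Dict[str, str] = {}
    edges: Set[Tuple[str, str]] = set()
    for name, qubits in netlist:
        for q in qubits:
            if q in last_use:
                edges.add((last_use[q], name))
            last_use[q] = name
    return edges


before = dataflow_edges(build_netlist("Q5"))
after = dataflow_edges(build_netlist("Q6"))
removed = before - after   # edges that need no route any more
added = after - before     # edges for which new routes must be found
```

Only three edges differ in each direction, which is why routing can be updated locally instead of being redone from scratch.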
6.3 Update Scheduling
The information obtained from the scheduling process is used to accept or reject an auxiliary qubit substitution, so it
should be done in each iteration of the optimization loop. However, performing a complete scheduling in each iteration can
dramatically increase the total run time of the optimization program for large netlists. Because an auxiliary qubit
substitution often modifies only a small part of the netlist tree, a complete rescheduling in each iteration may not be
needed. Therefore, in the proposed flow, the scheduling is incrementally updated in each iteration of the
optimization loop. This decreases the run time of each iteration and therefore, leads to overall run time reduction.
The scheduler selects operations based on their dependencies and priorities. The auxiliary qubit substitution changes the
priorities of the operations. Therefore, the update-scheduling operation must modify the priorities of the modified nodes and
propagate the effects of these changes to the nodes located higher than the modified nodes in the dataflow graph. The
propagation continues up to the root of the dataflow graph (i.e., a dummy node with the first level gates as its children).
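One possible way to realize this upward propagation is a worklist that recomputes the priority of each modified node and then revisits its predecessors. The sketch below is an assumption about how such an update could look, not the authors' code; unlike the description above, it also stops early on a branch as soon as a recomputed priority turns out to be unchanged, which is a design choice of this sketch.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def propagate_priorities(
    priority: Dict[str, float],
    gate_delay: Dict[str, float],
    succs: Dict[str, List[Tuple[str, float]]],
    preds: Dict[str, List[str]],
    modified: Iterable[str],
) -> int:
    """Recompute priorities for `modified` nodes and their ancestors only.

    succs[node] holds (successor, movement latency) pairs and preds[node]
    the predecessors. `priority` is updated in place. Returns the number
    of node recomputations performed.
    """
    work = deque(modified)
    steps = 0
    while work:
        node = work.popleft()
        tail = max(
            (move + priority[nxt] for nxt, move in succs.get(node, [])),
            default=0.0,
        )
        new_value = gate_delay[node] + tail
        steps += 1
        if new_value != priority.get(node):
            priority[node] = new_value
            work.extend(preds.get(node, []))  # only ancestors can be affected
    return steps
```

In a three-node chain a→b→c, changing the movement latency on edge b→c triggers two recomputations (b, then a), while c is never touched.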
7. Experimental Results
We experimented with a number of quantum circuit benchmarks from . Since our technique decreases the latency of
circuits with macro gates, benchmarks to which the auxiliary qubit selection technique is not applicable were not
attempted. Physical latencies shown in Table 1 were used for the gates and for the two types of move operations in ion trap
technology . The benchmarks before decomposition include CnNOT gates but after the decomposition, they may include
Controlled-V, Controlled-V+, and CNOT gates. Table 2 shows the experimental results. The correctness of our approach was
verified by Quiver, an application to aid in the visualization and verification of reversible quantum circuits .
Table 2 shows the latency of the benchmark circuits achieved by the proposed flow compared with the best in literature
. The column “# of Macro Gates” contains the number of macro gates whose auxiliary qubits can change. In other words,
this number does not include macro gates with only one choice for their auxiliary qubits. The column “# of Auxiliary Qubit
Changes” includes the number of auxiliary qubits that have been substituted. The latency of circuits obtained by the best
prior physical design flow in terms of the latency and the proposed physical synthesis flow are shown in the columns “Prior
Physical Design Flow” and “Proposed Physical Synthesis Flow”, respectively.
The column “Improvement” shows the latency improvement resulting from the physical synthesis approach proposed in
this paper. As can be seen, an average improvement of 10.96% is achieved in the latency of the benchmarks. The results of
Table 2 are summarized in Figure 8 in terms of the latency.
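The text does not state how the “Improvement” column is computed. The values in Table 2 appear to be consistent with measuring the latency reduction relative to the latency of the proposed flow; the following check, whose function name is an assumption of this example, reproduces the first three rows under that reading.

```python
def improvement_percent(prior_latency: float, proposed_latency: float) -> float:
    """Latency reduction expressed relative to the proposed flow's latency.

    This normalization is inferred from the numbers in Table 2, not stated
    in the text.
    """
    return 100.0 * (prior_latency - proposed_latency) / proposed_latency


# (prior latency, proposed latency) pairs from the first three rows of Table 2.
rows = [(115581, 87819), (27407, 23772), (9536, 9227)]
computed = [round(improvement_percent(prior, new), 2) for prior, new in rows]
```

Under this reading, the first row corresponds to 31.61%, the largest improvement reported.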
2 All results of this section are obtained on a 3 GHz Pentium IV with 1 gigabyte of memory.
3 As calculated by the “Rational Quantify” suite.
Table 2. The latency of the benchmark circuits achieved by the proposed flow compared with the best in literature²
# of Macro   # of Auxiliary   Prior Physical   Proposed Physical   Improvement
Gates        Qubit Changes    Design Flow      Synthesis Flow      (%)
68           39               115581           87819               31.61
16           8                27407            23772               15.29
5            3                9536             9227                3.35
4            4                4005             3896                2.8
2            2                2926             2763                5.9
37           20               80741            67098               20.33
4            3                6612             6436                2.73
3            1                6603             6524                1.21
6            1                11081            10947               1.22
8            3                18841            17554               7.33
10           5                20370            15988               27.41
12           3                24722            21455               15.23
14           1                26234            25161               4.26
16           3                32921            29701               10.84
18           1                34895            32871               6.16
20           6                41524            36792               12.86
23           9                45893            39651               15.74
35           16               74451            60658               22.74
14           4                79520            73395               8.35
28           7                103320           91011               13.52
4            2                3940             3881                1.52
[Columns not recovered: Circuit name, Run Time (ms)³]
Figure 8. The latency reduction achieved by the proposed physical synthesis approach
7.1 Time Complexity
The time complexity of the proposed physical synthesis technique can be calculated as follows. Since the proposed
algorithm examines each substitution candidate for each auxiliary qubit of each macro gate, the number of iterations is:
$\sum_{i=1}^{m} C(i)$ (7.1)
where m is the number of macro gates in the design and C(i) is the number of substitution candidates for macro gate i. On the
other hand, as shown in Figure 7, the scheduling should be updated for each tentative substitution (i.e. in each iteration). The
dominant part of the update-scheduling runtime is the runtime of updating the priorities. The number of steps for updating the
priorities for each tentative substitution is equal to the level of the lowest gate generated from decomposition of macro gate
(e.g. G20 in Figure 6c) because when the auxiliary qubit of a macro gate is substituted with another one, only the priorities of
the nodes located higher than the lowest gate of the macro gate need to be updated. Therefore, the upper bound of time
complexity of the update-scheduling process is equal to the number of gates. Based on this analysis, the overall time
complexity of the proposed approach can be calculated as
O(m × q × g) (7.2)
where m is the number of macro gates in the design, q is the number of qubits which, in the worst case, is equal to the number
of candidates for each macro gate. In other words, m×q is the upper bound of Expression 7.1. Finally, g is the upper bound of
time complexity of update-scheduling process that is the number of gates.
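The bound can be assembled in one line from the two facts above, assuming that every macro gate has at most q substitution candidates and that each tentative substitution costs at most g priority updates:

```latex
\underbrace{\sum_{i=1}^{m} C(i)}_{\text{iterations}} \cdot \; g
\;\le\; \Bigl(\sum_{i=1}^{m} q\Bigr) \cdot g
\;=\; m \cdot q \cdot g
\quad\Longrightarrow\quad O(m \times q \times g)
```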
8. Conclusion
In this paper, a new physical synthesis technique was proposed which modifies the circuit netlist by considering its layout
information to improve the latency of quantum circuit execution. In the proposed technique, layout information is used to
find the best auxiliary qubit that decreases the overall latency. The design flow was applied to a set of benchmarks with
macro gates. Experimental results show that the design flow enhanced by our physical synthesis technique can improve the
latency of quantum circuits by up to 31.61% for the attempted benchmarks. The authors are working on new physical
synthesis techniques and improving the proposed flow.
We would like to thank Prof. D. Wineland and Prof. J. Kubiatowicz for their invaluable deliberations.
 S. Lloyd, "Quantum-Mechanical Computers," Scientific American, No. 273, pp. 44-49, 1995.
 R. P. Feynman, “Quantum mechanical computers,” Foundations of Physics, 16:507, 1986.
 P. Shor, “Polynomial Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer,” SIAM Journal on
Computing, Vol. 26, No. 5, pp. 1484-1509, 1997.
 L. Grover, “A Fast Quantum Mechanical Algorithm for Database Search,” Proceeding of ACM Symposium on Theory of Computing,
pp. 212-219, 1996.
 C. Zalka, “Simulating quantum systems on a quantum computer,” Proceeding of Mathematical, Physical and Engineering Sciences,
pp. 313–322, 1998.
 W. K. Wootters, W. H. Zurek, “A single quantum cannot be cloned,” Nature 299, pp. 802-803, 1982.
 E. Schrödinger, “The present situation in quantum mechanics,” Naturwiss. 48, pp. 807-812, 1935.
 D. Welsh, “Codes and Cryptography,” Oxford University Press, 1988.
 http://www.magiqtech.com/MagiQ/Home.html, accessed on 2010-2-28.
 http://www.idquantique.com, accessed on 2010-2-28.
 M. Whitney, N. Isailovic, Y. Patel, and J. Kubiatowicz, “Automated Generation of Layout and Control for Quantum Circuits,”
Proceeding of Computing Frontiers, pp. 83–94, 2007.
 C. J. Alpert, C. Chu, “Physical synthesis comes of age,” Proceedings of International Conference on Computer-Aided Design
(ICCAD), pp. 246-249, 2007.
 N. Mohammadzadeh, M. Sedighi , M. Saheb Zamani, “Quantum Physical Synthesis: Improving Physical Design by Netlist
Modifications,” accepted for publication in Elsevier Microelectronics Journal.
 H. Häffner, C. F. Roos, R. Blatt, “Quantum computing with trapped ions,” Physics Reports, pp. 155-203, 2008.
 D. Kielpinski, C. Monroe, and D. Wineland, “Architecture for a large-scale ion-trap quantum computer,” Nature 417, pp. 709–711,
 K. Svore, A. Aho, A. Cross, I. Chuang, and I. Markov, “A Layered Software Architecture for Quantum Computing Design Tools,”
Computer, Vol. 39, No. 1, pp. 74–83, 2006.
 K. Svore, A. Cross, A. Aho, I. Chuang, and I. Markov, “Toward a software architecture for quantum computing design tools,”
Proceedings of the 2nd International Workshop on Quantum Programming Languages (QPL), pp. 145–162, 2004.
 S. Balensiefer, L. Kreger-Stickles, and M. Oskin, “QUALE: quantum architecture layout evaluator,” Proceedings of SPIE, the
International Society for Optical Engineering, Vol. 5815, pp. 103-114, 2005.
 S. Balensiefer, L. Kreger-Stickles, and M. Oskin, “An evaluation framework and instruction set architecture for ion-trap based
quantum micro-architectures,” Proceedings of International Symposium on Computer Architecture (ISCA), pp. 186 – 196, 2005.
 B. Omer, “Quantum programming in qcl,” Master thesis, Technical University of Vienna, 2000.
 T. Yang and A. Gerasoulis, “List scheduling with and without communication delays,” Journal of Parallel Computing, Vol. 19, No.
12, pp. 1321–1344, 1993.
 T. Metodi, D. Thaker, A. Cross, F. Chong, and I. Chuang, “A Quantum Logic Array Microarchitecture: Scalable Quantum Data
Movement and Computation,” Proceedings of the 38th International Symposium on Microarchitecture (MICRO), pp. 305-318, 2005.
 D. Thaker, T. Metodi, A. Cross, I. Chuang, and F. Chong, “Quantum Memory Hierarchies: Efficient Designs to Match Available
Parallelism in Quantum Computing,” Proceedings of the 33rd International Symposium on Computer Architecture (ISCA), pp. 378-
 T. Metodi, D. Thaker, A. Cross, F. Chong, and I. Chuang, “Scheduling physical operations in a quantum information processor,”
Proceedings of SPIE Defense and Security Symposium, Vol. 6244, pp. 62440T.1-62440T.12, 2006.
 D. Hucul, M. Yeo, S. Olmschenk, C. Monroe, W. Hensinger, J. Rabchuk, “On the transport of atomic ions in linear and
multidimensional ion trap arrays,” Journal of Quantum Information and Computation, Vol. 8, No. 6, pp. 0501-0578, 2008.
 J. Kim, S. Pau, Z. Ma, H. McLellan, J. Gages, A. Kornblit, R. Slusher, “System design for large-scale ion trap quantum information
processor,” Journal of Quantum Information and Computation, Vol. 5, No. 7, pp. 515–537, 2005.
 J. Chiaverini et al., “Surface-electrode architecture for ion-trap quantum information processing,” Journal of Quantum Information
and Computation, Vol. 5, No. 5, pp. 419-439, 2005.
 M. A. Nielsen, I. L. Chuang, “Quantum Computation and Quantum Information,” Cambridge University Press, 2000.
 A. Barenco et al., "Elementary Gates For Quantum Computation," Physical Review A 52, pp. 3457-3467, 1995.
 M. Saeedi, N. Mohammadzadeh, M. Sedighi, M. Saheb Zamani, "Towards a Thorough Set of Metrics for Quantum Circuit Synthesis,"
International Journal of Physics, Vol. 1, No. 2, pp. 9-22, January 2008.
 A. Cross, “Synthesis and Evaluation of Fault-Tolerant Quantum Computer Architectures,” Ph.D. Thesis, Massachusetts Institute of
 M. Whitney, N. Isailovic, Y. Patel, and J. Kubiatowicz, “A Fault Tolerant, Area Efficient Architecture for Shor’s Factoring
Algorithm,” ISCA’09, 2009.
 D. Maslov, G. Dueck, and N. Scott, “Reversible Logic Synthesis Benchmarks Page,” http://www.cs.uvic.ca/~dmaslov/
 W. Robert et al., “Quiver: An online tool for reversible quantum circuit visualization and verification,”
http://www.revlib.org/tools.php, accessed on 2010-2-28.
 IBM Rational software, Version 2003, www.ibm.com/software/rational, accessed on 2010-2-28.