Preprint

pc-COP: An Efficient and Configurable 2048-p-Bit Fully-Connected Probabilistic Computing Accelerator for Combinatorial Optimization


Abstract

Probabilistic computing is an emerging quantum-inspired computing paradigm capable of solving combinatorial optimization and various other classes of computationally hard problems. In this work, we present pc-COP, an efficient and configurable probabilistic computing hardware accelerator with 2048 fully connected probabilistic bits (p-bits) implemented on a Xilinx UltraScale+ FPGA. We propose a pseudo-parallel p-bit update architecture with speculate-and-select logic, which improves overall performance by 4× compared to the traditional sequential p-bit update. Using our FPGA-based accelerator, we demonstrate the standard G-Set graph maximum cut benchmarks with near-99% average accuracy. Compared to state-of-the-art hardware implementations, we achieve similar performance and accuracy with lower FPGA resource utilization.
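For orientation, the sequential p-bit update that the abstract uses as its baseline can be sketched in software. The sketch below is not the pc-COP architecture and does not model the speculate-and-select logic. It assumes the commonly used behavioral p-bit model, m_i = sgn(tanh(β·I_i) − r) with r uniform in (−1, 1), a linear annealing schedule, and couplings J_ij = −1 on every edge. The function names and parameters (`pbit_maxcut`, `cut_value`, `beta_max`, `sweeps`) are hypothetical.

```python
import math
import random


def cut_value(edges, m):
    """Count the edges whose endpoints fall on opposite sides of the cut."""
    return sum(1 for u, v in edges if m[u] != m[v])


def pbit_maxcut(edges, n, sweeps=200, beta_max=3.0, seed=0):
    """Sequential p-bit annealing for unweighted max-cut (illustrative only)."""
    rng = random.Random(seed)
    # Assumed mapping: J_ij = -1 on each edge, so lowering the Ising energy
    # pushes neighbouring p-bits to opposite values and enlarges the cut.
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    m = [rng.choice((-1, 1)) for _ in range(n)]
    best_cut, best_m = cut_value(edges, m), m[:]

    for sweep in range(sweeps):
        # Assumed linear schedule for the inverse temperature beta.
        beta = beta_max * (sweep + 1) / sweeps
        for i in range(n):  # one p-bit at a time: the sequential update
            local_field = -sum(m[j] for j in neighbors[i])  # I_i = sum_j J_ij m_j
            r = rng.uniform(-1.0, 1.0)
            m[i] = 1 if math.tanh(beta * local_field) > r else -1
        cut = cut_value(edges, m)
        if cut > best_cut:
            best_cut, best_m = cut, m[:]
    return best_cut, best_m


if __name__ == "__main__":
    square = [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle, a bipartite graph
    print(pbit_maxcut(square, 4))
```

Because each p-bit reads the latest values of all its neighbours, the inner loop is inherently serial in this form, which is the cost a parallel or pseudo-parallel update scheme aims to reduce.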

Article
This article critically investigates the limitations of the simulated annealing algorithm using probabilistic bits (pSA) in solving large-scale combinatorial optimization problems. The study begins with an in-depth analysis of the pSA process, focusing on the issues resulting from unexpected oscillations among p-bits. These oscillations hinder the energy reduction of the Ising model and thus obstruct the successful execution of pSA in complex tasks. Through detailed simulations, we unravel the root cause of this energy stagnation, identifying the feedback mechanism inherent to the pSA operation as the primary contributor to these disruptive oscillations. To address this challenge, we propose two novel algorithms, time average pSA (TApSA) and stalled pSA (SpSA). These algorithms are designed based on partial deactivation of p-bits and are thoroughly tested using Python simulations on maximum cut benchmarks that are typical combinatorial optimization problems. On the 16 benchmarks from 800 to 5000 nodes, the proposed methods improve the normalized cut value from 0.8 to 98.4% on average in comparison with the conventional pSA.
Article
This paper presents a local energy distribution based hyperparameter determination for stochastic simulated annealing (SSA). SSA is capable of solving combinatorial optimization problems faster than typical simulated annealing (SA), but requires a time-consuming hyperparameter search. The proposed method determines hyperparameters based on the local energy distributions of spins (probabilistic bits). The spin is a basic computing element of SSA and is graphically connected to other spins with its weights. The distribution of the local energy can be estimated based on the central limit theorem (CLT). The CLT-based normal distribution is used to determine the hyperparameters, which reduces the time complexity for hyperparameter search from O(n³) of the conventional method to O(1). The performance of SSA with the determined hyperparameters is evaluated on the Gset and K2000 benchmarks for maximum-cut problems. The results show that the proposed method achieves mean cut values of approximately 98% of the best-known cut values.
Article
Probabilistic computing has been introduced to operate functional networks using a probabilistic bit (p-bit), broadening the computational abilities in non-deterministic polynomial searching operations. However, previous developments have focused on emulating the operation of quantum computers similarly, implementing every p-bit with large weight-sum matrix multiplication blocks and requiring tens of times more p-bits than semiprime bits. In addition, operations based on a conventional simulated annealing scheme required a large number of sampling operations, which deteriorated the performance of the Ising machines. Here we introduce a prime factorization machine with a virtually connected Boltzmann machine and probabilistic annealing method, which are designed to reduce the hardware complexity and number of sampling operations. From 10-bit to 64-bit prime factorizations were performed, and the machine offers up to 1.2 × 10⁸ times improvement in the number of sampling operations compared with previous factorization machines, with a 22-fold smaller hardware resource.
Article
The transistor celebrated its 75th birthday in 2022. The continued scaling of the transistor defined by Moore’s Law continues, albeit at a slower pace. Meanwhile, computing demands and energy consumption required by modern artificial intelligence (AI) algorithms have skyrocketed. As an alternative to scaling transistors for general-purpose computing, the integration of transistors with unconventional technologies has emerged as a promising path for domain-specific computing. In this article, we provide a full-stack review of probabilistic computing with p-bits as a representative example of the energy-efficient and domain-specific computing movement. We argue that p-bits could be used to build energy-efficient probabilistic systems, tailored for probabilistic algorithms and applications. From hardware, architecture, and algorithmic perspectives, we outline the main applications of probabilistic computers ranging from probabilistic machine learning and AI to combinatorial optimization and quantum simulation. Combining emerging nanodevices with the existing CMOS ecosystem will lead to probabilistic computers with orders of magnitude improvements in energy efficiency and probabilistic sampling, potentially unlocking previously unexplored regimes for powerful probabilistic algorithms.
Article
Probabilistic computing is an emerging computational paradigm that uses probabilistic circuits to efficiently solve optimization problems such as invertible logic, where traditional digital computations are difficult to solve. This paper proposes a true random number generator (TRNG) based on resistive random-access memory (RRAM), which is combined with an activation function implemented by a piecewise linear function to form a standard p-bit cell, one of the most important parts of a p-circuit. A p-bit multiplexing strategy is also applied to reduce the number of p-bits and improve resource utilization. To verify the superiority of the proposed probabilistic circuit, we implement the invertible p-circuit on a field-programmable gate array (FPGA), including AND gates, full adders, multi-bit adders, and multipliers. The results of the FPGA implementation show that our approach can significantly save the consumption of hardware resources.
Article
Solving computationally hard problems using conventional computing architectures is often slow and energetically inefficient. Quantum computing may help with these challenges, but it is still in the early stages of development. A quantum-inspired alternative is to build domain-specific architectures with classical hardware. Here we report a sparse Ising machine that achieves massive parallelism where the flips per second—the key figure of merit—scales linearly with the number of probabilistic bits. Our sparse Ising machine architecture, prototyped on a field-programmable gate array, is up to six orders of magnitude faster than standard Gibbs sampling on a central processing unit, and offers 5–18 times improvements in sampling speed compared with approaches based on tensor processing units and graphics processing units. Our sparse Ising machine can reliably factor semi-primes up to 32 bits and it outperforms competition-winning Boolean satisfiability solvers in approximate optimization. Moreover, our architecture can find the correct ground state, even when inexact sampling is made with faster clocks. Our problem encoding and sparsification techniques could be applied to other classical and quantum Ising machines, and our architecture could potentially be scaled to 1,000,000 or more p-bits using analogue silicon or nanodevice technologies. Sparsification techniques can be used to create Ising machines prototyped on field-programmable gate arrays that can quickly and efficiently solve combinatorial optimization problems.
Article
Probabilistic bits (p-bits) have recently been presented as a spin (basic computing element) for the simulated annealing (SA) of Ising models. In this brief, we introduce fast-converging SA based on p-bits designed using integral stochastic computing. The stochastic implementation approximates a p-bit function, which can search for a solution to a combinatorial optimization problem at lower energy than conventional p-bits. Searching around the global minimum energy can increase the probability of finding a solution. The proposed stochastic computing-based SA method is compared with conventional SA and quantum annealing (QA) with a D-Wave Two quantum annealer on the traveling salesman, maximum cut (MAX-CUT), and graph isomorphism (GI) problems. The proposed method achieves a convergence speed a few orders of magnitude faster while dealing with an order of magnitude larger number of spins than the other methods.
Article
Simulated quantum annealing (SQA) is a probabilistic approximation method to find a solution for a combinatorial optimization problem using a digital computer. It is possible to simulate large-scale optimization problems on a CPU due to its high external memory capacity. However, the processing time increases exponentially with the number of variables, and parallel implementation is difficult due to the serial nature of the quantum Monte Carlo algorithm used in SQA. In this paper, we propose a method to accelerate SQA on a multicore CPU, based on temporal and spatial parallel processing with high data localization. According to the experimental results using 16-core CPU, we achieved from 8 to 16 times speedup compared to single-core CPU implementations. The proposed method can be used to solve combinatorial optimization problems that have more than 64,000 variables, which was not possible using previous GPU- and FPGA-based accelerators.
Article
Digital computers store information in the form of bits that can take on one of two values 0 and 1, while quantum computers are based on qubits that are described by a complex wavefunction, whose squared magnitude gives the probability of measuring either 0 or 1. Here, we make the case for a probabilistic computer based on p-bits, which take on values 0 and 1 with controlled probabilities and can be implemented with specialized compact energy-efficient hardware. We propose a generic architecture for such p-computers and emulate systems with thousands of p-bits to show that they can significantly accelerate randomized algorithms used in a wide variety of applications including but not limited to Bayesian networks, optimization, Ising models, and quantum Monte Carlo.
Article
Computer scientists and engineers have started down a road that could one day lead to a momentous transition: from deterministic computing systems, based on classical physics, to quantum computing systems, which exploit the weird and wacky probabilistic rules of quantum physics. Many commentators have pointed out that if engineers are able to fashion practical quantum computers, there will be a tectonic shift in the sort of computations that become possible. But that's a big if. Quantum computers hold great theoretical promise, sure, but the hurdles that need to be overcome to build practical machines are enormous. Some skeptics have argued that the technical challenges are so immense that it's very unlikely that general-purpose quantum computers will become available anytime in the foreseeable future. Others, including the engineers now working very hard to build these machines at Google, IBM, Intel, and elsewhere, are more sanguine, anticipating that 5 or 10 more years of work may be enough to bring the first practical general-purpose quantum computers on line. Only time will tell.
Article
In this paper we present a concrete design for a probabilistic (p-) computer based on a network of p-bits, robust classical entities fluctuating between -1 and +1, with probabilities that are controlled through an input constructed from the outputs of other p-bits. The architecture of this probabilistic computer is similar to a stochastic neural network with the p-bit playing the role of a binary stochastic neuron, but with one key difference: there is no sequencer used to enforce an ordering of p-bit updates, as is typically required. Instead, we explore sequencerless designs where all p-bits are allowed to flip autonomously and demonstrate that such designs can allow ultrafast operation unconstrained by available clock speeds without compromising the solution’s fidelity. Based on experimental results from a hardware benchmark of the autonomous design and benchmarked device models, we project that a nanomagnetic implementation can scale to achieve petaflips per second with millions of neurons. A key contribution of this paper is the focus on a hardware metric – flips per second– as a problem and substrate-independent figure-of-merit for an emerging class of hardware annealers known as Ising Machines. Much like the shrinking feature sizes of transistors that have continually driven Moore’s Law, we believe that flips per second can be continually improved in later technology generations of a wide class of probabilistic, domain specific hardware.
Article
The promise of quantum computers is that certain computational tasks might be executed exponentially faster on a quantum processor than on a classical processor¹. A fundamental challenge is to build a high-fidelity processor capable of running quantum algorithms in an exponentially large computational space. Here we report the use of a processor with programmable superconducting qubits²⁻⁷ to create quantum states on 53 qubits, corresponding to a computational state-space of dimension 2⁵³ (about 10¹⁶). Measurements from repeated experiments sample the resulting probability distribution, which we verify using classical simulations. Our Sycamore processor takes about 200 seconds to sample one instance of a quantum circuit a million times; our benchmarks currently indicate that the equivalent task for a state-of-the-art classical supercomputer would take approximately 10,000 years. This dramatic increase in speed compared to all known classical algorithms is an experimental realization of quantum supremacy⁸⁻¹⁴ for this specific computational task, heralding a much-anticipated computing paradigm.
Article
Conventional computers operate deterministically using strings of zeros and ones called bits to represent information in binary code. Despite the evolution of conventional computers into sophisticated machines, there are many classes of problems that they cannot efficiently address, including inference, invertible logic, sampling and optimization, leading to considerable interest in alternative computing schemes. Quantum computing, which uses qubits to represent a superposition of 0 and 1, is expected to perform these tasks efficiently¹⁻³. However, decoherence and the current requirement for cryogenic operation⁴, as well as the limited many-body interactions that can be implemented, pose considerable challenges. Probabilistic computing¹,⁵⁻⁷ is another unconventional computation scheme that shares similar concepts with quantum computing but is not limited by the above challenges. The key role is played by a probabilistic bit (a p-bit), a robust, classical entity fluctuating in time between 0 and 1, which interacts with other p-bits in the same system using principles inspired by neural networks⁸. Here we present a proof-of-concept experiment for probabilistic computing using spintronics technology, and demonstrate integer factorization, an illustrative example of the optimization class of problems addressed by adiabatic⁹ and gated² quantum computing. Nanoscale magnetic tunnel junctions showing stochastic behaviour are developed by modifying market-ready magnetoresistive random-access memory technology¹⁰,¹¹ and are used to implement three-terminal p-bits that operate at room temperature. The p-bits are electrically connected to form a functional asynchronous network, to which a modified adiabatic quantum computing algorithm that implements three- and four-body interactions is applied. Factorization of integers up to 945 is demonstrated with this rudimentary asynchronous probabilistic computer using eight correlated p-bits, and the results show good agreement with theoretical predictions, thus providing a potentially scalable hardware approach to the difficult problems of optimization and sampling.
Article
Probabilistic spin logic (PSL) is a recently proposed computing paradigm based on unstable stochastic units called probabilistic bits (p-bits) that can be correlated to form probabilistic circuits (p-circuits). These p-circuits can be used to solve problems of optimization, inference and also to implement precise Boolean functions in an "inverted" mode, where a given Boolean circuit can operate in reverse to find the input combinations that are consistent with a given output. In this paper we present a scalable FPGA implementation of such invertible p-circuits. We implement a "weighted" p-bit that combines stochastic units with localized memory structures. We also present a generalized tile of weighted p-bits to which a large class of problems beyond invertible Boolean logic can be mapped.
Article
Conventional semiconductor-based logic and nanomagnet-based memory devices are built out of stable, deterministic units such as standard metal-oxide semiconductor transistors, or nanomagnets with energy barriers in excess of ≈40–60 kT. In this paper, we show that unstable, stochastic units, which we call “p-bits,” can be interconnected to create robust correlations that implement precise Boolean functions with impressive accuracy, comparable to standard digital circuits. At the same time, they are invertible, a unique property that is absent in standard digital circuits. When operated in the direct mode, the input is clamped, and the network provides the correct output. In the inverted mode, the output is clamped, and the network fluctuates among all possible inputs that are consistent with that output. First, we present a detailed implementation of an invertible gate to bring out the key role of a single three-terminal transistorlike building block to enable the construction of correlated p-bit networks. The results for this specific, CMOS-assisted nanomagnet-based hardware implementation agree well with those from a universal model for p-bits, showing that p-bits need not be magnet based: any three-terminal tunable random bit generator should be suitable. We present a general algorithm for designing a Boltzmann machine (BM) with a symmetric connection matrix [J] (J_ij = J_ji) that implements a given truth table with p-bits. The [J] matrices are relatively sparse with a few unique weights for convenient hardware implementation. We then show how BM full adders can be interconnected in a partially directed manner (J_ij ≠ J_ji) to implement large logic operations such as 32-bit binary addition. Hundreds of stochastic p-bits get precisely correlated such that the correct answer out of 2³³ (≈8×10⁹) possibilities can be extracted by looking at the statistical mode or majority vote of a number of time samples. With perfect directivity (J_ji = 0) a small number of samples is enough, while for less directed connections more samples are needed, but even in the former case logical invertibility is largely preserved. This combination of digital accuracy and logical invertibility is enabled by the hybrid design that uses bidirectional BM units to construct circuits with partially directed interunit connections. We establish this key result with extensive examples including a 4-bit multiplier which in inverted mode functions as a factorizer.
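The idea of a symmetric connection matrix that encodes a truth table can be checked by brute force on a single gate. The sketch below uses an illustrative coupling matrix `J` and bias vector `h` for a three-p-bit AND gate (A, B, C with C = A AND B, bipolar values ±1). These particular values are an assumption made for this example, not numbers quoted from the abstract, and the helper names are hypothetical. The energy convention assumed is E(m) = −(Σ_{i<j} J_ij m_i m_j + Σ_i h_i m_i).

```python
from itertools import product

# Illustrative (assumed) symmetric couplings and biases for an AND gate.
# Order of p-bits: A, B, C, where C is the gate output.
J = [[0, -1, 2],
     [-1, 0, 2],
     [2, 2, 0]]
h = [1, 1, -2]


def energy(m):
    """Ising-style energy of a bipolar state m = (A, B, C)."""
    pair = sum(J[i][j] * m[i] * m[j] for i in range(3) for j in range(i + 1, 3))
    bias = sum(h[i] * m[i] for i in range(3))
    return -(pair + bias)


def ground_states():
    """Enumerate all 8 states and return those with minimum energy."""
    states = list(product((-1, 1), repeat=3))
    e_min = min(energy(m) for m in states)
    return sorted(m for m in states if energy(m) == e_min)


def is_valid_and(m):
    """True when the state is a row of the AND truth table (+1 means logic 1)."""
    a, b, c = (x == 1 for x in m)
    return c == (a and b)


if __name__ == "__main__":
    for m in ground_states():
        print(m, energy(m), is_valid_and(m))
```

With these values the four minimum-energy states are exactly the four rows of the AND truth table, so a network sampling low-energy states visits only consistent input/output combinations. Restricting the enumeration to states with C fixed mimics the inverted mode described above.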
Article
The common feature of nearly all logic and memory devices is that they make use of stable units to represent 0's and 1's. A completely different paradigm is based on three-terminal stochastic units which could be called "p-bits", where the output is a random telegraphic signal continuously fluctuating between 0 and 1 with a tunable mean. p-bits can be interconnected to receive weighted contributions from others in a network, and these weighted contributions can be chosen to not only solve problems of optimization and inference but also to implement precise Boolean functions in an inverted mode. This inverted operation of Boolean gates is particularly striking: They provide inputs consistent to a given output along with unique outputs to a given set of inputs. The existing demonstrations of accurate invertible logic are intriguing, but will these striking properties observed in computer simulations carry over to hardware implementations? This paper uses individual micro controllers to emulate p-bits, and we present results for a 4-bit ripple carry adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted mode as a factorizer. Our results constitute a first step towards implementing p-bits with nano devices, like stochastic Magnetic Tunnel Junctions.
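The "tunable mean" of a p-bit can be illustrated with a few lines of simulation. This sketch assumes a commonly used behavioral model in which the output is 1 with probability (1 + tanh(I)) / 2 for an input I. The abstract does not specify the response function, so the tanh form, the function names, and the sample counts are assumptions.

```python
import math
import random


def pbit_sample(input_value, rng):
    """Return 1 with probability (1 + tanh(input_value)) / 2, otherwise 0."""
    p_one = 0.5 * (1.0 + math.tanh(input_value))
    return 1 if rng.random() < p_one else 0


def pbit_mean(input_value, samples=10000, seed=0):
    """Average many samples of one p-bit held at a fixed input."""
    rng = random.Random(seed)
    return sum(pbit_sample(input_value, rng) for _ in range(samples)) / samples


if __name__ == "__main__":
    for bias in (-3.0, 0.0, 3.0):
        print(bias, pbit_mean(bias))
```

At zero input the output fluctuates with a mean near one half, and a strongly positive or negative input pins the mean close to 1 or 0. In a network, the input of each p-bit is the weighted sum of the other p-bits' outputs.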
Article
Taking the pulse of optimization. Finding the optimum solution of multiparameter or multifunctional problems is important across many disciplines, but it can be computationally intensive. Many such problems defined as computationally difficult can be mathematically mapped onto the so-called Ising problem, which looks at finding the minimum energy configuration for an array of coupled spins. Inagaki et al. and McMahon et al. show that an optical processing approach based on a network of coupled optical pulses in a ring fiber can be used to model and optimize large-scale Ising systems. Such a scalable architecture could help to optimize solutions to a wide range of complex problems. Science, this issue pp. 603 and 614
Article
We provide Ising formulations for many NP-complete and NP-hard problems, including all of Karp's 21 NP-complete problems. This collects and extends classic results relating partitioning problems to Ising spin glasses, as well as work describing exact covering algorithms and satisfiability. In each case, the state space is at most polynomial in the size of the problem, as is the number of terms in the Hamiltonian. This work may be useful in designing adiabatic quantum optimization algorithms.
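As a small illustration of what such a formulation looks like, number partitioning is often written as the Ising energy H = (Σ_i n_i s_i)², with s_i = ±1 choosing the subset for each number n_i. H is zero exactly when the two subsets have equal sums. The brute-force sketch below is only a check of that mapping on tiny inputs, and the function names are hypothetical.

```python
from itertools import product


def partition_energy(numbers, spins):
    """Ising energy H = (sum_i n_i * s_i)^2 for a spin assignment s_i = +/-1."""
    return sum(n * s for n, s in zip(numbers, spins)) ** 2


def best_partition(numbers):
    """Exhaustively find a minimum-energy spin assignment (small inputs only)."""
    best = min(
        product((-1, 1), repeat=len(numbers)),
        key=lambda spins: partition_energy(numbers, spins),
    )
    return partition_energy(numbers, best), best


if __name__ == "__main__":
    energy, spins = best_partition([3, 1, 1, 2, 2, 1])
    left = [n for n, s in zip([3, 1, 1, 2, 2, 1], spins) if s == 1]
    right = [n for n, s in zip([3, 1, 1, 2, 2, 1], spins) if s == -1]
    print(energy, left, right)
```

Expanding the square gives pairwise couplings proportional to n_i·n_j, which is why the number of Hamiltonian terms stays polynomial in the problem size. An Ising machine replaces the exhaustive search with annealing or sampling.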
Article
Recently, hardware accelerators based on the Ising model have gained ever-increasing interest by demonstrating their capabilities of solving complex decision and optimization problems that are intractable using classical computers (CPUs/graphics processing units (GPUs)). The problems are translated into combinatorial optimization problems (COPs) and mapped to the Ising machine, comprised of artificial spins interacting and naturally finding their optimal states. Recent discrete-time Ising machines operating at room temperatures have demonstrated solving small-scale COPs while consuming orders of magnitude lower energy than prior quantum annealers; however, they have several limitations due to their discrete-time operations, bulky spins, and lack of compact random number generators. In this work, we propose a novel Ising machine with compact latch-based spin circuits operating in continuous time. The proposed continuous-time Ising machine finds solutions to COPs with fully parallel spin operations (couplings between latches), significantly reducing computing latency and energy consumption. Besides, the latch-based spins randomize or superpose their initial spin states to find better solutions with the lower Ising Hamiltonian (i.e., a key performance indicator (KPI) of the Ising machine). A 0.656 × 0.680 mm² test chip with a 40 × 36 latch-based spin array is fabricated using a 65 nm CMOS process. The proposed continuous-time latch-based spin with equalization (CTLE)-Ising achieves 1000× speedup compared to the discrete-time Ising machine operating at 1 GHz when solving max-cut COPs while consuming 0.2–3 nJ using 0.75–1.05 V core supply voltage.
Article
A probabilistic bit (p-bit) is the fundamental building block in the circuit of probabilistic computing (PC), and it produces a random binary bitstream with tunable probability. Utilizing the randomness induced by thermal noise-induced lattice vibration in the ferroelectric (FE) material, we propose the p-bits based on stochastic ferroelectric FET (FeFET). The domain dynamic is revealed to play crucial roles in FE p-bits’ stochasticity, as the domain coupling suppresses the dipole fluctuation. The proposed FE p-bits possess the advantages of both extremely low hardware cost and scalability for p-bit circuitry, rendering it a promising candidate for PC.
Article
This brief presents a novel annealing processor (AP) design with 1024 fully-connected spins based on a modified Ising model annealing algorithm for combinatorial optimization problems. By adopting the proposed Turbo code-based interleaved random sequence generator (TCSG) and multi-spin update method, memory usage is considerably reduced and multi-spin parallel update is supported. The prototype is implemented on an FPGA with an operating frequency of 100 MHz. We tested our design on various G-set problems, achieving an average cut accuracy of 99.19%. The proposed design outperforms the CPU-based method, achieving a maximum speedup of 1099×.
Article
Combinatorial optimization problems (COPs) find applications in real-world scientific, industrial, and societal scenarios. Such COPs are computationally NP-hard, and performing an exhaustive brute force search for the optimal solution becomes untenable as the COP size increases. To expedite the COP computation, the Ising model formalism is used, which abstracts spin dynamics in a ferromagnet. The spins are orientated to reach the minimum energy state, representing the optimum COP solution. Previous Ising engine designs utilized dedicated annealing processors or additional digital arithmetic circuits next to the memory bitcells. These custom circuits or processors cannot be repurposed for other applications, incurring significant area and power overhead. In contrast to the prior approaches, this work presents a reconfigurable and scalable compute-within-memory analog approach for Ising computation (called Ising-CIM). This area-efficient approach repurposes existing embedded memory bitcell columns and peripheral circuits to perform analog domain Hamiltonian calculations on the bitlines, minimizing area and power overhead significantly. A 13.18-Kb silicon prototype, implemented in a 65-nm CMOS process, demonstrates the Ising-CIM concept and functionality using a 100 × 64 pixel image in a max-cut COP. The Ising-CIM design achieves 48-μm²/spin unit spin area and 1091× speedup in annealing time compared to the CPU.
Article
Ising machines are hardware solvers that aim to find the absolute or approximate ground states of the Ising model. The Ising model is of fundamental computational interest because any problem in the complexity class NP can be formulated as an Ising problem with only polynomial overhead, and thus a scalable Ising machine that outperforms existing standard digital computers could have a huge impact for practical applications. We survey the status of various approaches to constructing Ising machines and explain their underlying operational principles. The types of Ising machines considered here include classical thermal annealers based on technologies such as spintronics, optics, memristors and digital hardware accelerators; dynamical systems solvers implemented with optics and electronics; and superconducting-circuit quantum annealers. We compare and contrast their performance using standard metrics such as the ground-state success probability and time-to-solution, give their scaling relations with problem size, and discuss their strengths and weaknesses.
Minimizing the energy of the Ising model is a prototypical combinatorial optimization problem, ubiquitous in our increasingly automated world. This Review surveys Ising machines (special-purpose hardware solvers for this problem), examines their various operating principles, and compares their performance. Dedicated hardware solvers for the Ising model are of great interest, owing to their many potential practical applications and the end of Moore’s law, which motivate alternative computational approaches. Three main computing methods that Ising machines use are classical annealing, quantum annealing and dynamical system evolution. A single machine can operate on the basis of multiple computing approaches. Today, Ising hardware based on classical digital technologies is the best performing for common benchmark problems. However, the performance is problem-dependent, and alternative methods can perform well for particular classes of problems. For particular crafted problem instances, quantum approaches have been observed to have superior performance over classical algorithms, motivating quantum hardware approaches and quantum-inspired classical algorithms. Hybrid quantum–classical and digital–analogue algorithms are promising for future development; they may harness the complementary advantages of both.
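The two metrics named above are linked by a widely used definition: if a single run takes time τ and reaches the ground state with probability p, the time-to-solution at 99% confidence is TTS = τ · ln(1 − 0.99) / ln(1 − p). The sketch below assumes this definition and the common convention that TTS equals τ once p already meets the target. The function name and arguments are hypothetical.

```python
import math


def time_to_solution(run_time, success_prob, target=0.99):
    """Expected time to see the ground state at least once with probability `target`."""
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    if success_prob >= target:
        # A single run already meets the target confidence (assumed convention).
        return run_time
    repeats = math.log(1.0 - target) / math.log(1.0 - success_prob)
    return run_time * repeats


if __name__ == "__main__":
    # Runs of 2 ms that succeed 90% of the time need two repeats on average
    # to reach 99% confidence, since 1 - 0.1**2 = 0.99.
    print(time_to_solution(2e-3, 0.9))
```

The metric rewards both fast runs and high per-run success, so a slower machine with a much higher success probability can still report a lower TTS.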
Article
Lattice-based cryptography (LBC) has emerged as the most viable substitute for classical cryptographic schemes, as 5 out of 7 finalist schemes in the 3rd round of the NIST post-quantum cryptography (PQC) standardization process are lattice-based in construction. This work explores novel architectural optimizations in the FPGA-based hardware implementation of polynomial multiplication, which is a bottleneck in every LBC construction. To target ultra-high throughput, both schoolbook polynomial multiplication (SPM) and number theoretic transform (NTT) are explored: a completely parallel architecture of an SPM is undertaken, while for NTT, radix-2 and radix-2² multi-path delay commutator (MDC) based pipelined architectures are adopted. Our proposed high-speed SPM (HSPM) structure on the latest Xilinx UltraScale+ FPGA is 5× faster than the state-of-the-art LBC designs, whereas the proposed high-speed NTT (HNTT) structure (i.e., R2²MDC) takes only 0.63 μs for the encryption, hence achieving the highest throughput of 408 Mbps. Moreover, all of the proposed designs achieve the highest design efficiencies (i.e., throughput per slice (TPS)) in comparison to available LBC designs.
Article
In VLSI physical design, many algorithms require the solution of difficult combinatorial optimization problems such as max/min-cut and max-flow problems. Due to the vast number of elements typically found in this problem domain, these problems are computationally intractable, leading to the use of approximate solutions. In this work, we explore the Ising spin glass model as a solution methodology for hard combinatorial optimization problems using the general purpose GPU (GPGPU). The Ising model is a mathematical model of ferromagnetism in statistical mechanics. Ising computing finds a minimum energy state for the Ising model, which essentially corresponds to the expected optimal solution of the original problem. Many combinatorial optimization problems can be mapped into the Ising model. In our work, we focus on the max-cut problem as it is relevant to many VLSI design automation problems. Our method is inspired by the observation that the Ising annealing process is very amenable to fine-grain massive parallel GPU computing. We will illustrate how the natural randomness of GPU thread scheduling can be exploited during the annealing process to create random update patterns and allow better GPU resource utilization. Furthermore, the proposed GPU-based Ising computing can handle any general Ising graph with arbitrary connections, which was shown to be difficult for existing FPGA and other hardware-based implementation methods. Numerical results show that the proposed GPU Ising max-cut solver can deliver a speedup of more than 2000× over the CPU version of the algorithm on some large examples, a substantial performance improvement for the hard optimization problems that arise in practical VLSI design automation.
Article
We introduce the concept of a probabilistic or p-bit, intermediate between the standard bits of digital electronics and the emerging q-bits of quantum computing. We show that low barrier magnets or LBMs provide a natural physical representation for p-bits and can be built either from perpendicular magnets designed to be close to the in-plane transition or from circular in-plane magnets. Magnetic tunnel junctions (MTJs) built using LBMs as free layers can be combined with standard NMOS transistors to provide three-terminal building blocks for large scale probabilistic circuits that can be designed to perform useful functions. Interestingly, this three-terminal unit looks just like the 1T/MTJ device used in embedded magnetic random access memory technology, with only one difference: the use of an LBM for the MTJ free layer. We hope that the concept of p-bits and p-circuits will help open up new application spaces for this emerging technology. However, a p-bit need not involve an MTJ; any fluctuating resistor could be combined with a transistor to implement it, while completely digital implementations using conventional CMOS technology are also possible. The p-bit also provides a conceptual bridge between two active but disjoint fields of research, namely, stochastic machine learning and quantum computing. First, there are the applications that are based on the similarity of a p-bit to the binary stochastic neuron (BSN), a well-known concept in machine learning. Three-terminal p-bits could provide an efficient hardware accelerator for the BSN. Second, there are the applications that are based on the p-bit being like a poor man's q-bit. Initial demonstrations based on full SPICE simulations show that several optimization problems, including quantum annealing, are amenable to p-bit implementations which can be scaled up at room temperature using existing technology.
Article
Invertible logic can operate in one of two modes: 1) a forward mode, in which inputs are presented and a single, correct output is produced, and 2) a reverse mode, in which the output is fixed and the inputs take on values consistent with the output. It is possible to create invertible logic using various Boltzmann machine configurations. Such systems have been shown to solve certain challenging problems quickly, such as factorization and combinatorial optimization. In this paper, we show that invertible logic can be implemented using simple spiking neural networks based on stochastic computing. We present a design methodology for invertible stochastic gates, which can be implemented using a small amount of complementary metal-oxide-semiconductor hardware. We demonstrate that our design can not only correctly implement the basic gates with invertible capability but can also be extended to construct invertible stochastic adder and multiplier circuits. Experimental results are presented that demonstrate the correct operation of synthesizable invertible circuitry performing both multiplication and factorization, along with fabricated application-specific integrated circuit measurement results for an invertible multiplier circuit.
Article
A digital computer is generally believed to be an efficient universal computing device; that is, it is believed able to simulate any physical computing device with an increase in computation time by at most a polynomial factor. This may not be true when quantum mechanics is taken into consideration. This paper considers factoring integers and finding discrete logarithms, two problems which are generally thought to be hard on a classical computer and which have been used as the basis of several proposed cryptosystems. Efficient randomized algorithms are given for these two problems on a hypothetical quantum computer. These algorithms take a number of steps polynomial in the input size, e.g., the number of digits of the integer to be factored.
UltraScale Architecture: Staying a Generation Ahead with an Extra Node of Value
  • Xilinx Inc
Xilinx Inc., "UltraScale Architecture: Staying a Generation Ahead with an Extra Node of Value," https://www.xilinx.com/products/technology/ultrascale.html.
Benchmarking the MAX-CUT Problem on the Simulated Bifurcation Machine
  • Y Matsuda
Y. Matsuda, "Benchmarking the MAX-CUT Problem on the Simulated Bifurcation Machine," Medium, 2019, https://medium.com/toshiba-sbm/benchmarking-the-max-cut-problem-on-the-simulated-bifurcation-machine-e26e1127c0b0.
Table of Linear Feedback Shift Registers
  • R Ward
  • T Molteno
R. Ward and T. Molteno, "Table of Linear Feedback Shift Registers," Department of Physics, University of Otago, Tech. Rep., Oct. 2007.