
Zuocheng Xing
- PhD
- National University of Defense Technology
83 publications
8,142 reads
619 citations
Publications (83)
Automatic modulation classification (AMC) plays a fundamental role in common communication systems. Existing clustering models typically handle fewer modulation types with lower classification accuracies and more computational resources. This paper proposes a hierarchical self-organizing map (SOM) based on a feature space composed of high-order cum...
The sigmoid activation function is popular in neural networks, but its complexity limits the hardware implementation and speed. In this paper, we use curvature values to divide the sigmoid function into different segments and employ the least squares method to solve the expressions of the piecewise linear fitting function in each segment. We then a...
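As a rough illustration of the piecewise linear fitting described in the abstract above, the Python sketch below fits one least-squares line per segment of the sigmoid. It is not the paper's method: the breakpoints here are fixed by hand rather than derived from curvature values, and all names (`fit_line`, `build_segments`, `pwl_sigmoid`, `BREAKPOINTS`) are hypothetical.

```python
import math


def sigmoid(x):
    """Reference sigmoid used to generate the fitting samples."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def build_segments(breakpoints, samples_per_segment=64):
    """Fit one line per [lo, hi] interval between consecutive breakpoints."""
    segments = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        step = (hi - lo) / (samples_per_segment - 1)
        xs = [lo + i * step for i in range(samples_per_segment)]
        ys = [sigmoid(x) for x in xs]
        a, b = fit_line(xs, ys)
        segments.append((lo, hi, a, b))
    return segments


def pwl_sigmoid(x, segments):
    """Evaluate the piecewise linear approximation at x."""
    if x < 0:
        # Symmetry sigmoid(-x) = 1 - sigmoid(x) halves the table size.
        return 1.0 - pwl_sigmoid(-x, segments)
    for lo, hi, a, b in segments:
        if lo <= x <= hi:
            return a * x + b
    return 1.0  # saturate beyond the last breakpoint


# Assumption: hand-picked breakpoints, denser where the sigmoid bends most.
BREAKPOINTS = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0]
SEGMENTS = build_segments(BREAKPOINTS)
```

Calling `pwl_sigmoid(x, SEGMENTS)` then costs one comparison chain plus a multiply-add, which is the kind of operation that maps cheaply onto hardware; choosing the breakpoints from curvature, as the abstract describes, would place segments more economically than this hand-picked table.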
Sorted QR decomposition (SQRD) has been extensively adopted for various multiple‐input‐multiple‐output (MIMO) detectors, in which the sorting process incurs severe latency when it comes to larger‐scale MIMO situations. This paper proposes a group‐SQRD (GSQRD) algorithm to alleviate the latency problem of general SQRD architectures for larg...
This paper presents a novel parallel quasi-cyclic low-density parity-check (QC-LDPC) encoding algorithm with low complexity, which is compatible with the 5th generation (5G) new radio (NR). Based on this algorithm, we propose a highly area-efficient parallel encoder with a compatible architecture. The proposed encoder has the advantages of parallel enc...
Represented by application-specific instruction set processors (ASIPs) and array processors, existing cryptographic processors face challenges in application to mobile terminals with sensitive security requirements. Typically, ASIPs have limited computational efficiency and algorithmic adaptability. An array processor requires massive circuits to p...
Data detectors for future wireless systems need to achieve high throughput and a low bit error rate (BER) with low computational complexity. In this paper, we propose a deep neural network (DNN) learning-aided iterative detection algorithm. We first propose a convex optimization-based method for calculating the efficient detection of iterative s...
Automatic modulation classification (AMC) has recently attracted widespread attention due to its desirable features of generalisability and requirement of little prior knowledge through artificial intelligence (AI) technology. The authors propose a stacked auto‐encoder (SAE) structure based on various optimisation methods to intelligently...
As the first kind of capacity-achieving forward error correction (FEC) codes, polar codes have attracted much research interest recently. Compared with traditional FEC codes, polar codes show better error correction performance when successive cancellation list (SCL) decoding with cyclic redundancy check is adopted. However, their serial decoding na...
This paper proposes a highly efficient preprocessing algorithm for 16×16 MIMO detection. The proposed algorithm combines a sorting-relaxed QR decomposition (SRQRD) and a modified greedy LLL (MGLLL) algorithm. First, SRQRD is conducted to decompose the channel matrices. This decomposition adopts a relaxed sorting strategy together with a paralleled G...
By adopting successive cancellation list decoding (SCL), polar codes demonstrate competitive error correction performance over LDPC and Turbo codes. However, SCL decoding suffers from high computational complexity and long decoding latency, especially when the list size is very large. Successive cancellation flip (SCF), as another decoding algorith...
Mobile cyber-physical systems (CPS) have been a popular research field in recent years. They aim to control and monitor mobile devices in complex, real-time scenes, and to provide people with convenience and economy by using intelligent applications. The scenes in mobile CPS are closely related to everyone's life and have a pervasive effect on peo...
Polar codes have drawn much research attention in the past ten years for their capacity-achieving property. However, their conventional successive cancellation decoding method does not perform well at short or moderate lengths. In order to improve the performance, concatenation with other error-correction codes has proved to be an effective approach, w...
Polar codes have attracted growing attention from researchers in recent years owing to their capacity-achieving property. However, their error-correction performance under successive cancellation (SC) decoding is inferior to that of other modern channel codes at short or moderate blocklengths. The SC-Flip (SCF) decoding algorithm shows higher performance than SC deco...
Polar codes are widely considered one of the most promising channel codes for future wireless communication. However, at short or moderate block lengths, their error-correction performance under traditional successive cancellation (SC) decoding is inferior to that of other modern channel codes, while under list decoding they outperform them at the cost of high c...
Massive multiple-input multiple-output provides improved energy efficiency and spectral efficiency in 5G. However, it requires large-scale matrix computation with tremendous complexity, especially for data detection and precoding. Recently, many detection and precoding methods were proposed using approximate iteration methods, which meet the demand...
As we approach the exascale era in supercomputing, designing a balanced computer system with a powerful computing ability and low power requirements has become increasingly important. The graphics processing unit (GPU) is an accelerator used widely in most recent supercomputers. It adopts a large number of threads to hide a long latency with a...
QR decomposition (QRD) is one of the performance bottlenecks of the transceiver processor in multiuser multiple-input-multiple-output (MU-MIMO) systems. This paper proposes a QRD algorithm, named modified ILMGS (MILMGS), based on the existing modified Gram-Schmidt (MGS) algorithm and the iteration look-ahead MGS (ILMGS) algorithm. A c...
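For orientation, the modified Gram-Schmidt (MGS) baseline named in the abstract above can be sketched in a few lines of Python. This is only the textbook MGS procedure, restricted to real-valued matrices with full column rank for brevity; the paper's ILMGS and MILMGS variants are not reproduced here, and the function name `mgs_qr` is hypothetical.

```python
import math


def mgs_qr(a):
    """Textbook modified Gram-Schmidt QR of an m x n real matrix.

    `a` is a list of rows with full column rank (an assumption).
    Returns (q, r) with q an m x n list of rows whose columns are
    orthonormal, and r an n x n upper-triangular list of rows.
    """
    m, n = len(a), len(a[0])
    # Work on a copy of the columns; they are updated in place.
    v = [[a[i][j] for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for k in range(n):
        # Normalise the current column to obtain q_k.
        r[k][k] = math.sqrt(sum(x * x for x in v[k]))
        q_k = [x / r[k][k] for x in v[k]]
        q_cols.append(q_k)
        # Immediately remove the q_k component from all later columns;
        # this eager update is what distinguishes MGS from classical GS.
        for j in range(k + 1, n):
            r[k][j] = sum(q_k[i] * v[j][i] for i in range(m))
            v[j] = [v[j][i] - r[k][j] * q_k[i] for i in range(m)]
    q = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q, r
```

The inner loop over `j` depends on `q_k` from the current step, which is the serial dependency that look-ahead style variants aim to shorten in hardware.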
Graphics processing units (GPUs) are playing more important roles in parallel computing. Using their multi-threaded execution model, GPUs can accelerate many parallel programmes and save energy. In contrast to their strong computing power, GPUs have limited on-chip memory space, which easily becomes inadequate. The throughput-oriented execution model i...
The fast-evolving standards of wireless communication systems drive the demand for flexible baseband processing platforms. However, with the proliferation of MIMO technologies, traditional single-core-based solutions are hardly able to fulfill requirements with acceptable power and area cost. The reliance on multi-/many-core systems is increasin...
To improve energy efficiency and spectral efficiency, massive multiple-input-multiple-output (MIMO) has been proposed and has become a promising technology for next-generation mobile communication. However, massive MIMO systems are equipped with scores or hundreds of antennas, which induce large-scale matrix computations with tremendous complexity, especially...
EXT and EXTU are function instructions of the TMS320C62xx DSP chip. This paper describes a hierarchical circuit design method in a 130 nm logic technology, used to complete the full-custom circuit design of the EXT and EXTU instructions. The work includes functional verification at the Verilog level using NC-Verilog after extractin...
As the need for high performance computing continues to grow, it becomes more and more urgent to design a massive multi-core processor with high throughput and efficiency. However, when the number of cores keeps increasing, the capacity of on-chip memory is always insufficient. In a multi-core processor such as GPGPU (General Purpose Graphic Proces...
QR decomposition (QRD) has been a vital component in the transceiver processor of future multiple-input multiple-output (MIMO) systems, in which antenna configuration will be more and more flexible. Therefore, the QRD hardware architecture in the future MIMO systems should be more flexible to meet various antenna configurations. Unfortunately, the...
As we are approaching the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. The GPU is a widely used accelerator in most recently applied supercomputers. It adopts massive multithreading to hide long latency and has high energy efficiency. In c...
An efficient parallel algorithm for the Caputo fractional reaction-diffusion equation with an implicit finite-difference method is proposed in this paper. The parallel algorithm consists of a parallel solver for linear tridiagonal equations and parallel vector arithmetic operations. For the parallel solver, in order to solve the linear tridiagonal equatio...
Synthetic aperture radar (SAR) is a modern high-resolution microwave imaging radar that operates in all weather, day and night, to provide remote sensing and generate high-resolution images of the land illuminated by the radar beam. Unlike optical sensors, SAR requires post-processing of the acquired data to for...
As an attractive interference cancellation (IC) technique, Tomlinson-Harashima precoding (THP) has been investigated thoroughly in theory. Several high performance THP variants have been proposed, e.g., sorted QR decomposition (SQRD), Cholesky decomposition, vertical Bell Laboratories space time (V-BLAST) and lattice reduction aided THPs. From a pr...
The QR decomposition (QRD) has been extensively adopted in the transceiver processor of multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. The antenna configuration of future MIMO-OFDM systems is very flexible. Therefore, the QRD architecture should also have the flexibility to decompose various dim...
QR decomposition is extensively adopted in multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) wireless communication systems, and is one of the performance bottlenecks in many high-performance wireless communication algorithms. To implement QR decomposition with low processing latency in hardware, we propose a nov...
Currently, massive multiple-input multiple-output (MIMO) is one of the most promising wireless transmission technologies for 5G. Massive MIMO requires handling large-scale matrix computation, especially matrix inversion. In this letter, we find that matrix inversion based on Newton iteration (NI) is suitable for data detection in massive M...
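The Newton iteration for matrix inversion mentioned in the abstract above is commonly written as X_{k+1} = X_k (2I - A X_k). The Python sketch below shows that generic recurrence only. The starting point (the inverse of the diagonal of A) is an assumption that suits diagonally dominant matrices; the truncated abstract does not show the paper's own initialisation or detector, and the names `matmul` and `newton_inverse` are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product of two lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def newton_inverse(a, iterations=6):
    """Approximate the inverse of a square matrix by Newton iteration.

    Assumption: `a` is diagonally dominant, so that the diagonal
    inverse used as the initial guess makes the iteration converge.
    """
    n = len(a)
    # Initial guess X_0 = D^{-1}, the inverse of the diagonal part of A.
    x = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        ax = matmul(a, x)
        # correction = 2I - A X_k
        correction = [
            [(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
            for i in range(n)
        ]
        # X_{k+1} = X_k (2I - A X_k); the residual I - A X_k squares each step.
        x = matmul(x, correction)
    return x
```

Because the residual is squared at every step, a handful of iterations is enough when the initial residual is well below one, and each step uses only matrix multiplications, with no division beyond the initial diagonal reciprocal.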
This paper addresses the problem of acquiring the sampling frequency offset (SFO) and carrier frequency offset (CFO), which severely degrade the performance of orthogonal frequency division multiplexing (OFDM) systems. Using two identical frequency domain (FD) long training symbols in the preamble, we propose a novel maximum-likelihood (ML) estimation m...
5G is currently a research hotspot in the communications field, and one of the most promising wireless transmission technologies for 5G is massive multiple-input multiple-output (MIMO), which provides high data rates and energy efficiency. The main challenge of massive MIMO is channel estimation, due to its complexity and pilot contamination. Some improv...
QR decomposition (QRD) is one of the performance bottlenecks in many high-performance wireless communication algorithms, and should be flexible enough for future MIMO systems. However, the existing QRD architectures focus only on a few fixed matrix dimensions. The parallel tiled QRD algorithm is a perfect choice to implement QRD for i...
Monte Carlo (MC) simulation plays an important part in dose calculation for radiotherapy treatment planning. Since the accuracy of MC simulation relies on the number of simulated particle histories, it is very time-consuming. The Intel Many Integrated Core (MIC) architecture, which consists of more than 50 cores and supports many parallel programmi...
Network-on-chip (NoC) is one of the critical communication architectures for the scaling of future many-core processors. The challenge for on-chip networks is reducing design complexity to save both area and power while providing high performance, such as low latency and high throughput. Especially, with increasing network size, both design complexity a...
The key to large-scale parallel solutions of deterministic particle transport problems is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly with the scale-down of feature size in semiconductor technology. In t...
Many-core systems are currently the main architectural trend. One of the dominating challenges for on-chip many-core systems is the memory wall. However, traditional research primarily focuses on the limited bandwidth. To solve this problem, many-core systems are equipped with large caches, and a lot of complex approaches about memory and cache are adopted aiming at...
Network-on-Chip (NoC) is one of the critical communication architectures for future many-core systems. As technology continues to scale down, on-chip networks face an increasing leakage power crisis. As a leakage power mitigation technique, power-gating can be utilized in on-chip networks to solve the crisis. However, the network performance is sev...
The coupling of microwaves into apertures plays an important part in many electromagnetic physics and engineering fields. When the width of the apertures is very small, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. As a many-core architecture, Intel's Many Integrated Core (MIC) architecture has 512-bit vec...
Power consumption, design complexity and area cost are limiting constraints in the design of interconnects for scalable many-core systems. To tackle the power and area concerns, we propose a light-weight unidirectional channel network-on-chip in 2D mesh topology (UniMESH), which simplifies router architectures, uses only half the number of channel links...
In this paper, a new implementation of a 3GPP LTE standards-compliant turbo decoder based on GPGPU is proposed. It uses the newest GPU, the Tesla K20c, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability and we use it to fully exploit the parallelism in the turbo decoding algorithm in nove...
With the rapid development of integrated circuits [1], low power consumption has become a constant goal of chip designers. As memory takes up most of the chip area, reducing memory power consumption will significantly reduce the overall power consumption of the chip; according to ISSCC’s 2014 report about technolog...
Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double-precision floating-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulati...
Power-gating is a representative circuit-level technique to mitigate leakage power. However, in low-power Network-on-Chip (NoC) design, previous fine-grained power-gating methods decrease network performance due to serial wake-up latency and head-of-line blocking. Therefore, we propose a flexible Virtual Channel (VC) management scheme for fine-...
A novel reconfigurable hybrid single electron transistor/MOSFET (SETMOS) circuit architecture, namely, reconfigurable pseudo-NMOS-like logic is proposed. Based on the hybrid SETMOS inverter/buffer circuit cell, reconfigurable pseudo-NMOS-like logics that can work normally at room temperature are constructed. This kind of reconfigurable logic can im...
This paper proposes a novel analytical model for the semiconductor single-electron transistor (SET) with a concrete-size Coulomb island at room temperature. The cases of an odd and an even number of electrons on the SET island are analyzed separately, and a unified calculation model is then obtained for the first time. Based on the model, the I-V cha...
This paper studies the capacitance-controlled tunable negative differential resistance (NDR) of the single-electron transistor (SET), a characteristic noted accidentally in our experiment. The tunable NDR of the SET controlled by the source, drain and gate capacitances individually is simulated, and the same is then done by contr...
A full adder based on hybrid single-electron transistors (SET) and MOSFETs (SETMOS) at room temperature is proposed in this paper. Because the SET can play the same role as compensatory MOSFETs, we design a full adder with hybrid SETMOS. Furthermore, we simulate the logic element with HSPICE and the simulation result shows that the logic element im...
The paper proposes a backhaul-route pre-configuration mechanism (BRPCM) for the round-trip communication pattern, which is suited for the backhaul packets traversal. With previous communication patterns, BRPCM pre-configures a converse crossbar connection creating backhaul-route within a single router during the previous flits traversal. Combining...
As powerful error correcting codes, Low-Density Parity-Check (LDPC) codes have been adopted as a fundamental building block by dirty paper coding (DPC), which indicates that lossless precoding is theoretically possible at any signal-to-noise ratio (SNR), and is a promising strategy in future communication systems. However, to achieve this performan...
Continuing decrease in the feature size of integrated circuits leads to increases in susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request, and fo...
With the development of semiconductor technology and the growing complexity of chips, the cost of redesigning and remanufacturing to correct design mistakes is, in some circumstances, almost as large as that of the initial design. The cache is an essential part of the processor; in order to ensure the correctness of the design and reduce the cost of debugging cache, t...
This paper proposes a Reduced Explicitly Parallel Instruction Computing Processor (REPICP) which is an independently designed, 64-bit, general-purpose microprocessor. The REPICP based on EPIC architecture overcomes the disadvantages of hardware-based superscalar and software-based Very Long Instruction Word (VLIW) and utilizes the cooperation of co...
In the migratory pattern, the accessing processor initiates two separate requests to obtain first read and then write permission in an invalidation-based protocol. This paper proposes an adaptive protocol that uses the token number and the writer or reader of data to recognize the migratory pattern. While the data is in migratory pat...
We proposed a straight-forwarding route pre-configuration (SFRP) router architecture for the communication spatial locality when packets traverse under dimension-ordered routing mode, which was adapted to the latency optimization for the packets straight forwarding traversal. In our SFRP router, a corresponding straight-forwarding route was preconf...
Despite many years of effort, the precise mechanism of negative differential resistance (NDR) of the single-electron transistor (SET) remains unclear, and this lack of knowledge has become a major obstacle in the research and development of new electronic devices that make use of this effect. This paper proposes a conductance model to validate and analysis...
With the development of information systems, electronic devices are becoming more and more susceptible to soft errors, especially in harsh environments with strong electromagnetic interference. Architectural Vulnerability Factor (AVF), which is defined as the fraction of soft errors that result in erroneous outputs, has been introduced to quanti...
Single-electron transistors (SETs) are considered attractive candidates for post-CMOS VLSI due to their ultra-small size and low power consumption. As the Coulomb island becomes smaller and smaller, the energy quantization of the single-electron transistor based on charge states emerges and becomes increasingly pronounced. A qu...
Cache coherence protocols play an important role in maintaining data coherence in shared-memory multiprocessors. The token protocol provides a flexible framework for designing new coherence protocols. It features two attributes: low-latency cache misses and no reliance on totally-ordered interconnects. However, messages in the token protocol are always...
With the process scaling down to 65 nm, leakage power begins to dominate power consumption. Power gating, as one of the most effective techniques to reduce leakage power, has become a research hotspot in the past ten years, and will be applicable in all future low-power designs. This paper gives a brief summary of the concept of power gating d...
With the increasing number of processor cores in chip multi-processors (CMPs), 2D Mesh has been gaining wide acceptance for inter-core on-chip communication. Program performance is more sensitive to the router latency than to the link bandwidth. This paper presents a low latency Dynamic Virtual Output Queues Router (DVOQR), which can reduce the rou...
In this paper, we introduce a scalable macro-pipelined architecture to perform floating point matrix multiplication, which aims to exploit temporal parallelism and architectural scalability. We demonstrate the functionality of the hardware design with 16 processing elements (PEs) on Xilinx ML507 development board containing Virtex-5 XC5VFX70T. A 32...
In general, a CPU may raise its frequency using a PLL (phase-locked loop), but testing high-frequency signals is very expensive. This paper introduces a method that inserts test logic in the CPU to implement PLL performance testing. It is very easy to implement, and effectively reduces test costs with low hardware ov...
Recently, GPUs have been widely utilized in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. Firstly, we analyze some key performance characteristics of GPUs in detail, and the relationships among GPU architecture, programming model and memory hierarchy. Secondly, we present three performance optimization st...
With the development of integrated circuit technology, testing and debugging have become more difficult. Usually, design for testability (DFT) and debugging structures are made separately in VLSI, which needs a great deal of additional hardware resources. This paper introduces a testing structure designed in a microprocessor based on JTAG, which is based on scan-...
The stream architecture is a novel microprocessor architecture with wide application potential. It is critical to study how to use the stream architecture to accelerate scientific computing programs. However, existing stream processors and stream programming languages are not designed for scientific computing. To address this issue, we design and i...
Stream architecture is a novel microprocessor architecture with wide application potential. But as for whether it can be used efficiently in scientific computing, many issues await further study. This paper first gives the design and implementation of a 64-bit stream processor, FT64 (Fei Teng 64), for scientific computing. The carrying out of 64-bi...
This paper presents an instruction Optimized Lock-Step execution Model (OLSM) and builds a memory hierarchy which can embody the essence of this model. In an OLSM EPIC microprocessor, instructions can be executed out-of-order, so the shortcoming of traditional VLIW lock-step execution is resolved. OLSM can make use of the abundant computation and memory res...
Leakage power will exceed dynamic power in microprocessors as feature size shrinks, especially for on-chip caches. Besides developing low-leakage processes and circuits, how to control leakage power at the architectural level is worth studying. In this paper, a PDSR (Periodically Drowsy Speculatively Recover) algorithm and its extended version wit...
The leakage power issue is challenging high-performance microprocessor design, especially as feature size shrinks. Not only are low leakage technologies and circuits well researched, but architectural control methods are also studied. Caches represent a sizable fraction of the total power consumption, so they need to be managed first. LRU is the...
In this paper, a Hardware Virtual Interface Architecture (HVIA) is proposed, a System Area Network based on HVIA (HVIA-Net) is introduced from the view of hardware design, and a real test in a 33 MHz, 64-bit PCI environment is presented. The performance of HVIA-Net is compared with current high performance networks. In the end, a framework of new ver...
Virtual memory is a staple in modern processor systems. In a virtual addressing scheme, the translation from virtual address to physical address is one of the most frequent core services in the pipeline, and tends to be on the critical path determining the clock cycle of the processor. In order to speed up the address translation, the most mod...