Zuocheng Xing

  • PhD
  • National University of Defense Technology

About

83 Publications
8,142 Reads
619 Citations

Publications (83)
Article
Automatic modulation classification (AMC) plays a fundamental role in common communication systems. Existing clustering models typically handle fewer modulation types, achieve lower classification accuracy, and consume more computational resources. This paper proposes a hierarchical self-organizing map (SOM) based on a feature space composed of high-order cum...
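A minimal sketch of the kind of high-order cumulant features such a classifier can work with, assuming zero-mean, unit-power complex baseband samples and the standard estimators C40 = E[x^4] - 3(E[x^2])^2 and C42 = E[|x|^4] - |E[x^2]|^2 - 2(E[|x|^2])^2. The function name `cumulant_features` is hypothetical, and the hierarchical SOM itself is not reproduced here.

```python
def cumulant_features(samples):
    """Estimate second- and fourth-order cumulants of a zero-mean complex sequence."""
    n = len(samples)
    m20 = sum(x * x for x in samples) / n        # E[x^2]
    m21 = sum(abs(x) ** 2 for x in samples) / n  # E[|x|^2], the signal power
    m40 = sum(x ** 4 for x in samples) / n       # E[x^4]
    m42 = sum(abs(x) ** 4 for x in samples) / n  # E[|x|^4]
    return {
        "C20": m20,
        "C21": m21,
        "C40": m40 - 3 * m20 ** 2,
        "C42": m42 - abs(m20) ** 2 - 2 * m21 ** 2,
    }


# Ideal, unit-power BPSK and QPSK constellations (equiprobable symbols, no noise).
bpsk = [complex(1, 0), complex(-1, 0)]
qpsk = [complex(a, b) / 2 ** 0.5 for a in (1, -1) for b in (1, -1)]

for name, points in (("BPSK", bpsk), ("QPSK", qpsk)):
    feats = cumulant_features(points)
    print(name, round(feats["C40"].real, 3), round(feats["C42"], 3))
```

For these ideal constellations C40 and C42 are both -2 for BPSK and -1 for QPSK, which is why such cumulants are useful for separating modulation types; with real, noisy data the estimates only approach these values as the number of samples grows.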
Article
The sigmoid activation function is popular in neural networks, but its complexity limits the hardware implementation and speed. In this paper, we use curvature values to divide the sigmoid function into different segments and employ the least squares method to solve the expressions of the piecewise linear fitting function in each segment. We then a...
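A minimal sketch of piecewise linear sigmoid fitting with least squares. It assumes hand-picked breakpoints on x >= 0 (the paper instead derives its segments from curvature values, which is not reproduced here), one ordinary least-squares line per segment, the identity sigmoid(-x) = 1 - sigmoid(x) for negative inputs, and saturation at 1 beyond x = 8. `BREAKPOINTS`, `fit_line`, and `sigmoid_pwl` are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least-squares fit y = k*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    k = sxy / sxx
    return k, my - k * mx


# Hypothetical breakpoints on x >= 0 (not the curvature-based ones from the paper).
BREAKPOINTS = [0.0, 1.0, 2.0, 3.0, 4.5, 6.0, 8.0]


def build_segments(breaks, samples=200):
    segments = []
    for lo, hi in zip(breaks, breaks[1:]):
        xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
        k, b = fit_line(xs, [sigmoid(x) for x in xs])
        segments.append((hi, k, b))
    return segments


SEGMENTS = build_segments(BREAKPOINTS)


def sigmoid_pwl(x):
    """Piecewise linear sigmoid: fitted lines for x >= 0, symmetry for x < 0."""
    if x < 0:
        return 1.0 - sigmoid_pwl(-x)  # sigmoid(-x) = 1 - sigmoid(x)
    for hi, k, b in SEGMENTS:
        if x < hi:
            return k * x + b
    return 1.0  # saturate beyond the last breakpoint


grid = [i / 100 for i in range(-1000, 1001)]
print("max abs error:", max(abs(sigmoid_pwl(x) - sigmoid(x)) for x in grid))
```

In hardware each segment costs one multiply and one add, so the design trade-off is the number of segments against the maximum error; placing more breakpoints where the curvature is larger, as the abstract describes, is what makes the segment budget go further.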
Article
Sorted QR decomposition (SQRD) has been extensively adopted in various multiple-input multiple-output (MIMO) detectors, in which the sorting process incurs severe latency in larger-scale MIMO systems. This paper proposes a group-SQRD (GSQRD) algorithm to alleviate the latency problem of general SQRD architectures for larg...
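For reference, a minimal sketch of conventional SQRD (sorted modified Gram-Schmidt), the baseline whose sequential sorting GSQRD targets. It assumes the channel matrix has full column rank and is stored as a list of complex columns; at each step the remaining column with the smallest residual norm is moved to the front. The group-wise sorting of GSQRD is not shown, and `sorted_qr` is a hypothetical name.

```python
def sorted_qr(H):
    """Sorted QR decomposition (SQRD) via modified Gram-Schmidt.

    H is an m x n matrix stored as a list of n complex columns. Returns Q (list of
    n orthonormal columns), R (n x n upper triangular, row-major) and perm, such
    that column j of H[:, perm] equals sum_i R[i][j] * Q[i].
    """
    n = len(H)
    Q = [list(col) for col in H]  # working copy, orthogonalised in place
    R = [[0j] * n for _ in range(n)]
    perm = list(range(n))
    norms = [sum(abs(v) ** 2 for v in col) for col in Q]
    for i in range(n):
        # Sorting step: move the remaining column with the smallest residual norm to position i.
        k = min(range(i, n), key=lambda j: norms[j])
        for seq in (Q, perm, norms):
            seq[i], seq[k] = seq[k], seq[i]
        for row in R:
            row[i], row[k] = row[k], row[i]
        R[i][i] = norms[i] ** 0.5
        Q[i] = [v / R[i][i] for v in Q[i]]
        # Modified Gram-Schmidt: remove the q_i component from every remaining column.
        for j in range(i + 1, n):
            R[i][j] = sum(q.conjugate() * v for q, v in zip(Q[i], Q[j]))
            Q[j] = [v - R[i][j] * q for q, v in zip(Q[i], Q[j])]
            norms[j] -= abs(R[i][j]) ** 2  # norm downdate instead of recomputation
    return Q, R, perm


H = [[1 + 1j, 3, 0.2], [2, -1j, 0.1j], [0.5j, 1, -0.3]]  # three columns
Q, R, perm = sorted_qr(H)
print(perm)
```

The `min` over the remaining columns runs once per step and must finish before the next orthogonalisation can start, which is the serial dependency that grows with the number of antennas.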
Article
This paper presents a novel low-complexity parallel quasi-cyclic low-density parity-check (QC-LDPC) encoding algorithm that is compatible with 5th generation (5G) new radio (NR). Based on this algorithm, we propose a highly area-efficient parallel encoder with a compatible architecture. The proposed encoder has the advantages of parallel enc...
Article
Existing cryptographic processors, represented by application-specific instruction set processors (ASIPs) and array processors, face challenges when applied to mobile terminals with sensitive security requirements. Typically, ASIPs have limited computational efficiency and algorithmic adaptability. An array processor requires massive circuits to p...
Article
Data detectors for future wireless systems need to achieve high throughput and low bit error rate (BER) with low computational complexity. In this paper, we propose a deep neural network (DNN) learning-aided iterative detection algorithm. We first propose a convex optimization-based method for calculating the efficient detection of iterative s...
Article
Automatic modulation classification (AMC) has recently attracted widespread attention owing to the generalisability and low requirement for prior knowledge offered by artificial intelligence (AI) technology. The authors propose a stacked auto-encoder (SAE) structure based on various optimisation methods to intelligently...
Article
As the first kind of capacity-achieving forward error correction (FEC) codes, polar codes have recently attracted much research interest. Compared with traditional FEC codes, polar codes show better error-correction performance when successive cancellation list (SCL) decoding with a cyclic redundancy check is adopted. However, its serial decoding na...
Article
This paper proposes a highly efficient preprocessing algorithm for 16×16 MIMO detection. The proposed algorithm combines a sorting-relaxed QR decomposition (SRQRD) and a modified greedy LLL (MGLLL) algorithm. First, SRQRD is conducted to decompose the channel matrices. This decomposition adopts a relaxed sorting strategy together with a parallel G...
Conference Paper
With successive cancellation list (SCL) decoding, polar codes demonstrate error-correction performance competitive with that of LDPC and Turbo codes. However, SCL decoding suffers from high computational complexity and long decoding latency, especially when the list size is very large. Successive cancellation flip (SCF), as another decoding algorith...
Conference Paper
Mobile cyber-physical systems (CPS) have been a popular research field in recent years. They aim to control and monitor mobile devices in complex, real-time scenes and to provide people with convenience and economy through intelligent applications. The scenes in mobile CPS are closely related to everyone's life and have a pervasive effect on peo...
Article
Polar codes have drawn much research attention in the past ten years for their capacity-achieving property. However, their conventional successive cancellation decoding does not perform well at short or moderate code lengths. To improve performance, concatenation with other error-correction codes has proved to be an effective approach, w...
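To make conventional successive cancellation (SC) decoding concrete, a minimal recursive sketch. It assumes natural-order encoding x = u F^{⊗n} with F = [[1, 0], [1, 1]], the min-sum approximation for the check-node update, LLRs where a positive value favours bit 0, frozen bits fixed to 0, and a hypothetical frozen set {0, 1, 2, 4} for N = 8. It is plain SC decoding, not the list, flip, or concatenated variants discussed in these papers; `polar_encode` and `sc_decode` are hypothetical names.

```python
import math


def polar_encode(u):
    """x = u * F^{(x)n} over GF(2) with F = [[1, 0], [1, 1]], natural bit order."""
    x = list(u)
    step = 1
    while step < len(x):
        for i in range(0, len(x), 2 * step):
            for j in range(i, i + step):
                x[j] ^= x[j + step]
        step *= 2
    return x


def sc_decode(llr, frozen):
    """Recursive successive cancellation decoding.

    llr: channel LLRs (positive favours bit 0); frozen: one bool per u index,
    frozen bits are assumed to be 0. Returns (u_hat, re-encoded x_hat).
    """
    n = len(llr)
    if n == 1:
        bit = 0 if frozen[0] or llr[0] >= 0 else 1
        return [bit], [bit]
    h = n // 2
    left, right = llr[:h], llr[h:]
    # f: min-sum check-node update gives LLRs for the first half of u.
    l1 = [math.copysign(min(abs(a), abs(b)), a * b) for a, b in zip(left, right)]
    u1, v1 = sc_decode(l1, frozen[:h])
    # g: variable-node update needs the partial sums v1 already decided (the serial dependency).
    l2 = [b + (1 - 2 * s) * a for a, b, s in zip(left, right, v1)]
    u2, v2 = sc_decode(l2, frozen[h:])
    return u1 + u2, [p ^ q for p, q in zip(v1, v2)] + v2


frozen = [i in (0, 1, 2, 4) for i in range(8)]
u = [0, 0, 0, 1, 0, 1, 1, 0]
llr = [2.0 if b == 0 else -2.0 for b in polar_encode(u)]  # noise-free BPSK LLRs
print(sc_decode(llr, frozen)[0] == u)
```

The second half of `u` cannot be decoded until the first half has been decided and re-encoded, which is the serial behaviour that list, flip, and concatenated schemes build on.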
Conference Paper
Polar codes have attracted increasing research attention in recent years because of their capacity-achieving property. However, their error-correction performance under successive cancellation (SC) decoding is inferior to that of other modern channel codes at short or moderate blocklengths. The SC-Flip (SCF) decoding algorithm performs better than SC deco...
Preprint
Polar codes have attracted increasing research attention in recent years because of their capacity-achieving property. However, their error-correction performance under successive cancellation (SC) decoding is inferior to that of other modern channel codes at short or moderate blocklengths. The SC-Flip (SCF) decoding algorithm performs better than SC deco...
Chapter
Polar codes are widely considered one of the most promising channel codes for future wireless communication. However, at short or moderate block lengths, their error-correction performance under traditional successive cancellation (SC) decoding is inferior to that of other modern channel codes, while under list decoding they perform better at the cost of high c...
Article
Massive multiple-input multiple-output (MIMO) provides improved energy efficiency and spectral efficiency in 5G. However, it requires large-scale matrix computation with tremendous complexity, especially for data detection and precoding. Recently, many detection and precoding methods have been proposed using approximate iteration methods, which meet the demand...
Article
As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low power requirements has become increasingly important. The graphics processing unit (GPU) is an accelerator widely used in most recent supercomputers. It adopts a large number of threads to hide a long latency with a...
Chapter
QR decomposition (QRD) is one of the performance bottlenecks of transceiver processors in multiuser multiple-input-multiple-output (MU-MIMO) systems. This paper proposes a QRD algorithm, named the modified ILMGS (MILMGS) algorithm, based on the existing modified Gram-Schmidt (MGS) algorithm and the iteration look-ahead MGS (ILMGS) algorithm. A c...
Article
Graphics processing units (GPUs) are playing increasingly important roles in parallel computing. Using their multi-threaded execution model, GPUs can accelerate many parallel programmes and save energy. In contrast to their strong computing power, GPUs have limited on-chip memory space, which easily becomes inadequate. The throughput-oriented execution model i...
Article
The fast-evolving standards of wireless communication systems drive the demand for flexible baseband processing platforms. However, with the proliferation of MIMO technologies, traditional single-core solutions can hardly fulfill the requirements at acceptable power and area cost. The reliance on multi-/many-core systems is increasin...
Article
To improve energy efficiency and spectral efficiency, massive multiple-input-multiple-output (MIMO) has been proposed and has become a promising technology for next-generation mobile communication. However, massive MIMO systems are equipped with tens or hundreds of antennas, which induce large-scale matrix computations with tremendous complexity, especially...
Conference Paper
EXT and EXTU are instructions of the TMS320C62xx DSP. This paper describes the hierarchical circuit design method, in a 130 nm logic technology, used to complete the full-custom circuit design of the EXT and EXTU instructions. The work includes Verilog-level functional verification using NC-Verilog after extractin...
Article
As the need for high-performance computing continues to grow, designing a massive multi-core processor with high throughput and efficiency becomes increasingly urgent. However, as the number of cores keeps increasing, the capacity of on-chip memory is always insufficient. In a multi-core processor such as GPGPU (General Purpose Graphic Proces...
Article
QR decomposition (QRD) is a vital component of the transceiver processor in future multiple-input multiple-output (MIMO) systems, whose antenna configurations will become increasingly flexible. Therefore, the QRD hardware architecture in future MIMO systems should be flexible enough to support various antenna configurations. Unfortunately, the...
Conference Paper
As we approach the exascale era in supercomputing, designing a balanced computer system with powerful computing ability and low energy consumption becomes increasingly important. The GPU is a widely used accelerator in most recent supercomputers. It adopts massive multithreading to hide long latency and has high energy efficiency. In c...
Article
This paper proposes an efficient parallel algorithm for the Caputo fractional reaction-diffusion equation with an implicit finite-difference method. The parallel algorithm consists of a parallel solver for linear tridiagonal equations and parallel vector arithmetic operations. For the parallel solver, in order to solve the linear tridiagonal equatio...
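As a sequential reference point for the linear tridiagonal systems mentioned above (not the paper's parallel solver), a minimal sketch of the classic Thomas algorithm. It assumes a diagonally dominant system so that no pivoting is needed; `thomas_solve` and the example coefficients are hypothetical.

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm (no pivoting).

    a: sub-diagonal (a[0] unused), b: main diagonal, c: super-diagonal (c[-1] unused),
    d: right-hand side. All lists have length n.
    """
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# Example: a diagonally dominant system of the kind an implicit diffusion step produces.
n = 5
a = [0.0] + [-1.0] * (n - 1)
b = [4.0] * n
c = [-1.0] * (n - 1) + [0.0]
x_true = [1.0, 2.0, 3.0, 4.0, 5.0]
d = [b[i] * x_true[i]
     + (a[i] * x_true[i - 1] if i > 0 else 0.0)
     + (c[i] * x_true[i + 1] if i < n - 1 else 0.0) for i in range(n)]
print(thomas_solve(a, b, c, d))
```

Both sweeps carry a dependency from one row to the next, which is why a parallel tridiagonal solver has to restructure the elimination rather than simply split the loop.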
Conference Paper
The Synthetic Aperture Radar (SAR) system is a modern high-resolution microwave imaging radar that operates in all weather, day and night, providing a means of remote sensing and generating high-resolution images of the land illuminated by the radar beam. Unlike optical sensors, SAR requires post-processing of the acquired data to for...
Article
As an attractive interference cancellation (IC) technique, Tomlinson-Harashima precoding (THP) has been investigated thoroughly in theory. Several high performance THP variants have been proposed, e.g., sorted QR decomposition (SQRD), Cholesky decomposition, vertical Bell Laboratories space time (V-BLAST) and lattice reduction aided THPs. From a pr...
Conference Paper
QR decomposition (QRD) has been extensively adopted in the transceiver processors of multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems. The antenna configurations of future MIMO-OFDM systems are very flexible. Therefore, the QRD architecture should also be flexible enough to decompose various dim...
Article
QR decomposition is extensively adopted in multiple-input-multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) wireless communication systems and is one of the performance bottlenecks in many high-performance wireless communication algorithms. To implement low-latency QR decomposition in hardware, we propose a nov...
Article
Currently, massive multiple-input multiple-output (MIMO) is one of the most promising wireless transmission technologies for 5G. Massive MIMO requires handling large-scale matrix computations, especially matrix inversion. In this letter, we find that matrix inversion based on Newton iteration (NI) is suitable for data detection in massive M...
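A minimal sketch of Newton-iteration matrix inversion, X_{k+1} = X_k(2I - A X_k). Its residual satisfies I - A X_{k+1} = (I - A X_k)^2, so it converges quadratically once ||I - A X_0|| < 1. The sketch assumes A is diagonally dominant (as the Gram matrix in massive MIMO detection tends to be when the base station has many more antennas than users) and starts from the inverse of the diagonal; the iteration count, the example matrix, and the names `matmul` and `newton_inverse` are hypothetical, not the letter's exact settings.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def newton_inverse(A, iterations=6):
    """Approximate A^{-1} with the Newton iteration X_{k+1} = X_k (2I - A X_k)."""
    n = len(A)
    # Initial guess: inverse of the diagonal of A (assumes A is diagonally dominant).
    X = [[1 / A[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        AX = matmul(A, X)
        T = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)] for i in range(n)]
        X = matmul(X, T)  # the residual I - A X is squared at every step
    return X


A = [[10.0, 1.0, 2.0], [1.0, 8.0, 1.0], [2.0, 1.0, 9.0]]
X = newton_inverse(A)
print(matmul(A, X))
```

Each iteration costs two matrix multiplications, which map well onto parallel hardware, and the number of iterations trades accuracy against latency.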
Conference Paper
This paper addresses the problem of acquiring the sampling frequency offset (SFO) and carrier frequency offset (CFO), which severely degrade the performance of orthogonal frequency division multiplexing (OFDM) system. Using two identical frequency domain (FD) long training symbols in preamble, we propose a novel maximum-likelihood (ML) estimation m...
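The paper's maximum-likelihood estimator works on frequency-domain training symbols and also estimates SFO. As a simpler, swapped-in illustration of why two identical training symbols help, here is a sketch of the classic time-domain correlation estimator for CFO only: if the second symbol equals the first rotated by e^{j2πε}, the phase of their correlation gives ε, normalised to the subcarrier spacing and unambiguous for |ε| < 0.5. It assumes no noise, no SFO, and perfect timing; `estimate_cfo`, `apply_cfo`, and the test symbol are hypothetical.

```python
import cmath
import math


def estimate_cfo(rx, n):
    """CFO estimate, normalised to the subcarrier spacing, from two back-to-back
    identical training symbols of n samples each (unambiguous for |eps| < 0.5)."""
    corr = sum(rx[k + n] * rx[k].conjugate() for k in range(n))
    return cmath.phase(corr) / (2 * math.pi)


def apply_cfo(tx, eps, n):
    """Rotate sample k by exp(j*2*pi*eps*k/n): a CFO of eps subcarrier spacings."""
    return [s * cmath.exp(2j * math.pi * eps * k / n) for k, s in enumerate(tx)]


n = 64
symbol = [cmath.exp(1j * 0.7 * k * k) for k in range(n)]  # arbitrary unit-modulus samples
rx = apply_cfo(symbol + symbol, 0.23, n)
print(estimate_cfo(rx, n))
```

Summing over all n sample pairs averages out noise in practice, while the ±0.5 range limit comes from the phase wrapping at ±π.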
Conference Paper
5G is currently a research hotspot in the communication field, and one of the most promising wireless transmission technologies for 5G is massive multiple-input multiple-output (MIMO), which provides high data rates and energy efficiency. The main challenge of massive MIMO is channel estimation, owing to its complexity and to pilot contamination. Some improv...
Article
QR decomposition (QRD) is one of the performance bottlenecks in many high-performance wireless communication algorithms and should be flexible for future MIMO systems. However, existing QRD architectures focus only on a few fixed matrix dimensions. The parallel tiled QRD algorithm is a perfect choice to implement QRD for i...
Conference Paper
Monte Carlo (MC) simulation plays an important part in dose calculation for radiotherapy treatment planning. Since the accuracy of MC simulation depends on the number of simulated particle histories, it is very time-consuming. The Intel Many Integrated Core (MIC) architecture, which consists of more than 50 cores and supports many parallel programmi...
Article
Network-on-chip (NoC) is one of the critical communication architectures for scaling future many-core processors. The challenge for on-chip networks is to reduce design complexity, saving both area and power, while providing high performance such as low latency and high throughput. In particular, as network size increases, both design complexity a...
Article
The key to large-scale parallel solutions of deterministic particle transport problems is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly as feature sizes in semiconductor technology scale down. In t...
Conference Paper
Many-core systems are currently the main architectural trend. One of the dominant challenges for on-chip many-core systems is the memory wall. However, traditional research primarily focuses on the limited bandwidth. To solve this problem, many-core systems are equipped with large caches, and many complex memory and cache approaches are adopted aiming at...
Article
Network-on-Chip (NoC) is one of the critical communication architectures for future many-core systems. As technology continues to scale down, on-chip networks face an increasing leakage power crisis. As a leakage power mitigation technique, power-gating can be used in on-chip networks to address this crisis. However, the network performance is sev...
Conference Paper
The coupling of microwaves into apertures plays an important part in many fields of electromagnetic physics and engineering. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. As a many-core architecture, Intel's Many Integrated Core (MIC) architecture has 512-bit vec...
Article
Power consumption, design complexity and area cost are limiting constraints in the design of interconnects for scalable many-core systems. To tackle the power and area concerns, we propose a light-weight unidirectional channel network-on-chip in a 2D mesh topology (UniMESH), which simplifies router architectures, uses only half the number of channel links...
Article
This paper proposes a new implementation of a 3GPP LTE standard-compliant turbo decoder based on GPGPU. It uses the then-newest Tesla K20c GPU, which is based on the Kepler GK110 architecture. The new architecture has more powerful parallel computing capability, and we use it to fully exploit the parallelism in the turbo decoding algorithm in nove...
Article
With the rapid development of integrated circuits [1], low power consumption has become a constant goal of designers in chip design. As memory takes up most of the chip area, reducing memory power consumption will significantly reduce the overall power consumption of the chip; according to ISSCC’s 2014 report about technolog...
Article
Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double-precision floating-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulati...
Conference Paper
Power-gating is a representative circuit-level technique to mitigate leakage power. In low-power Network-on-Chip (NoC) design, however, previous fine-grained power-gating methods decrease network performance because of serial wake-up latency and head-of-line blocking. Therefore, we propose a flexible Virtual Channel (VC) management scheme for fine-...
Article
A novel reconfigurable hybrid single electron transistor/MOSFET (SETMOS) circuit architecture, namely, reconfigurable pseudo-NMOS-like logic is proposed. Based on the hybrid SETMOS inverter/buffer circuit cell, reconfigurable pseudo-NMOS-like logics that can work normally at room temperature are constructed. This kind of reconfigurable logic can im...
Conference Paper
This paper proposes a novel analytical model for the semiconductor single-electron transistor (SET) with a Coulomb island of specific size at room temperature. The number of electrons on the SET island is analyzed separately for the odd and even cases, and a unified calculation model is then obtained for the first time. Based on the model, the I-V cha...
Conference Paper
This paper studies the specifically tunable negative differential resistance (NDR) of the single-electron transistor (SET) controlled by capacitance, which we noticed accidentally in our experiments. Tunable NDR of the SET controlled individually by the source, drain and gate capacitances is simulated, and then it is also done by contr...
Conference Paper
A full adder based on hybrid single-electron transistors (SET) and MOSFETs (SETMOS) at room temperature is proposed in this paper. Because the SET can play the same role as complementary MOSFETs, we design a full adder with hybrid SETMOS. Furthermore, we simulate the logic element with HSPICE, and the simulation result shows that the logic element im...
Conference Paper
The paper proposes a backhaul-route pre-configuration mechanism (BRPCM) for the round-trip communication pattern, which is suited to the traversal of backhaul packets. Based on previous communication patterns, BRPCM pre-configures a converse crossbar connection that creates a backhaul route within a single router during the traversal of the previous flits. Combining...
Conference Paper
As powerful error-correcting codes, Low-Density Parity-Check (LDPC) codes have been adopted as a fundamental building block in dirty paper coding (DPC), which indicates that lossless precoding is theoretically possible at any signal-to-noise ratio (SNR) and is a promising strategy for future communication systems. However, to achieve this performan...
Article
Continuing decrease in the feature size of integrated circuits leads to increases in susceptibility to transient and permanent faults. This paper proposes a fault-tolerant solution for a bufferless network-on-chip, including an on-line fault-diagnosis mechanism to detect both transient and permanent faults, a hybrid automatic repeat request, and fo...
Conference Paper
With advances in semiconductor technology and increasing chip complexity, in some circumstances the cost of repeated design and manufacture to correct design mistakes is almost as large as that of the initial design. The cache is an essential part of the processor; in order to ensure the correctness of the design and reduce the cost of debugging the cache, t...
Article
This paper proposes a Reduced Explicitly Parallel Instruction Computing Processor (REPICP) which is an independently designed, 64-bit, general-purpose microprocessor. The REPICP based on EPIC architecture overcomes the disadvantages of hardware-based superscalar and software-based Very Long Instruction Word (VLIW) and utilizes the cooperation of co...
Article
In invalidation-based protocols, the migratory pattern means that the accessing processor issues two separate requests, first for read permission and then for write permission. This paper proposes an adaptive protocol that uses the token count and the writer or reader of the data to recognize the migratory pattern. While the data is in migratory pat...
Article
We proposed a straight-forwarding route pre-configuration (SFRP) router architecture that exploits the spatial locality of communication when packets traverse the network under dimension-ordered routing, and that was adapted to optimizing the latency of packets traveling straight through a router. In our SFRP router, a corresponding straight-forwarding route was preconf...
Conference Paper
Despite many years of effort, the precise mechanism of negative differential resistance (NDR) of the single-electron transistor (SET) remains unclear, and this lack of knowledge has become a major obstacle in the research and development of new electronic devices that make use of this effect. This paper proposes a conductance model to validate and analysis...
Conference Paper
With the development of information systems, electronic devices are becoming more and more susceptible to soft errors, especially in harsh environments with strong electromagnetic interference. The Architectural Vulnerability Factor (AVF), defined as the fraction of soft errors that result in erroneous outputs, has been introduced to quanti...
Article
Single-electron transistors (SETs) are considered attractive candidates for post-CMOS VLSI owing to their ultra-small size and low power consumption. As the Coulomb island becomes smaller and smaller, the energy quantization of the single-electron transistor based on charge state emerges and becomes increasingly pronounced. A qu...
Conference Paper
Cache coherence protocols play an important role in maintaining data coherence in shared-memory multiprocessors. The token protocol provides a flexible framework for designing new coherence protocols. It features both low-latency cache misses and no reliance on totally ordered interconnects. However, messages in token protocol are always...
Conference Paper
As process technology scales down to 65 nm, leakage power begins to dominate power consumption. Power gating, one of the most effective techniques for reducing leakage power, has been a research hotspot over the past ten years and will be applicable to all future low-power designs. This paper gives a brief summary of the concept of power gating d...
Conference Paper
With the increasing number of processor cores in chip multi-processors (CMPs), 2D Mesh has been gaining wide acceptance for inter-core on-chip communication. Program performance is more sensitive to the router latency than to the link bandwidth. This paper presents a low latency Dynamic Virtual Output Queues Router (DVOQR), which can reduce the rou...
Conference Paper
In this paper, we introduce a scalable macro-pipelined architecture to perform floating-point matrix multiplication, which aims to exploit temporal parallelism and architectural scalability. We demonstrate the functionality of the hardware design with 16 processing elements (PEs) on a Xilinx ML507 development board containing a Virtex-5 XC5VFX70T. A 32...
Article
In general, a CPU can raise its frequency using a PLL (phase-locked loop), but testing high-frequency signals is very expensive. This paper introduces a method that inserts test logic into the CPU to test its PLL performance. The method is very easy to implement and effectively reduces test costs at low hardware ov...
Conference Paper
Recently, GPUs have been widely used in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. First, we analyze some key performance characteristics of GPUs in detail, and the relationships among GPU architecture, programming model and memory hierarchy. Second, we present three performance optimization st...
Conference Paper
As integrated circuit technology develops, testing and debugging become more difficult. Usually, design-for-testability (DFT) and debugging structures are designed separately in VLSI, which requires a great deal of additional hardware resources. This paper introduces a testing structure designed in a microprocessor based on JTAG, which is based on scan-...
Article
The stream architecture is a novel microprocessor architecture with wide application potential. It is critical to study how to use the stream architecture to accelerate scientific computing programs. However, existing stream processors and stream programming languages are not designed for scientific computing. To address this issue, we design and i...
Conference Paper
Stream architecture is a novel microprocessor architecture with wide application potential. However, whether it can be used efficiently in scientific computing raises many issues that await further study. This paper first gives the design and implementation of a 64-bit stream processor, FT64 (Fei Teng 64), for scientific computing. The implementation of 64-bi...
Article
This paper presents an instruction Optimized Lock-Step execution Model (OLSM) and builds a memory hierarchy that embodies the essence of this model. In the OLSM EPIC microprocessor, instructions can be executed out of order, which resolves the shortcoming of traditional VLIW lock-step execution. OLSM can make use of the abundant computation and memory res...
Conference Paper
Leakage power will exceed dynamic power in microprocessors as feature size shrinks, especially in on-chip caches. Besides developing low-leakage processes and circuits, controlling leakage power at the architectural level is worth studying. In this paper, a PDSR (Periodically Drowsy Speculatively Recover) algorithm and its extended version wit...
Article
The leakage power issue is challenging high-performance microprocessor design, especially as feature size shrinks. Not only are low-leakage technologies and circuits well researched, but architectural control methods are also studied. Caches represent a sizable fraction of total power consumption, so they need to be managed first. LRU is the...
Article
In this paper, a Hardware Virtual Interface Architecture (HVIA) is proposed, a System Area Network based on HVIA (HVIA-Net) is introduced from the viewpoint of hardware design, and real tests in a 33 MHz, 64-bit PCI environment are presented. The performance of HVIA-Net is compared with current high-performance networks. In the end, a framework of new ver...
Article
Virtual memory is a staple of modern processor systems. In a virtual addressing scheme, the translation from virtual to physical addresses is one of the most frequently used core services in the pipeline and tends to be on the critical path that determines the processor's clock cycle. In order to speed up address translation, the most mod...
