Chunye Gong

National University of Defense Technology | NUDT · Department of Computer Science and Technology

PhD

About

82
Publications
18,097
Reads
1,372
Citations
Additional affiliations
January 2012 - July 2015
National University of Defense Technology
Position
  • Changsha


Publications (82)
Article
Full-text available
The fractional reaction-diffusion equations play an important role in dynamical systems. Indeed, it is time consuming to numerically solve differential fractional diffusion equations. In this paper, we present a parallel algorithm for the Riesz space fractional diffusion equation. The parallel algorithm, which is implemented with MPI parallel progr...
Article
Full-text available
The computational complexity of the Caputo fractional reaction–diffusion equation is \(O(MN^2)\), compared with \(O(MN)\) for the traditional reaction–diffusion equation, where \(M\) and \(N\) are the numbers of time steps and grid points. An efficient parallel solution for the Caputo fractional reaction–diffusion equation with an explicit difference method is proposed....
Article
The Graphics Processing Unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now offers great capability for solving scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation i...
Article
Full-text available
We present a survey of fractional differential equations and in particular of the computational cost for their numerical solutions from the view of computer science. The computational complexities of time fractional, space fractional, and space-time fractional equations are \(O(N^2M)\), \(O(NM^2)\), and \(O(NM(M + N))\) compared with \(O(MN)\) for the classical part...
Article
Parallel computing is a useful technology for scientific and engineering algorithms and applications. LU-SGS (the lower-upper symmetric Gauss-Seidel method) is an efficient and robust scheme for computational fluid dynamics (CFD) and has strong data dependence in its computation. In this paper, we present an efficient wavefront parallel algorithm for 3D (t...
Article
Full-text available
Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields. As modern reduced instruction set computing architectures like the fifth generation (RISC-V) regard multiple registers as a vector register group for wide instruction parallelism, optimizing merge sort with this vectorized property is becoming increasingl...
Article
Full-text available
Password-based recovery is a widely used method for regaining access to applications or services when passwords are lost or forgotten. It is commonly used in electronic forensics by law enforcement agencies, information acquisition in the commercial sector, and data recovery for individuals. However, as encryption algorithms and complex passwords b...
Preprint
Full-text available
Mesh smoothing methods can enhance mesh quality by eliminating distorted elements, leading to improved convergence in simulations. To balance the efficiency and robustness of traditional mesh smoothing process, previous approaches have employed supervised learning and reinforcement learning to train intelligent smoothing models. However, these meth...
Article
Full-text available
Magnetotelluric (MT) sounding is a geophysical technique widely utilized in mineral resource surveys, where conductivity and magnetic permeability serve as essential physical parameters for forward modeling and inversion. However, the effects of conductive anisotropy and non-zero magnetic susceptibility are usually ignored. In this study, we presen...
Article
Full-text available
The marine magnetotelluric (MMT) method is a significant tool extensively utilized in offshore studies, including the understanding of the Earth’s tectonics and hydrocarbon exploration. Conductive anisotropy and non-zero magnetic susceptibility are common phenomena observed in the Earth’s subsurface, and MMT forward modeling is the basis of practic...
Preprint
Merge sort as a divide-sort-merge paradigm has been widely applied in computer science fields. As modern reduced instruction set computing architectures like the fifth generation (RISC-V) regard multiple registers as a vector register group for wide instruction parallelism, optimizing merge sort with this vectorized property is becoming increasingl...
Preprint
Sorting algorithms are among the most extensively researched topics in computer science and serve numerous practical applications. Although various sorts have been proposed for efficiency, different architectures offer distinct flavors to the implementation of parallel sorting. In this paper, we propose a hybrid vectorized merge sort on ARM NEON, nam...
Article
This paper presents MST, a communication-efficient message library for fast graph traversal on exascale clusters. The key idea is to follow the multi-level network topology to perform topology-aware message aggregation, where small messages are gathered and scattered at each level of domain. To facilitate message aggregation, we equip MST with flex...
Article
With the improvement of security awareness, in order to guarantee information security, more advanced and secure encryption algorithms are applied to Microsoft Office. People also set more complex encryption passwords. However, once the initial password is forgotten, the encrypted information needs to be retrieved. The conventional brute force crac...
Article
Full-text available
Computational fluid dynamics simulation accounts for a large number of workloads in the numerical design optimization of aerodynamics problems. In this paper, we develop AFFNet, an advanced neural network and physics solver coupled framework for accelerating flow field simulations. AFFNet combines the benefits of an attention mechanism, affine tran...
Article
Full-text available
In this paper, we present a novel surface mesh generation approach that splits B-rep geometry models into isotropic triangular meshes based on neural networks and splitting lines. In the first stage, a recursive method is designed to generate plentiful data to train the neural network model offline. In the second stage, the implemented mesh generat...
Article
Computational fluid dynamics (CFD) plays a critical role in many scientific and engineering applications, with aerodynamic design optimization being a primary area of interest. Recently, there has been much interest in using artificial intelligence approaches to accelerate this process. One promising method is the graph convolutional neural network...
Preprint
Full-text available
Mesh generation remains a key technology in many areas where numerical simulations are required. As numerical algorithms become more efficient and computers become more powerful, the percentage of time devoted to mesh generation becomes higher. In this paper, we present an improved structured mesh generation method. The method formulates the meshin...
Article
Full-text available
Evaluating mesh quality prior to performing the computational fluid dynamics (CFD) simulation is an essential step to ensure the acceptable accuracy of cylinder modelling. However, traditional mesh quality indicators are often insufficient since they only check geometric information on individual distorted elements. To yield more accurate results,...
Article
As a theoretically rigorous and accurate method, FEP-ABFE (Free Energy Perturbation-Absolute Binding Free Energy) calculations have shown great potential in drug discovery, but their practical application has been difficult due to high computational cost. To rapidly discover antiviral drugs targeting SARS-CoV-2 Mpro and TMPRSS2, we performed FEP-ABFE–based v...
Article
Full-text available
The quality of the finite element mesh has a considerable effect on the efficiency and accuracy of computational fluid dynamics (CFD) simulations. To ensure the generated mesh is of good quality, many quality metrics have been proposed to assess the generated mesh, such as aspect ratio, skewness, Jacobian ratio, etc. Such metrics, however, are prim...
Article
Full-text available
Mesh generation accounts for a large number of workloads in the numerical analysis. In this paper, we introduce a novel differential method MGNet for structured mesh generation. The proposed method poses the meshing task as an optimization problem. It takes boundary curves as input, employs a well-designed neural network to study the potential mesh...
Article
Evaluating mesh quality before solving is crucially important for error control in the numerical simulation of airfoils. Traditional mesh quality metrics are used to identify distorted mesh elements by analyzing their geometric shape information like angles and edges. However, these metrics fail to recognize numerical errors stemming from quality a...
Preprint
Full-text available
SARS-coronavirus-2 (SARS-CoV2) Omicron variant (B.1.1.529) is of great concern to the world due to multiple mutations that may have an impact on transmissibility and immune evasion. Compared to the wild type (WT), there are 15 mutations in the Omicron receptor-binding domain (RBD), 10 of which are in the receptor-binding motif (RBM), where the host...
Article
Full-text available
Deep neural networks (DNNs) have recently shown great potential in solving partial differential equations (PDEs). The success of neural network-based surrogate models is attributed to their ability to learn a rich set of solution-related features. However, learning DNNs usually involves tedious training iterations to converge and requires a very la...
Chapter
Although in the past few decades, many methods such as heatmap, 3D morphable model (3DMM), and generative adversarial network (GAN), have been used to assist facial landmarks extraction, there is a lack of research on balancing the models’ size and accuracy. Therefore, this paper proposes a landmark detection model based on the ShufflenetV2 module...
Article
Full-text available
This paper develops a multi-physics interface code MC-FLUENT to couple the Monte Carlo code OpenMC with the commercial computational fluid dynamics code ANSYS FLUENT. The implementations and parallel performances of block Gauss–Seidel-type and block Jacobi-type Picard iterative algorithms have been investigated. In addition, this paper introduces t...
Article
An important objective of quality control in CFD pre-processing is the facility to indicate to the engineer the validity of the generated mesh. Existing quality measures mainly focus on the subjective evaluation of the shape information of mesh elements, such as aspect ratio, skewness, and shape regularity, and often ignore mesh distribution detail...
Article
Full-text available
3-D magnetotelluric (MT) forward modeling has always been faced with the problems of high memory requirements and long computing time. In this article, we design a scalable parallel algorithm for 3-D MT finite element modeling in anisotropic media. The parallel algorithm is based on the distributed mesh storage, including multiple parallel granular...
Article
Full-text available
A three-dimensional magnetotelluric modeling algorithm of high accuracy and high efficiency is required for data interpretation and inversion. In this paper, the edge-based finite element method with an unstructured mesh is used to solve the 3D magnetotelluric problem. Two boundary conditions (the Dirichlet boundary condition and the Neumann boundary condition) are set f...
Article
Full-text available
Sparse matrix–vector multiplication (SpMV) is one of the most indispensable kernels for solving problems in numerous applications, but its performance is limited by the need for frequent memory access. Modern processors exploit data-level parallelism to improve performance using single-instruction multiple data (SIMD). In order to take f...
Article
Full-text available
One of the difficult requirements imposed on high-quality CFD mesh generation has been the ability to evaluate the mesh quality efficiently. Due to the lack of a general and effective evaluating criterion, the current mesh quality evaluation task mainly relies on various quality metrics for the shape of mesh elements, such as angle, radius, edge an...
Conference Paper
The User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce space consumption in runtime and storage, caused by data sparsity and the growing need to accommodate side information in CF design, one needs to go beyond the U-I matrix. In this paper, we took a case study of Succinct Representations in...
Conference Paper
One performance-intensive part of automatic speech recognition is weighted finite-state transducer (WFST) decoding. To solve the problem, we extend parallel Graphics Processing Unit (GPU) computing to the decoding stage. We describe extension work based on the Kaldi toolkit for speech recognition research. Our work can support weighted finite-sta...
Article
The mesh deformation method based on radial basis functions (RBF) has many advantages and is widely used. The RBF-based mesh deformation method has two main steps: data reduction and displacement interpolation. The data reduction step includes solving interpolation weight coefficients and searching for the node with the maximum interpolation error. T...
Article
Sparse matrix‐vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used format for sparse matrices storage. However, CSR‐based SpMV has poor performance on processors with vector units. In order...
Article
HPL is a Linpack benchmark package widely used in high-performance computing tests. Customizing the HPL is crucial for a heterogeneous system equipped with CPU and the China accelerator because of the complexity of the China accelerator and the specified interface on matrix multiplication built in the China accelerator. Therefore, it is advisable t...
Article
Full-text available
In the numerical approximation of fractional order derivatives, the crucial point is to balance the computing complexity and the computing accuracy. We proposed a piecewise memory principle for fractional derivatives, in which the past history is divided into several segments instead of discarded. The piecewise approximation is performed on each se...
Article
Sweep scheduling methods used in particle transport problems belong to the class of precedence-constrained scheduling problems that are NP-complete. It is difficult to schedule local tasks for this type of transport problem and simultaneously optimize computational performance and parallel processor communication. In this paper, we present a parall...
Article
Moving mesh is widely used in the simulation of aerodynamic shape optimization, multibody relative motion, aircraft icing and aeroelasticity. The efficient and high quality mesh deformation is the key technology of moving mesh. This paper presented a new Mesh Deformation method based on Cartesian Background Mesh (MDCBM). First, the Cartesian backgr...
Article
The computational complexity of the numerical simulation of a fractional chaotic system and its synchronization control is \(O(N^2)\), compared with \(O(N)\) for an integer chaotic system, where \(N\) is the number of steps and \(O\) denotes the computational complexity. In this paper, we propose optimizing methods to solve fractional chaotic systems, including equal-weight memory principl...
Article
Full-text available
An efficient parallel algorithm for Caputo fractional reaction-diffusion equation with implicit finite-difference method is proposed in this paper. The parallel algorithm consists of a parallel solver for linear tridiagonal equations and parallel vector arithmetic operations. For the parallel solver, in order to solve the linear tridiagonal equatio...
Article
Mesh deformation based on radial basis functions (RBFs) has many advantages and has thus been widely employed in aerodynamic optimization design as well as other fields. For large-scale meshes or complex configurations, the expense of deforming by RBFs is prohibitive. Reducing the number of support points that build the RBFs model provides an a...
Conference Paper
Monte Carlo (MC) simulation plays an important part in dose calculation for radiotherapy treatment planning. Since the accuracy of MC simulation relies on the number of simulated particle histories, it is very time-consuming. The Intel Many Integrated Core (MIC) architecture, which consists of more than 50 cores and supports many parallel programmi...
Article
The key to large-scale parallel solutions of deterministic particle transport problem is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly with the scale-down of feature size in semiconductor technology. In t...
Conference Paper
The coupling of microwaves into apertures plays an important part in many electromagnetic physics and engineering fields. When the width of the apertures is very small, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. As a many-core architecture, Intel's Many Integrated Core (MIC) architecture owns 512-bit vec...
Article
Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double-precision floating-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulati...
Article
Full-text available
We present a parallel GPU solution of the Caputo fractional reaction-diffusion equation in one spatial dimension with explicit finite difference approximation. The parallel solution, which is implemented with CUDA programming model, consists of three procedures: preprocessing, parallel solver, and postprocessing. The parallel solver involves the pa...
Article
Full-text available
It is time consuming to numerically solve fractional differential equations. The fractional ordinary differential equations may produce Toeplitz-plus-band triangular systems. An efficient iteration method for Toeplitz-plus-band triangular systems is presented with \(O(M \log M)\) computational complexity and \(O(M)\) memory complexity in this paper, compare...
Article
Full-text available
The computational complexity of one-dimensional time fractional reaction-diffusion equation is \(O(N^2M)\) compared with \(O(NM)\) for classical integer reaction-diffusion equation. Parallel computing is used to overcome this challenge. Domain decomposition method (DDM) embodies large potential for parallelization of the nume...
Conference Paper
The Monte Carlo particle transport algorithms are ideally suited to parallel processing architectures and so are good candidates for acceleration using a Graphics Processor Unit (GPU). As the foundation of Monte Carlo N-Particle Transport Code (MCNP), Pseudo Random Number Generator (PRNG) should be provided with some specified nature such as long p...
Conference Paper
Full-text available
Over the last decade, with the increasing performance and programmability of Graphics processing unit (GPU), these units have evolved from specialty hardware to massively parallel general computation devices. Simulation of neutron transport plays an important role in national economical construction and large-scale computing in science and engineer...
Conference Paper
Matrix multiplication is an essential building block of many linear algebra operations and applications. This paper presents parallel algorithms with shared A or B matrix in the memory for the special massively multithreaded Fiteng1000 processor. We discuss the implementations of parallel matrix multiplication algorithms on the multi-core processor...
Article
The discontinuous finite element discrete ordinates method, which involves inverting an operator by iteratively sweeping across a mesh from multiple directions, is commonly used to solve the time-dependent particle transport equation. The Graphics Processing Unit (GPU) offers great capability for solving scientific applications. The particle transport w...
Conference Paper
High Performance Computing is focusing on heterogeneous architecture. The Embarrassingly Parallel algorithm is typical of Monte Carlo methods, which are widely applied in many important scientific areas. In this paper, we present an efficient Hybrid Embarrassingly Parallel algorithm for heterogeneous CPU/GPU clusters and an effective task distributio...
Article
Full-text available
As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers great capability for solving many high-performance computing applications. Sweep3D, which simulates a single group time-independent discrete ordinates (Sn) neutron transport deterministically on 3D Cartesian geometry space, represents the key part of a real ASCI applica...
Article
Pseudo‐random number generators (PRNG) are intensively used in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computation. The PRNG in Monte Carlo N‐Particle Transport Code (MCNP) requires a long period, high quality, flexible jump, and sufficient speed. In this paper, we implement such a PRNG for MCNP o...
Conference Paper
Cloud computing has emerged as one of the hottest topics in the field of information technology. Cloud computing is based on several other computing research areas such as HPC, virtualization, utility computing and grid computing. In order to clarify the essentials of cloud computing, we propose the characteristics of this area which make cloud computing...