About
82 Publications
18,097 Reads
1,372 Citations
Introduction
Additional affiliations
January 2012 - July 2015
Publications (82)
Fractional reaction-diffusion equations play an important role in dynamical systems. Numerically solving fractional diffusion equations, however, is time-consuming. In this paper, we present a parallel algorithm for the Riesz space fractional diffusion equation. The parallel algorithm, which is implemented with MPI parallel progr...
The computational complexity of the Caputo fractional reaction–diffusion equation is \(O(MN^2)\) compared with \(O(MN)\) for the traditional reaction–diffusion equation, where \(M\) and \(N\) are the numbers of time steps and grid points. An efficient parallel solution for the Caputo fractional reaction–diffusion equation with the explicit difference method is proposed....
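To show where the extra cost of a Caputo time derivative comes from, here is a minimal Python sketch of the standard L1 finite-difference approximation. The choice of the L1 scheme and the function name `caputo_l1` are assumptions for illustration; the truncated abstract does not say which discretization the paper uses. Evaluating the derivative at time level n sums over all n earlier increments, so one step costs O(n) and the total work is quadratic in the number of time steps, whereas an integer-order derivative needs only the last increment.

```python
import math

def caputo_l1(u, tau, alpha):
    """Approximate the Caputo derivative of order 0 < alpha < 1 at the last
    time level of the history u[0..n] with the L1 scheme (illustrative).

    u:     values u(t_0), ..., u(t_n) on a uniform time grid
    tau:   time step
    alpha: fractional order
    """
    n = len(u) - 1
    coeff = tau ** (-alpha) / math.gamma(2.0 - alpha)
    total = 0.0
    # The sum runs over the whole history: O(n) work at time level n.
    for k in range(n):
        b_k = (k + 1) ** (1.0 - alpha) - k ** (1.0 - alpha)
        total += b_k * (u[n - k] - u[n - k - 1])
    return coeff * total
```

For u(t) = t the L1 scheme is exact, so the result matches the analytic value t_n^(1-alpha) / Gamma(2-alpha) up to rounding.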
The graphics processing unit (GPU), originally developed for real-time, high-definition 3D graphics in computer games, now provides great capability for scientific applications. The basis of particle transport simulation is the time-dependent, multi-group, inhomogeneous Boltzmann transport equation. The numerical solution to the Boltzmann equation i...
We present a survey of fractional differential equations and, in particular, of the computational cost of their numerical solutions from the viewpoint of computer science. The computational complexities of time fractional, space fractional, and space-time fractional equations are \(O(N^2M)\), \(O(NM^2)\), and \(O(NM(M + N))\) compared with \(O(MN)\) for the classical part...
Parallel computing is a useful technology for scientific and engineering algorithms and applications. LU-SGS (lower-upper symmetric Gauss-Seidel) is an efficient and robust scheme for computational fluid dynamics (CFD) and has strong data dependence in its computation. In this paper, we present an efficient wavefront parallel algorithm for 3D (t...
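The wavefront idea on a 3D structured grid can be sketched in Python as follows. It assumes a lower sweep in which cell (i, j, k) depends on its (i-1, j, k), (i, j-1, k) and (i, j, k-1) neighbors, a common LU-SGS setting but not necessarily the paper's exact stencil; the function names are hypothetical. All cells on the hyperplane i + j + k = s depend only on hyperplane s - 1, so each hyperplane can be updated in parallel once the previous one is finished.

```python
from collections import defaultdict

def wavefronts(nx, ny, nz):
    """Group the cells of an nx x ny x nz grid into hyperplanes i + j + k = s.

    Returns a list indexed by s; cells within one entry have no mutual
    dependencies in the lower sweep and can be updated concurrently.
    """
    planes = defaultdict(list)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                planes[i + j + k].append((i, j, k))
    # s ranges from 0 to (nx-1) + (ny-1) + (nz-1).
    return [planes[s] for s in range(nx + ny + nz - 2)]

def lower_neighbors(cell):
    """Upwind neighbors a cell reads in the assumed lower (forward) sweep."""
    i, j, k = cell
    return [(i - 1, j, k), (i, j - 1, k), (i, j, k - 1)]
```

Under the same assumption, the upper (backward) sweep visits the same list of hyperplanes in reverse order.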
Merge sort as a divide-sort-merge paradigm has been widely applied in computer science. As modern reduced instruction set computing architectures such as the fifth generation (RISC-V) treat multiple registers as a vector register group for wide instruction parallelism, optimizing merge sort with this vectorized property is becoming increasingl...
Password-based recovery is a widely used method for regaining access to applications or services when passwords are lost or forgotten. It is commonly used in electronic forensics by law enforcement agencies, information acquisition in the commercial sector, and data recovery for individuals. However, as encryption algorithms and complex passwords b...
Mesh smoothing methods can enhance mesh quality by eliminating distorted elements, leading to improved convergence in simulations. To balance the efficiency and robustness of the traditional mesh smoothing process, previous approaches have employed supervised learning and reinforcement learning to train intelligent smoothing models. However, these meth...
Magnetotelluric (MT) sounding is a geophysical technique widely utilized in mineral resource surveys, where conductivity and magnetic permeability serve as essential physical parameters for forward modeling and inversion. However, the effects of conductive anisotropy and non-zero magnetic susceptibility are usually ignored. In this study, we presen...
The marine magnetotelluric (MMT) method is a significant tool extensively utilized in offshore studies, including the understanding of the Earth’s tectonics and hydrocarbon exploration. Conductive anisotropy and non-zero magnetic susceptibility are common phenomena observed in the Earth’s subsurface, and MMT forward modeling is the basis of practic...
Sorting algorithms are among the most extensively researched topics in computer science and serve numerous practical applications. Although various sorts have been proposed for efficiency, different architectures offer distinct flavors to the implementation of parallel sorting. In this paper, we propose a hybrid vectorized merge sort on ARM NEON, nam...
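Vectorized merge sorts on SIMD units such as NEON or the RISC-V vector extension commonly merge two sorted registers with a bitonic merging network, because it uses only fixed compare-exchange (min/max) steps and no data-dependent branches. The scalar Python simulation below illustrates that technique under that assumption; the function name is hypothetical and it is not presented as the paper's exact kernel.

```python
def bitonic_merge(a, b):
    """Merge two sorted, equal-length runs with a bitonic merging network.

    Each pass performs len(v) // 2 independent compare-exchange operations,
    which a SIMD implementation maps onto vector min/max instructions.
    The combined length must be a power of two.
    """
    if len(a) != len(b):
        raise ValueError("runs must have equal length")
    v = list(a) + list(reversed(b))  # ascending + descending = bitonic
    n = len(v)
    if n & (n - 1):
        raise ValueError("combined length must be a power of two")
    d = n // 2
    while d >= 1:
        for i in range(n):
            if (i & d) == 0:  # i lies in the lower half of a 2d-sized block
                j = i + d
                lo, hi = min(v[i], v[j]), max(v[i], v[j])
                v[i], v[j] = lo, hi
        d //= 2
    return v
```

For example, merging [1, 4, 6, 9] with [2, 3, 7, 8] takes three passes (distances 4, 2, 1) and yields [1, 2, 3, 4, 6, 7, 8, 9].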
This paper presents MST, a communication-efficient message library for fast graph traversal on exascale clusters. The key idea is to follow the multi-level network topology to perform topology-aware message aggregation, where small messages are gathered and scattered at each level of domain. To facilitate message aggregation, we equip MST with flex...
As security awareness improves, more advanced and secure encryption algorithms are applied to Microsoft Office to guarantee information security, and people set more complex encryption passwords. However, once the initial password is forgotten, the encrypted information needs to be retrieved. The conventional brute force crac...
Computational fluid dynamics simulation accounts for a large share of the workload in the numerical design optimization of aerodynamics problems. In this paper, we develop AFFNet, an advanced neural network and physics solver coupled framework for accelerating flow field simulations. AFFNet combines the benefits of an attention mechanism, affine tran...
In this paper, we present a novel surface mesh generation approach that splits B-rep geometry models into isotropic triangular meshes based on neural networks and splitting lines. In the first stage, a recursive method is designed to generate plentiful data to train the neural network model offline. In the second stage, the implemented mesh generat...
Computational fluid dynamics (CFD) plays a critical role in many scientific and engineering applications, with aerodynamic design optimization being a primary area of interest. Recently, there has been much interest in using artificial intelligence approaches to accelerate this process. One promising method is the graph convolutional neural network...
Mesh generation remains a key technology in many areas where numerical simulations are required. As numerical algorithms become more efficient and computers become more powerful, the percentage of time devoted to mesh generation becomes higher. In this paper, we present an improved structured mesh generation method. The method formulates the meshin...
Evaluating mesh quality prior to performing the computational fluid dynamics (CFD) simulation is an essential step to ensure the acceptable accuracy of cylinder modelling. However, traditional mesh quality indicators are often insufficient since they only check geometric information on individual distorted elements. To yield more accurate results,...
As a theoretically rigorous and accurate method, FEP-ABFE (Free Energy Perturbation-Absolute Binding Free Energy) calculations showed great potential in drug discovery, but their practical application was difficult due to high computational cost. To rapidly discover antiviral drugs targeting SARS-CoV-2 Mpro and TMPRSS2, we performed FEP-ABFE–based v...
The quality of the finite element mesh has a considerable effect on the efficiency and accuracy of computational fluid dynamics (CFD) simulations. To ensure the generated mesh is of good quality, many quality metrics have been proposed to assess the generated mesh, such as aspect ratio, skewness, Jacobian ratio, etc. Such metrics, however, are prim...
Mesh generation accounts for a large share of the workload in numerical analysis. In this paper, we introduce a novel differential method MGNet for structured mesh generation. The proposed method poses the meshing task as an optimization problem. It takes boundary curves as input, employs a well-designed neural network to study the potential mesh...
Evaluating mesh quality before solving is crucially important for error control in the numerical simulation of airfoils. Traditional mesh quality metrics are used to identify distorted mesh elements by analyzing their geometric shape information like angles and edges. However, these metrics fail to recognize numerical errors stemming from quality a...
The SARS-CoV-2 Omicron variant (B.1.1.529) is of great concern worldwide due to multiple mutations that may have an impact on transmissibility and immune evasion. Compared to the wild type (WT), there are 15 mutations in the Omicron receptor-binding domain (RBD), 10 of which are in the receptor-binding motif (RBM), where the host...
Deep neural networks (DNNs) have recently shown great potential in solving partial differential equations (PDEs). The success of neural network-based surrogate models is attributed to their ability to learn a rich set of solution-related features. However, learning DNNs usually involves tedious training iterations to converge and requires a very la...
Although many methods, such as heatmaps, the 3D morphable model (3DMM), and generative adversarial networks (GANs), have been used over the past few decades to assist facial landmark extraction, there is little research on balancing model size and accuracy. Therefore, this paper proposes a landmark detection model based on the ShuffleNetV2 module...
This paper develops a multi-physics interface code MC-FLUENT to couple the Monte Carlo code OpenMC with the commercial computational fluid dynamics code ANSYS FLUENT. The implementations and parallel performances of block Gauss–Seidel-type and block Jacobi-type Picard iterative algorithms have been investigated. In addition, this paper introduces t...
An important objective of quality control in CFD pre-processing is the facility to indicate to the engineer the validity of the generated mesh. Existing quality measures mainly focus on the subjective evaluation of the shape information of mesh elements, such as aspect ratio, skewness, and shape regularity, and often ignore mesh distribution detail...
3-D magnetotelluric (MT) forward modeling has always been faced with the problems of high memory requirements and long computing time. In this article, we design a scalable parallel algorithm for 3-D MT finite element modeling in anisotropic media. The parallel algorithm is based on the distributed mesh storage, including multiple parallel granular...
A three-dimensional magnetotelluric modeling algorithm with high accuracy and high efficiency is required for data interpretation and inversion. In this paper, the edge-based finite element method with an unstructured mesh is used to solve the 3D magnetotelluric problem. Two boundary conditions, the Dirichlet boundary condition and the Neumann boundary condition, are set f...
Sparse matrix–vector multiplication (SpMV) is one of the most indispensable kernels for solving problems in numerous applications, but its performance is limited by the need for frequent memory access. Modern processors exploit data-level parallelism to improve performance using single-instruction multiple data (SIMD). In order to take f...
One of the difficult requirements imposed on high-quality CFD mesh generation has been the ability to evaluate the mesh quality efficiently. Due to the lack of a general and effective evaluating criterion, the current mesh quality evaluation task mainly relies on various quality metrics for the shape of mesh elements, such as angle, radius, edge an...
The User-Item (U-I) matrix has been used as the dominant data infrastructure of Collaborative Filtering (CF). To reduce the runtime and storage space consumption caused by data sparsity and by the growing need to accommodate side information in CF design, one needs to go beyond the U-I matrix. In this paper, we took a case study of Succinct Representations in...
One performance-intensive part of automatic speech recognition is weighted finite-state transducer (WFST) decoding. To address this, we extend parallel graphics processing unit (GPU) computing to the decoding stage. We describe extension work based on the Kaldi toolkit for speech recognition research. Our work can support weighted finite-sta...
The mesh deformation method based on radial basis functions (RBF) has many advantages and is widely used. RBF based mesh deformation method mainly has two steps: data reduction and displacement interpolation. The data reduction step includes solving interpolation weight coefficients and searching for the node with the maximum interpolation error. T...
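The two steps named above, solving for interpolation weights and searching for the node with the maximum interpolation error, can be sketched as a greedy loop in Python with NumPy. Everything specific here is an assumption for illustration: the Wendland C2 basis, the support radius, the tolerance, the absence of a polynomial term, and the function names `rbf_matrix` and `greedy_reduce`.

```python
import numpy as np

def wendland_c2(r):
    """Wendland C2 basis (1 - r)^4 (4r + 1) for r < 1, zero otherwise."""
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def rbf_matrix(xa, xb, radius):
    """Basis values phi(|xa_i - xb_j| / radius) for all point pairs."""
    r = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1) / radius
    return wendland_c2(r)

def greedy_reduce(x, d, radius, tol, max_points=None):
    """Greedily pick support points until the boundary displacements d
    (shape (n, dim)) are reproduced within tol (tol must be > 0).

    Returns (support indices, interpolation weights).
    """
    n = len(x)
    max_points = max_points or n
    # Start from the node with the largest displacement.
    support = [int(np.argmax(np.linalg.norm(d, axis=1)))]
    while True:
        # Step 1: solve for the weights on the current support set.
        phi = rbf_matrix(x[support], x[support], radius)
        w = np.linalg.solve(phi, d[support])
        # Step 2: evaluate the interpolant at every node; find the worst one.
        err = np.linalg.norm(rbf_matrix(x, x[support], radius) @ w - d, axis=1)
        worst = int(np.argmax(err))
        if err[worst] <= tol or len(support) >= max_points:
            return support, w
        support.append(worst)
```

Once the support set is fixed, interior nodes would be moved by evaluating `rbf_matrix(interior, x[support], radius) @ w`; only the reduced set enters the linear solve, which is the point of the data reduction step.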
Sparse matrix‐vector multiplication (SpMV) is an essential kernel in sparse linear algebra and has been studied extensively on all modern processor and accelerator architectures. Compressed Sparse Row (CSR) is a frequently used format for sparse matrix storage. However, CSR‐based SpMV has poor performance on processors with vector units. In order...
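For reference, a plain CSR SpMV looks like the Python sketch below. The array names `indptr`, `indices` and `data` follow SciPy's CSR naming convention; this is a scalar baseline, not the vectorized kernel from either paper. The short, variable-length inner loop and the indirect gather `x[indices[k]]` are what make CSR a poor fit for vector units: rows with few nonzeros leave SIMD lanes idle.

```python
def csr_spmv(indptr, indices, data, x):
    """Compute y = A @ x for A stored in CSR format.

    indptr[i]:indptr[i + 1] delimits the nonzeros of row i; indices holds
    their column numbers and data their values.
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]  # irregular gather from x
        y[i] = acc
    return y
```

For A = [[4, 0, 1], [0, 0, 0], [2, 3, 0]] the CSR arrays are indptr = [0, 2, 2, 4], indices = [0, 2, 0, 1], data = [4, 1, 2, 3], and multiplying by x = [1, 2, 3] gives [7.0, 0.0, 8.0].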
HPL is a Linpack benchmark package widely used in high-performance computing tests. Customizing the HPL is crucial for a heterogeneous system equipped with CPU and the China accelerator because of the complexity of the China accelerator and the specified interface on matrix multiplication built in the China accelerator. Therefore, it is advisable t...
In the numerical approximation of fractional order derivatives, the crucial point is to balance the computing complexity and the computing accuracy. We proposed a piecewise memory principle for fractional derivatives, in which the past history is divided into several segments instead of discarded. The piecewise approximation is performed on each se...
Sweep scheduling methods used in particle transport problems belong to the class of precedence-constrained scheduling problems that are NP-complete. It is difficult to schedule local tasks for this type of transport problem and simultaneously optimize computational performance and parallel processor communication. In this paper, we present a parall...
Moving mesh is widely used in the simulation of aerodynamic shape optimization, multibody relative motion, aircraft icing and aeroelasticity. Efficient, high-quality mesh deformation is the key technology of moving mesh. This paper presents a new Mesh Deformation method based on Cartesian Background Mesh (MDCBM). First, the Cartesian backgr...
The computational complexity of the numerical simulation of fractional chaotic systems and their synchronization control is compared with \(O(N)\) for integer-order chaotic systems, where \(N\) is the number of steps and \(O\) denotes the computational complexity. In this paper, we propose optimizing methods to solve fractional chaotic systems, including equal-weight memory principl...
An efficient parallel algorithm for Caputo fractional reaction-diffusion equation with implicit finite-difference method is proposed in this paper. The parallel algorithm consists of a parallel solver for linear tridiagonal equations and parallel vector arithmetic operations. For the parallel solver, in order to solve the linear tridiagonal equatio...
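As a serial point of reference for the linear tridiagonal systems that each implicit step produces, the Thomas algorithm below solves one system in O(N) operations. It is shown only to make the data dependence visible: every forward-elimination step needs the result of the previous row, which is why parallel tridiagonal solvers must reorganize the computation. The function name and argument layout are illustrative assumptions, and this is not the paper's parallel solver.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] unused)
    b: main diagonal
    c: super-diagonal (c[-1] unused)
    d: right-hand side
    Assumes a diagonally dominant matrix, so no pivoting is needed.
    """
    n = len(b)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward elimination: row i depends on the modified row i - 1.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```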
Mesh deformation based on radial basis functions (RBFs) has many advantages and has therefore been widely employed in aerodynamic optimization design as well as other fields. For large-scale meshes or complex configurations, however, the cost of RBF deformation is prohibitive. Reducing the number of support points that build the RBFs model provides an a...
Monte Carlo (MC) simulation plays an important part in dose calculation for radiotherapy treatment planning. Because the accuracy of MC simulation depends on the number of simulated particle histories, it is very time-consuming. The Intel Many Integrated Core (MIC) architecture, which consists of more than 50 cores and supports many parallel programmi...
The key to large-scale parallel solutions of deterministic particle transport problem is single-node computation performance. Hence, single-node computation is often parallelized on multi-core or many-core computer architectures. However, the number of on-chip cores grows quickly with the scale-down of feature size in semiconductor technology. In t...
The coupling of microwaves into apertures plays an important part in many electromagnetic physics and engineering fields. When the apertures are very narrow, Finite Difference Time Domain (FDTD) simulation of the coupling is very time-consuming. As a many-core architecture, Intel's Many Integrated Core (MIC) architecture has 512-bit vec...
Single-node computation speed is essential in large-scale parallel solutions of particle transport problems. The Intel Many Integrated Core (MIC) architecture supports more than 200 hardware threads as well as 512-bit double-precision floating-point vector operations. In this paper, we use the native model of MIC in the parallelization of the simulati...
We present a parallel GPU solution of the Caputo fractional reaction-diffusion equation in one spatial dimension with explicit finite difference approximation. The parallel solution, which is implemented with CUDA programming model, consists of three procedures: preprocessing, parallel solver, and postprocessing. The parallel solver involves the pa...
Numerically solving fractional differential equations is time-consuming. Fractional ordinary differential equations may produce Toeplitz-plus-band triangular systems. This paper presents an efficient iterative method for Toeplitz-plus-band triangular systems with \(O(M \log M)\) computational complexity and \(O(M)\) memory complexity, compare...
The computational complexity of the one-dimensional time fractional reaction-diffusion equation is \(O(N^2M)\) compared with \(O(NM)\) for the classical integer-order reaction-diffusion equation. Parallel computing is used to overcome this challenge. The domain decomposition method (DDM) embodies large potential for parallelization of the nume...
Monte Carlo particle transport algorithms are ideally suited to parallel processing architectures and so are good candidates for acceleration using a Graphics Processing Unit (GPU). As the foundation of the Monte Carlo N-Particle Transport Code (MCNP), the Pseudo Random Number Generator (PRNG) should have certain specified properties, such as long p...
Bo Yang, Kai Lu, Jie Liu, [...], Chunye Gong
Over the last decade, with the increasing performance and programmability of graphics processing units (GPUs), these units have evolved from specialty hardware into massively parallel general-purpose computation devices. Simulation of neutron transport plays an important role in national economic construction and large-scale computing in science and engineer...
Matrix multiplication is an essential building block of many linear algebra operations and applications. This paper presents parallel algorithms with a shared A or B matrix in memory for the special massively multithreaded Fiteng1000 processor. We discuss the implementations of parallel matrix multiplication algorithms on the multi-core processor...
The discontinuous finite element discrete ordinates method, which inverts an operator by iteratively sweeping across a mesh from multiple directions, is commonly used to solve the time-dependent particle transport equation. The graphics processing unit (GPU) provides great capability for solving scientific applications. The particle transport w...
Bo Yang, Kai Lu, Jie Liu, [...], Chunye Gong
High-performance computing is increasingly focused on heterogeneous architectures. Embarrassingly parallel algorithms are typical of Monte Carlo methods, which are widely applied in many important scientific areas. In this paper, we present an efficient Hybrid Embarrassingly Parallel algorithm for heterogeneous CPU/GPU clusters and an effective task distributio...
As a powerful and flexible processor, the Graphics Processing Unit (GPU) offers great capability for solving many high-performance computing applications. Sweep3D, which simulates single-group, time-independent discrete ordinates (Sn) neutron transport deterministically on 3D Cartesian geometry, represents the key part of a real ASCI applica...
Pseudo‐random number generators (PRNGs) are used intensively in many stochastic algorithms in particle simulations, artificial neural networks and other scientific computation. The PRNG in the Monte Carlo N‐Particle Transport Code (MCNP) requires a long period, high quality, flexible jump-ahead and sufficient speed. In this paper, we implement such a PRNG for MCNP o...
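The "flexible jump" requirement is usually met with a linear congruential generator whose state can be advanced n steps in O(log n) time. The Python sketch below assumes the parameters often quoted for MCNP's traditional generator (modulus 2^48, multiplier 5^19, zero increment); treat those constants, and the helper names `lcg_next` and `lcg_skip`, as illustrative assumptions rather than the paper's implementation. The skip-ahead composes affine maps by repeated squaring, so it also works for a nonzero increment.

```python
MOD = 2 ** 48   # assumed modulus
MULT = 5 ** 19  # assumed multiplier
INC = 0         # assumed increment (0 gives a multiplicative generator)

def lcg_next(s, g=MULT, c=INC, m=MOD):
    """One step of the LCG: S_{k+1} = (g * S_k + c) mod m."""
    return (g * s + c) % m

def lcg_skip(s, n, g=MULT, c=INC, m=MOD):
    """Advance seed s by n steps in O(log n) operations.

    Uses S_{k+n} = G * S_k + C (mod m) with G = g^n and
    C = c * (g^(n-1) + ... + g + 1), both built by repeated squaring.
    """
    G, C = 1, 0  # affine map for the steps accumulated so far
    h, f = g, c  # affine map for 2^j steps
    while n > 0:
        if n & 1:
            G, C = (G * h) % m, (C * h + f) % m
        f = (f * (h + 1)) % m  # applying (h, f) twice gives (h^2, f*(h+1))
        h = (h * h) % m
        n >>= 1
    return (G * s + C) % m
```

A stream of random numbers in [0, 1) is then obtained as `s / MOD`, and independent particle histories can start from `lcg_skip(seed, k * stride)` for a chosen stride.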
Cloud computing has emerged as one of the hottest topics in the field of information technology. Cloud computing is based on several other computing research areas, such as HPC, virtualization, utility computing and grid computing. To clarify the essence of cloud computing, we propose the characteristics of this area which make cloud computing...