-
ABSTRACT: Much progress has been made in the design of efficient acquisition trajectories for high spatial and temporal resolution in magnetic resonance imaging (MRI). Additionally, significant developments in image reconstruction have enabled the reconstruction of reasonable images from massively undersampled or noisy data that is corrupted by a variety of physical effects, including magnetic field inhomogeneity. Translation of these techniques into clinical imaging has been impeded by the need for expertise and computational facilities to realize the potential of these methods. We present the Illinois Massively Parallel Acceleration Toolkit for Image reconstruction with ENhanced Throughput in MRI (IMPATIENT MRI), a reconstruction utility that enables advanced techniques within clinically relevant computation times by using the computational power available in low-cost graphics processing cards.
Biomedical Imaging: From Nano to Macro, 2011 IEEE International Symposium on; 05/2011
-
ABSTRACT: Regularization is a common technique used to improve image quality in inverse problems such as MR image reconstruction. In this work, we extend our previous Graphics Processing Unit (GPU) implementation of MR image reconstruction with compensation for susceptibility-induced field inhomogeneity effects by incorporating an additional quadratic regularization term. Regularization techniques commonly impose the prior information that MR images are relatively smooth by penalizing large changes in intensity between neighboring voxels. However, the associated computations often increase data access and the overall computational load, which can lead to slower image reconstruction. This motivates us to adopt a GPU-enabled implementation of spatial regularization using sparse matrices. This implementation enables the computations for the entire reconstruction procedure to be done on the GPU, which avoids the memory bandwidth bottlenecks associated with frequent communications between the GPU and CPU. Both the CPU and GPU code of this implementation will be available for release at the time of the conference.
Biomedical Engineering and Informatics (BMEI), 2010 3rd International Conference on; 11/2010
-
ABSTRACT: There are two avenues for many-core machines to gain higher performance: increasing the number of processors, and increasing the number of vector units in one SIMD processor. A truly scalable algorithm should take advantage of both. However, most previously proposed scalable memory allocators scale well with the number of processors, but poorly with the number of vector units in one SIMD processor. As a result, they are not truly scalable on many-core architectures. In this work, we introduce our proposed solution through the design of XMalloc, a truly scalable, efficient lock-free memory allocator. We will present (1) our solution for transforming traditional atomic compare-and-swap-based lock-free algorithms to scale on SIMD architectures, and (2) a hierarchical cache-like buffer solution to reduce the average latency of accesses to non-scalable or slow resources such as main memory in a many-core machine. We implemented XMalloc as a memory allocator on an NVIDIA Tesla C1060 GPU with 240 processing units. Our experimental results show that XMalloc scales very well with growth in both the number of processors and the number of vector units in each SIMD processor. Our truly scalable lock-free solution achieves a 211-times speedup compared to the common lock-free solution.
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on; 08/2010
-
ABSTRACT: Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. We show that the automatic transformations can result in significant speedup compared to the original code using two benchmark applications.
Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on; 08/2010
-
ABSTRACT: We propose a fast implementation for iterative MR image reconstruction using Graphics Processing Units (GPUs). In MRI, iterative reconstruction with conjugate gradient algorithms allows for accurate modeling of the physics of the imaging system. Specifically, methods have been reported to compensate for the magnetic field inhomogeneity induced by the susceptibility differences near the air/tissue interface in the human brain (such as the orbitofrontal cortex). Our group has previously presented an algorithm for field inhomogeneity compensation using the magnetic field map and its gradients. However, classical iterative reconstruction algorithms are computationally costly, and thus significantly increase the computation time. To remedy this problem, one can utilize the fact that these iterative MR image reconstruction algorithms are highly parallelizable. Therefore, parallel computational hardware, such as GPUs, can dramatically improve their performance. In this work, we present an implementation of our field inhomogeneity compensation technique using an NVIDIA CUDA (Compute Unified Device Architecture)-enabled GPU. We show that the proposed implementation reduces computation times by around two orders of magnitude (compared with a non-GPU implementation) while accurately compensating for field inhomogeneity.
Biomedical Imaging: From Nano to Macro, 2010 IEEE International Symposium on; 05/2010
-
ABSTRACT: Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster architecture, resource sharing in a cluster environment, programming models, and applications for GPU clusters.
Cluster Computing and Workshops, 2009. CLUSTER '09. IEEE International Conference on; 10/2009