This report documents our master's thesis project on parallel programming with CUDA, NVIDIA's GPU architecture with support for general-purpose computing. The purpose of the thesis is to assess CUDA as a parallel computing platform by determining how well it handles different types of algorithms and where its limitations lie. We examine this through a case study of two algorithms from the computationally intensive field of n-body simulation. The report introduces the topics of the thesis in chapters that give overviews of the relevant theory. On this basis we investigate how CUDA performs on the embarrassingly parallel n-body all-pairs algorithm and on the Barnes-Hut algorithm, which is partially irregular with regard to parallelization because of its data structure. We found that CUDA performs exceptionally well on n-body all-pairs: an optimized GPU implementation achieved up to a 100x speed-up over an implementation running on a computer with 16 CPU cores. The CUDA implementation of the Barnes-Hut algorithm also shows improved performance, because the most costly part of the algorithm is parallelizable. Although it is possible to implement an irregular algorithm in CUDA, doing so successfully requires an understanding of CUDA programming and the CUDA model of parallelism. We conclude that CUDA performs well on massively parallel problems and can be useful for irregular problems as well. Programming for it can be complex when optimizing or when the algorithm is not easily parallelized. Overall, the platform offers good performance and the potential to accelerate suitable applications.
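To illustrate why the all-pairs algorithm is called embarrassingly parallel, the following is a minimal CPU sketch in Python, not the thesis implementation. The function name `all_pairs_accelerations`, the gravitational constant `g`, and the `softening` term are assumptions introduced for illustration only. The point is that each iteration of the outer loop reads shared inputs and writes only its own result, so on a GPU each body could in principle be assigned to its own thread.

```python
import math


def all_pairs_accelerations(positions, masses, g=1.0, softening=1e-9):
    """Return the gravitational acceleration on every body, O(n^2) all-pairs.

    positions: list of (x, y, z) tuples; masses: list of floats.
    g and softening are illustrative parameters (assumptions, not from the thesis).
    """
    n = len(positions)
    accelerations = []
    # Each outer iteration is independent of the others: it only reads the
    # shared inputs and produces one output, which is what makes the
    # algorithm a natural fit for one-thread-per-body parallelization.
    for i in range(n):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if i == j:
                continue  # a body exerts no force on itself
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            # The softening term avoids division by zero for coincident bodies.
            dist_sq = dx * dx + dy * dy + dz * dz + softening
            inv_dist_cubed = 1.0 / (dist_sq * math.sqrt(dist_sq))
            ax += g * masses[j] * dx * inv_dist_cubed
            ay += g * masses[j] * dy * inv_dist_cubed
            az += g * masses[j] * dz * inv_dist_cubed
        accelerations.append((ax, ay, az))
    return accelerations
```

For example, two unit masses one unit apart attract each other with equal and opposite accelerations. The Barnes-Hut algorithm avoids the quadratic inner loop by approximating distant groups of bodies, at the cost of the irregular data structure mentioned above.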
|Programme||Computer Science (Bachelor's/Master's programme), Master's|
|Publication date||22 Mar. 2012|
- barnes hut