Efficient Heterogeneous Acceleration of the Smoothed Particle Hydrodynamics Method

Published: 2018-02-25 17:33

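Keywords: CPU-GPU coupled computing; hotspot acceleration; full-GPU acceleration; peer-to-peer collaboration; particle simulation; smoothed particle hydrodynamics; petaPar. Source: Chinese Journal of Computers (《计算机学报》), 2017, Issue 09. Type: journal article.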

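[Abstract]: To date, GPU acceleration of the smoothed particle hydrodynamics (SPH) method has been based almost exclusively on the simplified Euler governing equations. GPU implementations of the complete Navier-Stokes equations are very rare, and their difficulties, optimization strategies, and speedups are described only vaguely. Moreover, the way the CPU and GPU cooperate strongly affects the overall efficiency of a heterogeneous platform, and GPU acceleration models still need further study. This paper aims to efficiently accelerate petaPar, an in-house SPH application based on the Navier-Stokes equations, on heterogeneous platforms. We first analyze the computational characteristics of the Euler and Navier-Stokes equations from the standpoint of their mathematical formulations and summarize the difficulties the Navier-Stokes equations pose for GPU acceleration. The Euler equations involve only simple scalar and vector operations and yield compute-intensive, lightweight kernels that are typical good fits for GPUs; the complete Navier-Stokes equations, by contrast, involve complex material constitutive models and extensive tensor computation, and must therefore contend with the problems large kernels cause on GPUs, such as memory-access pressure, insufficient cache, low occupancy, and register spilling. We optimize the particle-interaction kernel of the Navier-Stokes equations by reducing the number of particle attributes, moving operations out of the interaction kernel into the particle-update stage, exploiting particle data reuse, and maximizing GPU occupancy (implementation details in Section 5.1). We also examine three GPU acceleration models (hotspot acceleration, full-GPU acceleration, and peer-to-peer collaboration), analyze their development effort, range of applicability, and theoretical speedup, and discuss communication optimization for the peer-to-peer collaboration model in depth. Because communication particles are distributed non-contiguously, extracting, inserting, and deleting them on the GPU are essentially parallel operations on non-contiguous memory, which severely degrades CPU-GPU synchronization; the related literature does not address this problem. We solve it by refining the particle indexing rule so that particles are sorted by cell type as well as by cell number (implementation details in Section 5.2.3). All three GPU acceleration models are implemented and analyzed for both the Euler and the Navier-Stokes equations. Under the three models, the Euler equations achieve speedups of 8×, 33×, and 36×, and the Navier-Stokes equations 6×, 15×, and 20×, respectively. In both cases full-GPU acceleration exceeds the theoretical speedup limit of hotspot acceleration, and peer-to-peer collaboration improves further on full-GPU acceleration. In particular, for the Navier-Stokes equations, the proposed kernel optimizations combined with the peer-to-peer collaboration model yield an overall 20× speedup on the heterogeneous platform. The peer-to-peer collaboration version for the Navier-Stokes equations, the implementation with the widest applicability and the best speedup, was tested for strong and weak scalability on 6 and 1024 heterogeneous compute nodes of the Titan supercomputer, achieving parallel efficiencies of 67.1% and 75.2%, respectively.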
[English abstract]: At present, GPU acceleration of the smoothed particle hydrodynamics (SPH) method is almost always based on the simplified Euler governing equations. GPU implementations of the complete Navier-Stokes equations are very rare, and their difficulties, optimization strategies, and speedups are described only vaguely. On the other hand, the mode of CPU-GPU collaboration has a profound impact on the overall efficiency of heterogeneous platforms, and GPU acceleration models need further study. This paper efficiently accelerates petaPar, an in-house SPH application based on the Navier-Stokes equations, on heterogeneous platforms. The computational characteristics of the Euler and Navier-Stokes equations are first analyzed from the point of view of their mathematical formulations, and the difficulties of accelerating the Navier-Stokes equations on GPUs are summarized. The Euler equations contain only simple scalar and vector computations and form lightweight, compute-intensive kernels well suited to GPUs, whereas the complete Navier-Stokes equations involve complex material constitutive models and a large number of tensor calculations, so they face a series of problems caused by large kernels on the GPU, such as memory-access pressure, insufficient cache, low occupancy, and register spilling. The particle-interaction kernel of the Navier-Stokes equations is optimized by reducing particle attributes, moving operations into the particle-update stage, exploiting particle reuse, and maximizing GPU occupancy. The paper investigates three GPU acceleration models (hotspot acceleration, full-GPU acceleration, and peer-to-peer collaboration), analyzes their development effort, range of applicability, and theoretical speedup, and discusses communication optimization for the peer-to-peer collaboration model in detail. Because communication particles are distributed non-contiguously, extracting, inserting, and deleting them on the GPU are essentially parallel operations on non-contiguous memory, which seriously degrades CPU-GPU synchronization.
We solve this problem by refining the particle indexing rule: particles are sorted not only by cell number but also by cell type. All three GPU acceleration models are implemented and analyzed for both the Euler and the Navier-Stokes equations. Under the three models, the Euler equations achieve speedups of 8, 33, and 36 times and the Navier-Stokes equations 6, 15, and 20 times, respectively. In both cases full-GPU acceleration exceeds the theoretical speedup limit of hotspot acceleration, and peer-to-peer collaboration improves further on full-GPU acceleration. In particular, for the Navier-Stokes equations, the kernel optimization strategy and the peer-to-peer collaboration model achieve an overall 20-fold speedup on the heterogeneous platform. Strong and weak scalability tests of the peer-to-peer collaborative Navier-Stokes version, which has the widest applicability and the best speedup, on 6 and 1024 heterogeneous nodes of the Titan supercomputer yield parallel efficiencies of 67.1% and 75.2%, respectively.
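The abstract contrasts the scalar and vector arithmetic of the Euler equations with the tensor-heavy Navier-Stokes form. As an illustration only, here is the SPH momentum equation in its common textbook form for both cases; the exact discretization used in petaPar may differ. Particle i interacts with neighbors j of mass m_j; ρ is density, p pressure, Π_ij an artificial-viscosity term, W_ij the smoothing kernel, and σ^{αβ} = -p δ^{αβ} + S^{αβ} the stress tensor with deviatoric part S^{αβ} (repeated Greek indices are summed).

```latex
% Euler form: the pair term is one scalar coefficient times the kernel gradient
\[
\frac{d\mathbf{v}_i}{dt} = -\sum_j m_j \left( \frac{p_i}{\rho_i^2} + \frac{p_j}{\rho_j^2} + \Pi_{ij} \right) \nabla_i W_{ij}
\]
% Stress-tensor form, with \sigma^{\alpha\beta} = -p\,\delta^{\alpha\beta} + S^{\alpha\beta}
\[
\frac{d v_i^{\alpha}}{dt} = \sum_j m_j \left( \frac{\sigma_i^{\alpha\beta}}{\rho_i^2} + \frac{\sigma_j^{\alpha\beta}}{\rho_j^2} - \Pi_{ij}\,\delta^{\alpha\beta} \right) \frac{\partial W_{ij}}{\partial x_i^{\beta}}
\]
```

Setting S^{αβ} = 0 recovers the Euler form. Per neighbor pair, the Euler form needs a single scalar coefficient times the kernel gradient, whereas the stress form reads two symmetric tensors (six independent components each in 3D) and contracts them with the gradient; the constitutive update that produces S^{αβ} adds further per-particle tensor state, which is where the register and memory pressure of a large kernel comes from.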
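Why a large kernel lowers occupancy can be seen with a back-of-the-envelope register count. The sketch below assumes the per-SM limits of NVIDIA's Kepler GK110 (the architecture of the Tesla K20X GPUs in Titan): 65,536 32-bit registers and at most 2,048 resident threads. It deliberately ignores register allocation granularity, shared memory, and per-block limits, so it gives an optimistic estimate rather than what NVIDIA's occupancy tools would report; `estimated_occupancy` is a hypothetical helper, not part of petaPar.

```python
# Back-of-the-envelope occupancy estimate limited only by register usage.
# Assumed per-SM limits of NVIDIA Kepler GK110 (Tesla K20X, as used in Titan);
# register allocation granularity, shared memory and block limits are ignored.
REGISTERS_PER_SM = 65536
MAX_THREADS_PER_SM = 2048
WARP_SIZE = 32


def estimated_occupancy(registers_per_thread: int) -> float:
    """Fraction of the SM's thread slots that can be resident simultaneously."""
    if registers_per_thread <= 0:
        raise ValueError("registers_per_thread must be positive")
    # Threads whose registers fit in the register file, rounded down to whole warps.
    threads_by_registers = (REGISTERS_PER_SM // registers_per_thread) // WARP_SIZE * WARP_SIZE
    resident_threads = min(MAX_THREADS_PER_SM, threads_by_registers)
    return resident_threads / MAX_THREADS_PER_SM


if __name__ == "__main__":
    for regs in (32, 64, 128, 255):
        print(f"{regs:3d} registers/thread -> {estimated_occupancy(regs):.1%} occupancy")
```

Under these assumptions 32 registers per thread allow full occupancy, 64 halve it, 128 quarter it, and a kernel at the 255-register cap keeps only 12.5% of thread slots busy; a kernel that needs more than the cap forces the compiler to spill registers to local memory. Trimming the per-particle attributes a kernel touches is one way to pull the count back down.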
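The abstract compares the theoretical speedups of the three acceleration models. A minimal idealized model, our assumption rather than the paper's derivation, makes the comparison concrete: let `f` be the fraction of CPU run time spent in the offloaded hotspot and `s` the factor by which the GPU outruns the CPU on the same work, and ignore CPU-GPU transfers. Hotspot acceleration is then bounded by Amdahl's law at 1/(1 - f), full-GPU acceleration speeds up the whole time step by `s`, and peer-to-peer collaboration, which sizes the subdomains so that CPU and GPU finish together, adds the CPU's own throughput for 1 + s. All function names are hypothetical.

```python
# Idealized speedup bounds (relative to a CPU-only run) for the three GPU
# acceleration models. Assumptions (ours, not the paper's): f is the fraction of
# CPU run time spent in the offloaded hotspot, s is how many times faster the GPU
# runs any given work than the CPU, and CPU-GPU transfer and sync costs are ignored.
import math


def hotspot_speedup(f: float, s: float) -> float:
    """Amdahl's law: only the hotspot fraction f runs s times faster."""
    return 1.0 / ((1.0 - f) + f / s)


def hotspot_limit(f: float) -> float:
    """Upper bound of hotspot acceleration as the GPU speed s grows without limit."""
    return math.inf if f >= 1.0 else 1.0 / (1.0 - f)


def full_gpu_speedup(s: float) -> float:
    """Every stage of the time step runs on the GPU, so the whole step is s times faster."""
    return s


def peer_speedup(s: float) -> float:
    """CPU and GPU each own a subdomain sized to their throughput, so rates add: 1 + s."""
    return 1.0 + s


def gpu_particle_share(s: float) -> float:
    """Fraction of particles given to the GPU so that both sides finish together."""
    return s / (1.0 + s)


if __name__ == "__main__":
    f, s = 0.9, 40.0  # hypothetical values, not measurements from the paper
    print(f"hotspot:  {hotspot_speedup(f, s):.1f}x (limit {hotspot_limit(f):.1f}x)")
    print(f"full GPU: {full_gpu_speedup(s):.1f}x")
    print(f"peer:     {peer_speedup(s):.1f}x, GPU share {gpu_particle_share(s):.1%}")
```

With the hypothetical values f = 0.9 and s = 40, hotspot acceleration gives about 8.2× and can never exceed 10×, while the full-GPU and peer-to-peer models reach 40× and 41× in this idealized setting. Real runs fall short of these bounds once data transfer, synchronization, and load imbalance are counted, but the ordering of the three models is the point of the sketch.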
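The abstract's fix for scattered communication particles is to sort particles by cell type as well as by cell number. A minimal sketch of that idea follows; the three cell types (`INTERIOR`, `BOUNDARY`, `GHOST`), their order, and the helper names are hypothetical choices of ours, and the paper's actual indexing rule may differ. Sorting by the composite key `(cell_type, cell_id)` makes every cell type occupy one contiguous index range while each cell's particles stay contiguous, since a cell has a single type.

```python
# Sketch: sort particles by the composite key (cell type, cell id) so that each
# cell type occupies one contiguous index range. The three cell types and their
# order are hypothetical; the paper's actual indexing rule may differ.
from enum import IntEnum
from typing import Dict, List, Tuple


class CellType(IntEnum):
    INTERIOR = 0  # particles that stay on this process
    BOUNDARY = 1  # particles whose copies are sent to neighboring processes
    GHOST = 2     # halo copies received from neighboring processes


def sort_order(cell_id: List[int], cell_type: List[CellType]) -> List[int]:
    """Permutation ordering particles by cell type first, then by cell id."""
    return sorted(range(len(cell_id)), key=lambda i: (cell_type[i], cell_id[i]))


def type_ranges(sorted_types: List[CellType]) -> Dict[CellType, Tuple[int, int]]:
    """Half-open [start, end) index range of each cell type after sorting."""
    ranges: Dict[CellType, Tuple[int, int]] = {}
    for pos, t in enumerate(sorted_types):
        start = ranges[t][0] if t in ranges else pos
        ranges[t] = (start, pos + 1)
    return ranges


if __name__ == "__main__":
    cell_id = [5, 2, 9, 2, 7, 3]
    cell_type = [CellType.INTERIOR, CellType.BOUNDARY, CellType.GHOST,
                 CellType.INTERIOR, CellType.BOUNDARY, CellType.GHOST]
    order = sort_order(cell_id, cell_type)
    sorted_types = [cell_type[i] for i in order]
    print(order)
    print(type_ranges(sorted_types))
```

For the six sample particles the permutation is `[3, 0, 1, 4, 5, 2]`, and the interior, boundary, and ghost particles occupy `[0, 2)`, `[2, 4)`, and `[4, 6)`. Boundary particles to send are then a single slice, stale ghosts are dropped by truncating the array, and newly received ghosts can be appended as one block, so none of the three operations touches scattered memory. On a GPU, the same key could be packed into one integer (for example `cell_type * num_cells + cell_id`) and passed to a key-value sort such as Thrust's `thrust::sort_by_key`.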
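[Affiliations]: Institute of Computing Technology, Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences; Software Center for High Performance Numerical Simulation, China Academy of Engineering Physics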
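[Funding]: Supported by the National Natural Science Foundation of China (11472274, 11072241, 11111140020, 91130026) and the "Director's Discretionary" program of the US Oak Ridge National Laboratory / National Center for Computational Sciences (MAT028, CSC153)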
[CLC number]: O35


Article ID: 1534506


Article URL: https://www.wllwen.com/kejilunwen/lxlw/1534506.html
