Issue title: A New Overview of the Trilinos Project – Part 1
Article type: Research Article
Authors: Edwards, H. Carter | Sunderland, Daniel | Porter, Vicki | Amsler, Chris | Mish, Sam
Affiliations: Computing Research Center, Sandia National Laboratories, Livermore, CA, USA | Engineering Sciences Center, Sandia National Laboratories, Albuquerque, NM, USA | Department of Electrical and Computer Engineering, Kansas State University, Manhattan, KS, USA | Department of Mathematics, California State University, Los Angeles, CA, USA
Note: Corresponding author: H. Carter Edwards, Computing Research Center, Sandia National Laboratories, Livermore, CA, USA. E-mail: hcedwar@sandia.gov.
Abstract: Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these kernels to the collection of modern manycore accelerator devices is a major challenge because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Kokkos Array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based upon three fundamental concepts: (1) manycore compute devices, each with its own memory space, (2) data-parallel kernels, and (3) multidimensional arrays. Kernel execution performance is extremely dependent on data access patterns, especially on NVIDIA® devices. The optimal data access pattern can differ from one manycore device to another, potentially leading to different implementations of the same computational kernel, each specialized for a particular device. The Kokkos Array programming model supports performance-portable kernels by (1) separating data access patterns from computational kernels through a multidimensional array API and (2) introducing device-specific data access mappings when a kernel is compiled. An implementation of Kokkos Array is available through Trilinos [Trilinos website, http://trilinos.sandia.gov/, August 2011].
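The core idea of separating the data access pattern from the kernel can be illustrated with a small C++ sketch. This is not the Kokkos Array API: the names `Array2D`, `LayoutRight`, `LayoutLeft` and `scale_rows` are hypothetical stand-ins chosen for illustration, and the serial loop over `i` merely stands in for the parallel dispatch a real device back end would perform. The kernel is written once against `(i, j)` indexing, while a compile-time layout parameter decides how that index maps to memory. Row-major ("right") storage keeps each work item's row contiguous, which usually suits cache-based CPUs; column-major ("left") storage places the parallel index `i` fastest, so adjacent GPU threads touch adjacent addresses (coalesced access).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative only, not the Kokkos Array API).
// Each maps index (i, j) of an n0 x n1 array to a 1-D storage offset.
struct LayoutRight {  // row-major: j varies fastest
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};

struct LayoutLeft {  // column-major: i (the parallel index) varies fastest
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t n0, std::size_t /*n1*/) {
    return i + j * n0;
  }
};

// Hypothetical 2-D array whose index-to-memory mapping is fixed at
// compile time by the Layout template parameter.
template <typename T, typename Layout>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1) : n0_(n0), n1_(n1), data_(n0 * n1) {}

  T& operator()(std::size_t i, std::size_t j) {
    assert(i < n0_ && j < n1_);  // debug-mode bounds check
    return data_[Layout::offset(i, j, n0_, n1_)];
  }
  const T& operator()(std::size_t i, std::size_t j) const {
    assert(i < n0_ && j < n1_);
    return data_[Layout::offset(i, j, n0_, n1_)];
  }

  std::size_t extent0() const { return n0_; }
  std::size_t extent1() const { return n1_; }
  const T* raw() const { return data_.data(); }  // underlying storage order

 private:
  std::size_t n0_, n1_;
  std::vector<T> data_;
};

// A kernel written once against (i, j) indexing; it never mentions layout.
// Each iteration of the outer loop plays the role of one parallel work item.
template <typename T, typename Layout>
void scale_rows(Array2D<T, Layout>& a, const std::vector<T>& s) {
  for (std::size_t i = 0; i < a.extent0(); ++i) {
    for (std::size_t j = 0; j < a.extent1(); ++j) {
      a(i, j) *= s[i];
    }
  }
}
```

Instantiating `scale_rows` with `Array2D<double, LayoutRight>` or `Array2D<double, LayoutLeft>` yields the same logical results from the same kernel source; only the order of elements in `raw()` differs. The layout is a template parameter rather than a runtime flag so that the offset computation can be inlined at compile time, in the spirit of point (2) above.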
Keywords: Multicore, manycore, GPGPU, data-parallel, thread-parallel
DOI: 10.3233/SPR-2012-0343
Journal: Scientific Programming, vol. 20, no. 2, pp. 89-114, 2012