OpenACC is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As in OpenMP, the programmer can annotate C, C++ and Fortran source code to identify the areas that should be accelerated using compiler directives and additional functions. Like OpenMP 4.0 and newer, OpenACC can target both CPU and GPU architectures and launch computational code on them.

OpenACC members have worked within the OpenMP standards group to merge the two specifications into a common specification that extends OpenMP to support accelerators in a future release of OpenMP. These efforts resulted in a technical report for comment and discussion, timed to coincide with the annual Supercomputing Conference and to address non-Nvidia accelerator support with input from hardware vendors who participate in OpenMP. At ISC’12, OpenACC was demonstrated to work on Nvidia, AMD and Intel accelerators, though without performance data.

On November 12, 2012, at the SC12 conference, a draft of the OpenACC version 2.0 specification was presented. Newly proposed capabilities included additional controls over data movement and support for explicit function calls and separate compilation. OpenACC 2.0 was officially released in June 2013. Version 2.5 of the specification was released in October 2015, and version 2.6 followed in November 2017. The latest version of the specification, version 2.7, was released in November 2018. On April 3, 2019, John Levesque announced that Cray was ending support for OpenACC in CCE/9.0.
Compiler support
Support for OpenACC is available in commercial compilers from PGI and Cray. OpenUH is an Open64-based open source OpenACC compiler supporting C and Fortran, developed by the HPCTools group at the University of Houston. OpenARC is an open source C compiler developed at Oak Ridge National Laboratory to support all features in the OpenACC 1.0 specification. An experimental open source compiler, accULL, was developed by the University of La Laguna. Omni Compiler is an open source compiler developed at the HPCS Laboratory of the University of Tsukuba and the Programming Environment Research Team of the RIKEN Center for Computational Science, Japan; it supports OpenACC and combines XcalableMP with OpenACC. IPMACC is an open source C compiler developed by the University of Victoria that translates OpenACC to CUDA, OpenCL, and ISPC. Currently, only the following directives are supported: data, kernels, loop, and cache.

GCC support for OpenACC was slow in coming. A GPU-targeting implementation from Samsung was announced in September 2013; it translated OpenACC 1.1-annotated code to OpenCL. The announcement of a "real" implementation followed two months later, this time from Nvidia and based on OpenACC 2.0. This sparked some controversy, as the implementation would only target Nvidia's own PTX assembly language, for which no open source assembler or runtime was available. Experimental support for OpenACC/PTX did end up in GCC as of version 5.1. The GCC 6 and GCC 7 release series include a much improved implementation of the OpenACC 2.0a specification. GCC 9.1 offers nearly complete OpenACC 2.5 support.
Usage
In a way similar to OpenMP 3.x on homogeneous systems, or to the earlier OpenHMPP, the primary mode of programming in OpenACC is through directives. The specification also includes a runtime library defining several support functions. To use them, the programmer should include "openacc.h" in C or "openacc_lib.h" in Fortran and then call the acc_init function.
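For example, a minimal C program that initializes the runtime might look like the following sketch; which device type is reported depends on the installed compiler and hardware:

    #include <stdio.h>
    #include <openacc.h>

    int main(void)
    {
        /* Query the current default device type, initialize the
           runtime for it, and report how many such devices exist. */
        acc_device_t dev = acc_get_device_type();
        acc_init(dev);
        printf("Devices of the default type: %d\n",
               acc_get_num_devices(dev));
        acc_shutdown(dev);
        return 0;
    }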
Directives
OpenACC defines an extensive list of pragmas (directives), for example:

    #pragma acc parallel
    #pragma acc kernels

Both are used to define parallel computation kernels to be executed on the accelerator, using distinct semantics. In addition, #pragma acc data defines a region with explicit data movement between host and accelerator, and #pragma acc loop describes how a loop is to be parallelized.
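As an illustrative sketch of how these directives combine with data clauses (the function saxpy and its parameters are chosen for illustration, not taken from the specification), a loop can be offloaded like this:

    #include <stdio.h>

    /* SAXPY (y = a*x + y): the parallel loop directive offloads the
       loop; copyin/copy clauses describe the required data movement. */
    void saxpy(int n, float a, const float *x, float *y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void)
    {
        enum { N = 1024 };
        float x[N], y[N];
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy(N, 3.0f, x, y);
        printf("y[0] = %f\n", y[0]); /* expect 5.0 */
        return 0;
    }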
Several runtime API functions are defined as well: acc_get_num_devices, acc_set_device_type, acc_get_device_type, acc_set_device_num, acc_get_device_num, acc_async_test, acc_async_test_all, acc_async_wait, acc_async_wait_all, acc_init, acc_shutdown, acc_on_device, acc_malloc and acc_free. OpenACC generally takes care of organising the work for the target device; however, this can be overridden through the use of gangs and workers. A gang consists of workers and operates over a number of processing elements.
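For instance, the default work distribution of a nested loop can be overridden with explicit gang and worker clauses, as in the following sketch (the function scale_rows and its arguments are illustrative):

    /* Distribute the outer loop across gangs and the inner loop
       across the workers within each gang; the copy clause moves
       the matrix to the device and back. */
    void scale_rows(int m, int n, float *a, float s)
    {
        #pragma acc parallel loop gang copy(a[0:m*n])
        for (int i = 0; i < m; ++i) {
            #pragma acc loop worker
            for (int j = 0; j < n; ++j)
                a[i * n + j] *= s;
        }
    }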