CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It is designed to leverage the processing power of NVIDIA GPUs for general-purpose computing tasks.
CUDA provides:
- A programming model: This includes extensions to common programming languages like C, C++, and Fortran that allow you to write code that specifies how to utilize GPU threads and memory. For example, CUDA keywords like `__global__` or `__device__` help define functions to run on the GPU.
- A runtime and driver API: The CUDA runtime API simplifies GPU programming by abstracting low-level hardware details, while the driver API allows fine-grained control for advanced use cases.
- An SDK (Software Development Kit): This SDK contains development tools, compilers, and debugging utilities, along with a collection of highly optimized libraries for specific domains, such as:
  - Linear algebra (`cuBLAS`)
  - Signal processing (`cuFFT`)
  - Deep learning (`cuDNN`)
- A set of instructions tailored for NVIDIA GPUs: CUDA applications execute GPU kernels (small programs) that run in parallel across thousands of threads. CUDA compilers (like `nvcc`) generate PTX (Parallel Thread Execution), an intermediate representation, which is then translated into machine code specific to the NVIDIA GPU architecture (e.g., Volta, Ampere).
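
To make these pieces concrete, here is a minimal sketch of a SAXPY computation (`y = a * x + y`) that uses both `__global__` and `__device__` together with a few runtime API calls. The names `scale_and_add`, `saxpy_kernel`, `check`, and `saxpy_on_gpu` are hypothetical and chosen only for illustration, as are the block size of 256 threads and the array length. It assumes a machine with a CUDA-capable GPU and the CUDA toolkit installed.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// __device__: a function that runs on the GPU and is called from GPU code.
__device__ float scale_and_add(float a, float x, float y) {
    return a * x + y;
}

// __global__: a kernel, launched from the host and executed by many GPU threads.
__global__ void saxpy_kernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                    // guard the partial last block
        y[i] = scale_and_add(a, x[i], y[i]);
    }
}

// Hypothetical helper: abort with a message if a runtime API call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host wrapper: returns a * x + y, computed on the GPU.
// Assumes x and y have the same length.
std::vector<float> saxpy_on_gpu(float a, const std::vector<float>& x,
                                std::vector<float> y) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    if (n == 0) return y;
    const size_t bytes = n * sizeof(float);

    // Runtime API: allocate device memory and copy the inputs to the GPU.
    float* d_x = nullptr;
    float* d_y = nullptr;
    check(cudaMalloc(&d_x, bytes), "cudaMalloc(d_x)");
    check(cudaMalloc(&d_y, bytes), "cudaMalloc(d_y)");
    check(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice), "copy y");

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy_kernel<<<blocks, threads>>>(n, a, d_x, d_y);
    check(cudaGetLastError(), "kernel launch");

    // This blocking copy also waits for the kernel to finish.
    check(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost), "copy result");
    cudaFree(d_x);
    cudaFree(d_y);
    return y;
}

int main() {
    const int n = 1 << 20;  // about one million elements
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    std::vector<float> out = saxpy_on_gpu(3.0f, x, y);
    std::printf("out[0] = %.1f\n", out[0]);  // 3 * 1 + 2
    return 0;
}
```

Assuming the file is saved as `saxpy.cu` (a hypothetical name), `nvcc saxpy.cu -o saxpy` builds an executable, and `nvcc -ptx saxpy.cu` writes the intermediate PTX to a file so you can inspect it.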