CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It lets developers use the processing power of NVIDIA GPUs for general-purpose computing, not only for graphics.

CUDA provides:

  1. A programming model: This includes extensions to common programming languages such as C, C++, and Fortran that let you specify how your code uses GPU threads and memory. For example, the function qualifier __global__ marks a kernel, which is typically launched from host (CPU) code and runs on the GPU, while __device__ marks a function that is called from code already running on the GPU.

  2. A runtime and driver API: The CUDA runtime API simplifies GPU programming by abstracting low-level hardware details, while the driver API allows fine-grained control for advanced use cases.

  3. An SDK (Software Development Kit): This contains development tools, compilers, and debugging utilities, along with a collection of highly optimized libraries for specific domains, such as:

    1. Linear algebra (cuBLAS)
    2. Signal processing (cuFFT)
    3. Deep learning (cuDNN)

  4. A virtual instruction set for NVIDIA GPUs: CUDA applications launch kernels (functions that run in parallel across thousands of GPU threads). The CUDA compiler, nvcc, generates PTX (Parallel Thread Execution), an intermediate representation, which is then translated, either at build time or by the driver at run time, into machine code for a specific NVIDIA GPU architecture (e.g., Volta, Ampere).
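
To make the programming model and the runtime API more concrete, here is a minimal sketch of a complete CUDA program. It assumes a machine with an NVIDIA GPU and the CUDA Toolkit installed. The kernel name saxpy, the device function axpy, the helper check, and the array size are all illustrative choices, not part of CUDA itself; only the __global__ and __device__ qualifiers and the cuda* runtime calls come from CUDA.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// __device__: a function that is called from code running on the GPU.
__device__ float axpy(float a, float x, float y) {
    return a * x + y;
}

// __global__: a kernel, launched from the host and executed by many GPU threads.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's global index
    if (i < n) {                                    // guard threads past the end of the array
        y[i] = axpy(a, x[i], y[i]);
    }
}

// Illustrative helper (not a CUDA function): abort if a runtime API call failed.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const int n = 1 << 20;                          // arbitrary example size
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    std::vector<float> hx(n, 1.0f), hy(n, 2.0f);    // host (CPU) arrays

    // Allocate GPU memory and copy the inputs to the device.
    float *dx = nullptr, *dy = nullptr;
    check(cudaMalloc(&dx, bytes), "cudaMalloc dx");
    check(cudaMalloc(&dy, bytes), "cudaMalloc dy");
    check(cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice), "copy x");
    check(cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice), "copy y");

    // Launch enough 256-thread blocks to give every element its own thread.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel execution");

    // Copy the result back and compare against the value computed on the CPU.
    check(cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost), "copy result");
    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        max_err = std::fmax(max_err, std::fabs(hy[i] - (2.0f * 1.0f + 2.0f)));
    }
    std::printf("max error: %f\n", max_err);

    check(cudaFree(dx), "cudaFree dx");
    check(cudaFree(dy), "cudaFree dy");
    return max_err == 0.0f ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Assuming the file is saved as saxpy.cu (a hypothetical name), it can be built with `nvcc -o saxpy saxpy.cu`. Running `nvcc -ptx saxpy.cu` instead stops after the PTX stage and writes the intermediate representation to a .ptx file, which is one way to inspect what the compiler generates before it is translated for a particular GPU architecture.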