Synchronization in GPU programming

In GPU programming, synchronization ensures that operations on different devices (or between a host and a device) occur in the correct order, avoiding race conditions and ensuring data consistency. GPU devices, like those using CUDA or HIP, execute operations asynchronously by default, meaning tasks are dispatched to the GPU without blocking the CPU or subsequent commands. However, when interdependencies arise, event synchronization becomes crucial.

CUDA and HIP Streams

A stream is a sequence of commands (kernels, memory copies, etc.) that are issued to the GPU. Commands within a stream execute in the order they are issued, but commands in different streams can execute out of order or concurrently.

CUDA Streams: Streams in NVIDIA’s CUDA framework are used to manage tasks running on GPUs.
HIP Streams: A similar abstraction in AMD’s ROCm platform, analogous to CUDA streams.

Events in GPU Synchronization

An event is a marker that can be recorded at a specific point in a stream. Events are used to track execution progress or synchronize streams. Events are lightweight and offer finer control over synchronization compared to traditional host-based synchronization methods

Host-based synchronization involves the CPU actively managing or waiting for GPU tasks to complete, issuing a blocking call that waits for all GPU tasks to finish, such as with cudaDeviceSynchronize(). This form of busy-waiting involves frequent host-device communication, can introduce latency and overhead

GPU event-based synchronization allows synchronization to occur entirely on the GPU, without involving the CPU. When an event is recorded on a GPU stream using cudaEventRecord or hipEventRecord, the event is tied to a specific point in that stream.

For example:

cudaEventRecord(event, stream1);

Here, stream1 “publishes” the event, signaling when all previous operations in stream1 have completed.
If another stream (stream2) is made to wait for this event, the GPU ensures that no tasks in stream2 execute until the event is completed in stream1.

Tip

cudaEvent_t (CUDA) and hipEvent_t (HIP) are C structs types to represent event objects in the Cuda and HIP

CUDA: `cudaStreamWaitEvent`

The cudaStreamWaitEvent function allows a stream to wait until a specific event has been recorded in another stream.

cudaEvent_t event;
cudaStream_t stream1, stream2;
 
cudaStreamCreate(&stream1);
cudaStreamCreate(&stream2);
cudaEventCreate(&event);
#### Example
 
// Kernel execution in stream1
myKernel<<<blocks, threads, 0, stream1>>>(...);
 
// Record an event in stream1
cudaEventRecord(event, stream1);
 
// Make stream2 wait for the event
cudaStreamWaitEvent(stream2, event, 0);
 
// Execute a kernel in stream2 after the event
myOtherKernel<<<blocks, threads, 0, stream2>>>(...);
 
// Clean up
cudaStreamDestroy(stream1);
cudaStreamDestroy(stream2);
cudaEventDestroy(event);

HIP: `hipStreamWaitEvent`

The hipStreamWaitEvent function mirrors CUDA’s functionality, enabling synchronization between streams in AMD’s HIP framework.

hipEvent_t event;
hipStream_t stream1, stream2;
 
hipStreamCreate(&stream1);
hipStreamCreate(&stream2);
hipEventCreate(&event);
 
// Kernel execution in stream1
hipLaunchKernelGGL(myKernel, blocks, threads, 0, stream1, ...);
 
// Record an event in stream1
hipEventRecord(event, stream1);
 
// Make stream2 wait for the event
hipStreamWaitEvent(stream2, event, 0);
 
// Execute a kernel in stream2 after the event
hipLaunchKernelGGL(myOtherKernel, blocks, threads, 0, stream2, ...);
 
// Clean up
hipStreamDestroy(stream1);
hipStreamDestroy(stream2);
hipEventDestroy(event);

Edmondo's Vault

Explorer

Synchronization in GPU programming

CUDA and HIP Streams

Events in GPU Synchronization

CUDA: `cudaStreamWaitEvent`

HIP: `hipStreamWaitEvent`

Graph View

Table of Contents

Backlinks

Edmondo's Vault

Explorer

Synchronization in GPU programming

CUDA and HIP Streams

Events in GPU Synchronization

CUDA: cudaStreamWaitEvent

HIP: hipStreamWaitEvent

Graph View

Table of Contents

Backlinks

CUDA: `cudaStreamWaitEvent`

HIP: `hipStreamWaitEvent`