Arrow on the GPU

Arrow libraries provide ways to customize how memory is allocated and used within the libraries themselves. These are generally called memory pools (C++ and Python) or an allocator interface (Go). The default pools and allocators all allocate CPU-accessible main memory.

To represent different devices, Arrow has an aptly named Device abstraction. Each Device has a default MemoryManager object that specifies how to allocate memory on that device. There can also be additional memory manager objects tied to a given device, possibly wrapping custom memory pools.

Device-agnostic buffer handling

The Buffer object has a few methods to facilitate coding in a device-agnostic manner:

arrow::Buffer::View: Given a buffer and a destination memory manager, a device-specific mechanism is used to construct a memory address through which the destination device can access the contents of the buffer. If this is not possible without performing a copy, it returns an error status. If the source and destination devices are the same, this is a no-op.
arrow::Buffer::Copy: Copy the contents of a buffer to the destination device and return a new buffer allocated by the provided memory manager.
arrow::Buffer::CopyNonOwned: This is an alternative to Copy that is useful when the source buffer is externally managed and its lifetime is not tied to the Buffer object itself. Otherwise, Copy should just be used.
arrow::Buffer::ViewOrCopy: First attempt to create a zero-copy view on the destination device, otherwise copy the contents to a new buffer.
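
The fallback behaviour of ViewOrCopy can be pictured with a small toy model. This is only a sketch: ToyDevice, ToyBuffer, ToyView, ToyCopy and ToyViewOrCopy are hypothetical stand-ins, not Arrow classes or signatures, and the rule that a view only succeeds on the same device is a simplifying assumption (real memory managers can sometimes provide cross-device views).

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are Arrow types.
enum class ToyDevice { kCpu, kGpu };

struct ToyBuffer {
  ToyDevice device;
  // Shared storage models a zero-copy view: two buffers, one allocation.
  std::shared_ptr<std::vector<uint8_t>> bytes;
};

// Models View: succeeds only when no copy is needed.
// Toy rule (an assumption): only same-device views are possible.
bool ToyView(const ToyBuffer& src, ToyDevice dst, ToyBuffer* out) {
  if (src.device != dst) return false;  // a copy would be required: report failure
  *out = src;                           // same device: effectively a no-op
  return true;
}

// Models Copy: always allocates new storage for the destination device.
ToyBuffer ToyCopy(const ToyBuffer& src, ToyDevice dst) {
  return ToyBuffer{dst, std::make_shared<std::vector<uint8_t>>(*src.bytes)};
}

// Models ViewOrCopy: try the zero-copy view first, fall back to a copy.
ToyBuffer ToyViewOrCopy(const ToyBuffer& src, ToyDevice dst) {
  ToyBuffer out{dst, nullptr};
  if (ToyView(src, dst, &out)) return out;
  return ToyCopy(src, dst);
}
```

In this model a same-device request returns a buffer sharing the original storage, while a cross-device request returns a buffer with its own copy of the bytes.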

Another common operation is feeding buffers into I/O functions, which may read from or write into the buffer directly. To perform I/O without assuming a CPU-accessible buffer, use arrow::Buffer::GetReader and arrow::Buffer::GetWriter to obtain device-specific stream interfaces to the buffer’s contents.

ArrowDeviceArray and ArrowDeviceArrayStream structs

These two structs belong to the Arrow C data interface and can be used to pass data between different libraries, languages, and runtimes without host-to-device or device-to-host copies.

As described in GPU acceleration, the traditional process involves copying the data to pass it between Numba and libcudf, which is highly inefficient. Using ArrowDeviceArray instead, we can pass device pointers around between libraries without copying the data.

Compared to a ‘traditional’ ArrowArray, the device array:

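Has an ArrowDeviceType (device_type), such as CPU or CUDA; the interface defines further values as well, for example ROCm and Metal
Has a device_id in case multiple devices of that type are available
Should use private_data and the release callback for releasing device memory
Provides a sync_event, a pointer to a device-specific event object (for CUDA, a cudaEvent_t) that the consumer should wait on before accessing the memory buffers, to avoid race conditions (the GPU operates asynchronously); a null sync_event means no synchronization is needed
Provides a reserved 24 bytes at the end of the object for future expansion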
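
The list above maps onto the struct layout as follows. The two struct definitions and the device-type constants are reproduced from the published Arrow C data and C device data interface headers; everything else (Int32Holder, ReleaseInt32, ExportInt32, SumIfCpu) is a hypothetical helper written for this sketch. To keep it runnable without a GPU, the producer exports host memory with the CPU device type, and using -1 as the CPU device_id follows what Arrow C++ reports, which is treated here as a convention.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Layout from the Arrow C data interface.
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

// Layout from the Arrow C device data interface (only two device types shown).
typedef int32_t ArrowDeviceType;
#define ARROW_DEVICE_CPU 1
#define ARROW_DEVICE_CUDA 2

struct ArrowDeviceArray {
  struct ArrowArray array;      // buffers hold device pointers
  int64_t device_id;
  ArrowDeviceType device_type;
  void* sync_event;             // null: no synchronization required
  int64_t reserved[3];          // 24 bytes kept for future expansion
};

// Hypothetical producer-side state, kept alive through private_data.
struct Int32Holder {
  std::vector<int32_t> values;
  const void* buffers[2];  // [validity bitmap, data]
};

// Release callback: frees the producer state and marks the array released.
static void ReleaseInt32(struct ArrowArray* array) {
  delete static_cast<Int32Holder*>(array->private_data);
  array->release = nullptr;
}

// Hypothetical helper: export host values as an int32 array on the CPU device.
void ExportInt32(std::vector<int32_t> values, ArrowDeviceArray* out) {
  auto* holder = new Int32Holder{std::move(values), {nullptr, nullptr}};
  holder->buffers[1] = holder->values.data();  // no nulls, so no validity bitmap

  *out = ArrowDeviceArray{};  // zero every field, including reserved
  out->array.length = static_cast<int64_t>(holder->values.size());
  out->array.n_buffers = 2;
  out->array.buffers = holder->buffers;
  out->array.release = ReleaseInt32;
  out->array.private_data = holder;
  out->device_type = ARROW_DEVICE_CPU;
  out->device_id = -1;         // assumption: -1 denotes the CPU device
  out->sync_event = nullptr;   // data is ready, nothing to wait on
}

// Hypothetical consumer: sums the values only if they are host-readable.
bool SumIfCpu(const ArrowDeviceArray& dev, int64_t* sum) {
  if (dev.device_type != ARROW_DEVICE_CPU) return false;  // device pointer, do not dereference
  if (dev.sync_event != nullptr) return false;            // would need to wait on the event first
  const auto* data = static_cast<const int32_t*>(dev.array.buffers[1]);
  *sum = 0;
  for (int64_t i = 0; i < dev.array.length; ++i) *sum += data[dev.array.offset + i];
  return true;
}
```

A GPU producer would fill the same fields with device pointers, set device_type to ARROW_DEVICE_CUDA and point sync_event at its event object; the consumer must call array.release exactly once when it is done with the data.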
