If we want to store an array of two strings, such as [ “Water”, “Rising” ], in a contiguous memory buffer, we cannot do it with the previous structure, since each entry has a different length. The solution is to add a second buffer to this layout alongside the data buffer: an offsets buffer.

Using an offsets buffer allows all of the array’s data to be held in a single contiguous memory buffer. The only extra cost of finding the value at a given index is reading two adjacent entries from the offsets buffer to locate the correct slice of the data. The offsets buffer always contains length + 1 signed integers (either 32-bit or 64-bit, depending on the data type being used) that indicate the starting position of each corresponding slot of the array. The final entry marks the end of the last value, so slot i spans offsets[i] up to, but not including, offsets[i + 1].
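
To make the layout concrete, here is a minimal Python sketch of the two buffers for [ “Water”, “Rising” ]. It is an illustration under stated assumptions rather than Arrow's actual implementation: the helper names `build_string_array` and `get_value` are hypothetical, the offsets are assumed to be 32-bit little-endian integers, and validity bitmaps and buffer padding are left out.

```python
import struct


def build_string_array(values):
    """Pack strings into one data buffer plus an int32 offsets buffer.

    Hypothetical helper for illustration only; not part of any Arrow library.
    """
    data = bytearray()
    offsets = [0]  # the first value always starts at byte 0
    for value in values:
        data += value.encode("utf-8")
        offsets.append(len(data))  # end of this value, start of the next
    # length + 1 signed 32-bit integers, little-endian (an assumption here)
    offsets_buffer = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(data), offsets_buffer


def get_value(data, offsets_buffer, index):
    """Read two adjacent offsets, then slice the data buffer between them."""
    start, end = struct.unpack_from("<2i", offsets_buffer, index * 4)
    return data[start:end].decode("utf-8")


data, offsets_buffer = build_string_array(["Water", "Rising"])
offsets = list(struct.unpack(f"<{len(offsets_buffer) // 4}i", offsets_buffer))

print(data)     # the single contiguous data buffer
print(offsets)  # three offsets for two strings
print(get_value(data, offsets_buffer, 1))
```

Two strings produce three offsets, and looking up any index costs one read of two neighboring integers plus a slice of the data buffer.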

!tip Arrow String Array vs Traditional Vector of Strings

Generally, a string is represented as a pointer to a memory location and an integer for the length, so a vector of strings is a vector of these pointers and lengths. For many use cases, this is very efficient: a single memory address is typically much smaller than the string data itself, so passing around an address and a length is a cheap way to reference individual strings.

For a large number of strings, however, it’s much more efficient to have a single buffer to scan through in memory. As you operate on each string, you can maintain the memory locality we mentioned previously, keeping the memory we need to look at physically close to the next chunk of memory we’re likely going to need. This way, we spend less time jumping around different pages of memory and can spend more CPU cycles performing the computations. It’s also extremely efficient to get a single string, as you can simply take a view of the buffer, using the address indicated by the offset, to create a string object without copying the data.
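
The zero-copy lookup described in the tip can be sketched in Python as well. This is only an approximation: Python's built-in `memoryview` stands in for the kind of non-owning string view a systems language would provide, and the helper `view_value` is a hypothetical name introduced for this example.

```python
# The same two buffers as in the layout described above, written out by hand.
data = b"WaterRising"
offsets = [0, 5, 11]


def view_value(data_view, offsets, index):
    """Return a view of one value; slicing a memoryview does not copy bytes."""
    return data_view[offsets[index]:offsets[index + 1]]


buffer_view = memoryview(data)
second = view_value(buffer_view, offsets, 1)

# The slice still refers to the original buffer rather than to a copy of it.
print(second.obj is data)
print(second.nbytes)

# Decoding to a Python str is the point at which a copy is finally made.
print(bytes(second).decode("utf-8"))
```

The view carries only a position and a length into the shared buffer, which is the same information a pointer-and-length string holds, but every value continues to live in one contiguous allocation.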