Geohashing

Geohashing encodes latitude and longitude into a single alphanumeric string, creating a grid-like division of the Earth’s surface. Locations within the same grid share common prefixes in their geohash codes.

It is simple because it uses a 1D string, simplifying storage and indexing, and shared prefixes indicates spatial proximity, facilitating grid search.

However, adjacent locations near grid edges might have dissimilar geohashes, complicating proximity search.

Info

Another big limitation is the fixed size, which doesn’t adapt well to sparse and dense areas

Quadtrees

A quadtree recursively subdivides a 2D space into four quadrants, creating a hierarchical tree structure that adapts to data distribution. You can imagine it can be built with the following code:

public void buildQuadtree(TreeNode node) {
    if (countNumberOfBusinessesInCurrentGrid(node) > 100) {
        node.subdivide();
        for (TreeNode child : node.getChildren()) {
            buildQuadtree(child);
        }
    }
}

In a leaf node, we store:

  • Top-left, bottom-right coordinates to identify the quadrant dimensions
  • List of business IDs in the grid In an internalt node we store:
  • Top-left, bottom-right coordinates of quadrant dimensions
  • 4 pointers to children

Advantages:

  • Adaptive Partitioning: Dynamically subdivides space based on data density, allowing efficient handling of both sparse and dense regions.
  • Efficient Range Queries: Facilitates rapid retrieval of all points within a specified area.

R-Trees

R-Trees are a tree-based data structure optimized for spatial indexing, particularly for complex range queries or nearest-neighbor searches. Unlike quadtrees, R-Trees group nearby objects using minimum bounding rectangles (MBRs).

As in quadtrees, only leaf nodes store data points or objects. All node contains:

  • Bounding Rectangle: Encapsulates all rectangles or points within the node.
  • Children: Links to child nodes or data points. Insertion works by finding the leaf node whose bounding rectangle requires the least enlargement, and splitting it in case it exceed capacity.

Query is based on the idea of finding the intersection between the query rectangle and the bounding rectangle, recursively:

Parent Node Bounding Rectangle: (0, 0, 100, 100)
Children Bounding Rectangles:
  Child 1: (0, 0, 50, 50)
  Child 2: (50, 0, 100, 50)
  Child 3: (0, 50, 50, 100)
  Child 4: (50, 50, 100, 100)

R-Tree vs. Quadtree

FeatureR-TreeQuadtree
SubdivisionBy bounding rectanglesBy fixed quadrants
AdaptabilityHighly dynamic, based on data sizeStatic structure
Use CaseComplex shapes (polygons, etc.)Points or small rectangles

Google S2 Geometry Library

Divides the Earth’s surface into hierarchical cells, offering a balance between geohashing and quadtrees.