# array representations ## address continuation - contiguous - elements occupy a single continuous address interval - address computation is purely arithmetic - address(i) = base + i * stride - no metadata required beyond base and stride - relocation requires copying the entire region - discontinuous - elements are stored in multiple, non-adjacent regions - address computation requires at least one indirection - typical realizations - pointer arrays - block directories - stable slot tables - enables growth and removal without relocating existing elements - trades locality and simplicity for stability and flexibility ## stability contracts - pointer stable - once an element address is observed, it remains valid until explicit destruction - prohibits moving live elements in memory - required when external code caches element pointers - severely restricts resizing strategies in contiguous layouts - index stable - once an index is assigned, it permanently identifies the same logical element - index becomes an identity rather than a position - requires indirection, tombstones, or fixed slots to prevent shifts - enables removal without invalidating references by index ## size model - static - capacity fixed at initialization - no reallocation or structural mutation of storage - size changes only by modifying a usage counter - simplest model with strongest predictability - dynamic - capacity may change during lifetime - growth strategy defines amortized cost and fragmentation - shrink strategy determines memory return behavior - resizing may or may not preserve pointer stability depending on contract ## element type and layout - single type - one interpretation of memory, one stride - simplest aliasing and correctness model - nested or multidimensional - arrays represent higher dimensional index spaces - flat representation - one contiguous allocation - index mapping function computes linear offset - hierarchical representation - arrays of pointers to subarrays - enables ragged shapes at cost of indirection - mixed type - memory region accessed through multiple type interpretations - requires explicit control over layout and aliasing - common mechanisms - structs to define composite layout - byte-wise access for reinterpretation - incorrect aliasing leads to undefined behavior ## operations and stability impact - resize - changes capacity and possibly physical location - contiguous dynamic arrays may relocate storage - relocation breaks pointer stability but preserves index ordering - append - adds element at logical end - contiguous layouts preserve relative ordering - amortized cost depends on growth policy - insert - adds element at arbitrary logical position - contiguous layouts require shifting or gaps - shifting breaks pointer and index stability beyond insertion point - removal - deletes an element and frees its slot - strategy defines semantic cost - shift removal preserves order, breaks stability - swap removal preserves density, breaks identity - tombstones preserve identity, reduce density ## format choices - array of structs - each element stores all fields of a logical record - one base pointer, one stride - optimal for per-record access - less efficient for field-wise bulk processing - struct of arrays - each field stored in a separate array - multiple base pointers - optimal for bulk and vectorized operations - per-record access requires multiple loads ## minimal representation patterns - static contiguous - fixed buffer plus usage count - no allocation after initialization - dynamic contiguous - buffer with explicit capacity and used counters - growth policy defines relocation behavior - stable slots contiguous - contiguous slot array - slot identity is stable - removal creates empty slots - reuse via free list - iteration must test slot occupancy ## discontinuous designs - fixed blocks - storage divided into equal-sized contiguous blocks - blocks never moved once allocated - index mapping computes block index and offset - preserves pointer stability within blocks - unequal block sizes - blocks grow geometrically - reduces allocation frequency - requires cumulative size metadata for indexing - per-element allocation - each element allocated independently - maximal pointer stability - minimal spatial locality - high allocator metadata overhead - stable slots - logical index refers to a slot, not a position - slot identity is preserved across removals - empty slots accumulate unless reused - iteration must skip empty slots - reuse is mediated by a free list ### free list designs - stack-based free list - free slots are recorded externally in a separate stack or array - removal pushes slot index onto the stack - insertion pops a slot index from the stack - properties - slot metadata does not contaminate element storage - free list state is centralized - requires separate storage proportional to number of free slots - removal and insertion are constant time - consequences - element storage can remain type-pure - free list capacity must be managed independently - embedded free list - free slots store linkage metadata in place of element data - each free slot contains a reference to the next free slot - a single head index tracks the free list - properties - no external free list storage required - slot storage is type-polluted while free - free list state is distributed across slots - consequences - requires a representation for empty versus occupied slots - element type must tolerate overlay or union-based encoding - iteration must distinguish live elements from linkage metadata - comparison - stack-based - clearer separation of data and bookkeeping - higher auxiliary memory cost - embedded - minimal auxiliary memory - tighter coupling between storage and allocator logic ## example: dynamic growth-only with removals and index stability - contract - indices represent permanent element identities - pointer stability not guaranteed across growth - design - contiguous slot array with embedded free list - growth only increases capacity - insertion - reuse freed slots before extending used range - removal - mark slot empty - link slot into free list - iteration - scan full used range - skip empty slots - compaction - optional operation - requires emitting index remapping ## example: 3d point lists - array of structs - struct { float x; float y; float z; } - spatially local per-point access - inefficient for operations on a single coordinate - struct of arrays - separate x, y, z arrays - efficient for coordinate-wise transforms - higher cost for per-point access - dimensions can be optional by using a null pointers for arrays