2025-12-20

fast delimiter-based path extraction from standard input

an algorithm for extracting delimiter-separated path strings

use case

reading a sequence of path strings from standard input where:

  • each path ends with a delimiter character
  • newlines or other separator characters may be used as delimiters
  • the total input size is not known in advance, so memory must be allocated dynamically
  • paths should be stored and referenced in place, without copying

the algorithm

  • input buffering:

    • allocate an initial buffer of fixed size to hold the input

    • read from standard input in chunks, appending to the buffer

    • dynamically resize the buffer as needed using realloc

    • reading stops at end of input (EOF) or on a read error

  • path pointer array:

    • allocate an initial array of pointers to hold extracted path references

    • resize the array dynamically as more paths are discovered

  • delimiter parsing:

    • iterate through the input buffer to locate delimiter characters

    • for each delimiter found:

      • if the region before the delimiter is non-empty, store a pointer to its start
      • replace the delimiter with a null terminator to isolate the path string
      • advance to the character following the delimiter
    • stop when no further delimiter is found or the end of the buffer is reached

  • memory layout:

    • all extracted path strings are stored null-terminated in one contiguous buffer, the same buffer the input was read into

    • the array of pointers references each of these strings directly within the buffer

    • the caller is responsible for freeing the returned buffer and pointer array
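
the steps above can be sketched in c roughly as follows. this is a minimal sketch under stated assumptions, not the actual read_paths implementation: the function names read_all and split_paths, the FILE * parameter (used instead of hard-coding stdin), the initial sizes, and the bodies of memory_error and handle_error are all hypothetical. overflow of the capacity counters is not checked.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-ins for the termination routines named in the text */
void memory_error(void) { fputs("out of memory\n", stderr); exit(EXIT_FAILURE); }
void handle_error(const char *what) { perror(what); exit(EXIT_FAILURE); }

/* read everything from `in` into one heap buffer, doubling its capacity
   whenever it fills up; the number of bytes read is stored in *len_out */
char *read_all(FILE *in, size_t *len_out)
{
    size_t cap = 4096, len = 0;          /* assumed initial size */
    char *buf = malloc(cap);
    if (!buf) memory_error();
    for (;;) {
        len += fread(buf + len, 1, cap - len, in);
        if (len < cap) break;            /* short read: EOF or error */
        cap *= 2;
        char *tmp = realloc(buf, cap);
        if (!tmp) memory_error();
        buf = tmp;
    }
    if (ferror(in)) handle_error("read");
    *len_out = len;
    return buf;
}

/* split buf[0..len) in place: every delimiter becomes '\0', and each
   non-empty region that ends in a delimiter gets a pointer in the result;
   bytes after the last delimiter are not stored */
char **split_paths(char *buf, size_t len, char delim, size_t *count_out)
{
    assert(buf != NULL && count_out != NULL);
    size_t cap = 16, count = 0;          /* assumed initial size */
    char **paths = malloc(cap * sizeof *paths);
    if (!paths) memory_error();
    char *start = buf, *end = buf + len, *p;
    while (start < end &&
           (p = memchr(start, delim, (size_t)(end - start))) != NULL) {
        if (p > start) {                 /* skip empty entries */
            if (count == cap) {
                cap *= 2;
                char **tmp = realloc(paths, cap * sizeof *paths);
                if (!tmp) memory_error();
                paths = tmp;
            }
            paths[count++] = start;
        }
        *p = '\0';                       /* isolate the path string */
        start = p + 1;
    }
    *count_out = count;
    return paths;
}
```

a caller would typically run read_all(stdin, &len), pass the result to split_paths(buf, len, '\n', &count), and later free both the pointer array and the buffer. the pointers are only valid while the buffer is alive, since they point into it.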

behavior and constraints

  • all paths must be terminated by the specified delimiter; trailing input without a final delimiter is not stored as a path
  • empty entries (zero-length between delimiters) are ignored
  • the function exits silently if no input is received
  • memory errors invoke a termination routine (memory_error)
  • read errors invoke a termination routine (handle_error)

efficiency considerations

  • both the buffer and the pointer array grow exponentially, so the number of realloc calls is logarithmic in the final size rather than linear
  • only one pass is made through the input buffer to identify all paths
  • delimiter replacement with null bytes avoids copying path data and enables in-place storage
  • little dynamic memory is used beyond the input buffer and the pointer array
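
to make the effect of exponential growth concrete, the sketch below counts how many reallocations are needed to reach a given size under two policies. the helper names, and the choice of doubling as the growth factor, are assumptions for illustration; the original text only says growth is exponential.

```c
#include <assert.h>
#include <stddef.h>

/* number of reallocations needed to grow from `initial` to at least
   `needed` bytes when the capacity doubles each time */
size_t reallocs_doubling(size_t initial, size_t needed)
{
    assert(initial > 0);                 /* doubling zero never grows */
    size_t cap = initial, count = 0;
    while (cap < needed) { cap *= 2; count++; }
    return count;
}

/* same count when the capacity grows by a fixed `step` each time */
size_t reallocs_fixed_step(size_t initial, size_t step, size_t needed)
{
    assert(step > 0);
    size_t cap = initial, count = 0;
    while (cap < needed) { cap += step; count++; }
    return count;
}
```

starting from 4096 bytes, reaching 1 MiB takes 8 reallocations with doubling and 255 with a fixed 4096-byte step.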

conclusion

this algorithm provides an efficient way to:

  • extract and reference variable-length, delimiter-separated paths from standard input
  • minimize memory copying through in-place modification
  • scale to large inputs through dynamic resizing
  • maintain low overhead and deterministic behavior in a simple, single-pass design

example implementation

read_paths