# fast delimiter-based path extraction from standard input an algorithm for extracting delimiter-separated path strings # use case reading a sequence of path strings from standard input where: * each path ends with a delimiter character * newlines or other separator characters may be used as delimiters * memory is allocated dynamically to handle unknown input size * efficient in-place storage and referencing of paths is desired # the algorithm * input buffering: * allocate an initial buffer of fixed size to hold the input * read from standard input in chunks, appending to the buffer * dynamically resize the buffer as needed using realloc * reading stops at end of input (eof) or on read error * path pointer array: * allocate an initial array of pointers to hold extracted path references * resize the array dynamically as more paths are discovered * delimiter parsing: * iterate through the input buffer to locate delimiter characters * for each delimiter found: * if the region before the delimiter is non-empty, store a pointer to its start * replace the delimiter with a null terminator to isolate the path string * advance to the character following the delimiter * stop if no more delimiters are found or buffer end is reached * memory layout: * all extracted path strings are stored null-terminated in a contiguous buffer * the array of pointers references each of these strings directly within the buffer * the caller is responsible for freeing the returned buffer and pointer array # behavior and constraints * all paths must be terminated by the specified delimiter * empty entries (zero-length between delimiters) are ignored * the function exits silently if no input is received * memory errors invoke a termination routine (memory_error) * read errors invoke a termination routine (handle_error) # efficiency considerations * memory allocations are performed using exponential growth for both buffer and pointer array, reducing realloc overhead * only one pass is made through the input buffer to identify all paths * delimiter replacement with null bytes avoids copying path data and enables in-place storage * minimal dynamic memory usage beyond the input and pointer array # conclusion this algorithm provides an efficient way to: * extract and reference variable-length, delimiter-separated paths from standard input * minimize memory copying through in-place modification * scale to large inputs through dynamic resizing * maintain low overhead and deterministic behavior in a simple, single-pass design # example implementation in c ~~~ char* get_paths(char delimiter, char*** paths, size_t* paths_used) { // read delimiter separated paths from standard input. // all paths must end with the delimiter. // data will contain the full string read from standard input with newlines replaced by null bytes, // while paths contains pointers to the beginning of each path in data. size_t data_size = paths_data_size_min; size_t data_used = 0; size_t paths_size = paths_size_min; ssize_t read_size; char* data = malloc(paths_data_size_min); char* p; char* data_end; if (!data) memory_error; *paths = malloc(paths_size_min * sizeof(char*)); if (!*paths) memory_error; *paths_used = 0; while (0 < (read_size = read(0, data + data_used, paths_data_size_min))) { data_used += read_size; if (data_size < data_used + paths_data_size_min) { data_size *= 2; data = realloc(data, data_size); if (!data) memory_error; } } handle_error(read_size); if (!data_used) exit(0); p = data; data_end = data + data_used; while (p < data_end) { char* next_delim = memchr(p, delimiter, data_end - p); if (!next_delim) break; if (next_delim > p) { if (paths_size == *paths_used) { paths_size *= 2; *paths = realloc(*paths, paths_size * sizeof(char*)); if (!*paths) memory_error; } *next_delim = 0; (*paths)[*paths_used] = p; (*paths_used) += 1; } p = next_delim + 1; } return(data); } ~~~