2025-10-19

miscellaneous

part of c programming

single compilation unit

  • a single-compilation-unit build produces one binary from one translation unit: a top-level file that includes all other source files
  • the main file may use .h or .c; .c is typical when it contains main
  • suitable for small projects
  • benefits

    • full inlining and dead-code elimination
    • no linkage overhead
    • dependency-local optimization via compiler visibility
  • separate-compilation model

    • compiles each source file to an object file
    • uses headers for declarations and a linker to resolve references
    • reduces recompilation time but adds build complexity
    • can inhibit whole-program optimization
  • sqlite is a well-known example using a single compilation unit
  • including c files resembles embedding javascript in html

tips and tricks

  • structs

    • subtracting offsetof(type, field) from a pointer to a field yields the start of the containing struct
    • struct assignment copies all contained data, including nested arrays
    • initialization with = {0} sets all fields to zero
  • functions

    • assign output arguments only on success
    • function calls are expressions; they can wrap complex statements
    • for variable argument counts, prefer _n-suffixed variants (e.g., list_3(2,3,4))
  • preprocessor

    • use #ifndef guards to allow conditional redefinition for configuration
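    • macros cannot directly take other macro names as arguments to form concatenated identifiers: the ## operator pastes its arguments without expanding them first, so the argument has to pass through an extra level of macro indirection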
  • strings and chars: plain char may be signed or unsigned depending on the platform; compiler options such as gcc's -fsigned-char and -funsigned-char control the default
  • math: math.h functions (for example for rounding) can be slower than integer truncation or bit operations
  • allocation pattern

    • initialize pointers to zero
    • allocate resources
    • free nonzero pointers in cleanup
  • terminology

    • parameter: variable in function declaration or definition

    • argument: actual value passed at call site
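
a minimal sketch of some of the struct, function, and allocation tips above. item_t, item_from_values and make_buffers are hypothetical names chosen for illustration, not part of any library:

```c
#include <stddef.h>
#include <stdlib.h>

/* hypothetical example type with a nested array */
typedef struct {
  int id;
  int values[4];
} item_t;

/* recover the containing struct from a pointer to its "values" field
   by subtracting the field offset from the field address */
item_t* item_from_values(int (*values)[4]) {
  return (item_t*)((char*)values - offsetof(item_t, values));
}

/* allocation pattern: pointers start at zero, resources are allocated,
   and cleanup frees whatever is still nonzero.
   the output arguments are assigned only on success.
   returns 0 on success, 1 on allocation failure */
int make_buffers(size_t n, int** out_a, double** out_b) {
  int* a = 0;
  double* b = 0;
  int status = 1;
  a = malloc(n * sizeof(int));
  if (!a) goto cleanup;
  b = malloc(n * sizeof(double));
  if (!b) goto cleanup;
  *out_a = a;
  *out_b = b;
  /* ownership moved to the caller, nothing left to free here */
  a = 0;
  b = 0;
  status = 0;
cleanup:
  if (a) free(a);
  if (b) free(b);
  return status;
}
```

with item_t a = {0}, all fields including every element of a.values start at zero, and item_t b = a copies the whole values array, so later changes to b.values do not affect a.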

memory-mapped files

  • memory-buffer functions can operate directly on file content via mmap
  • mmap()

    • used internally by malloc on linux (glibc) as a low-level allocator for large allocations

    • maps files into virtual memory pages

    • page faults occur when accessing unloaded regions

example

file_buffer = mmap(0, file_size, PROT_READ, MAP_SHARED, fd, 0);
md5((unsigned char*)file_buffer, file_size, result);
munmap(file_buffer, file_size);
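
a more complete sketch of the same idea with error handling, assuming a posix system. file_byte_sum is a hypothetical helper that sums the bytes of a file as a stand-in for a buffer function like md5:

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* sums all bytes of a file through a read-only mapping.
   returns 0 on success, -1 on error. *out is assigned only on success */
int file_byte_sum(const char* path, uint64_t* out) {
  int fd = open(path, O_RDONLY);
  if (fd < 0) return -1;
  struct stat st;
  if (fstat(fd, &st) < 0) {
    close(fd);
    return -1;
  }
  if (st.st_size == 0) {
    /* mmap fails for a length of zero, so empty files are handled separately */
    close(fd);
    *out = 0;
    return 0;
  }
  size_t size = (size_t)st.st_size;
  unsigned char* data = mmap(0, size, PROT_READ, MAP_SHARED, fd, 0);
  /* the mapping stays valid after the descriptor is closed */
  close(fd);
  if (data == MAP_FAILED) return -1;
  uint64_t sum = 0;
  for (size_t i = 0; i < size; i += 1) sum += data[i];
  munmap(data, size);
  *out = sum;
  return 0;
}
```

note that mmap reports failure with MAP_FAILED, not with a null pointer.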

data types

  • types can be understood as a memory space requirement and semantics for accessing the memory
  • for example, an unsigned integer type and a pointer might store an identical sequence of bits, yet incrementing the pointer advances the address by the size of the pointed-to type, while incrementing the integer adds one
  • the standard types int, char, and more have a platform-dependent size with a minimum required size. type modifiers (short, long, long long) are available to adjust the minimum size requirements. c data types on wikipedia
  • there are also standard fixed-size data types, which are defined in stdint.h and are also available through inttypes.h, which includes it. these types are for example int32_t, uint8_t and more. they do not take the somewhat strange "long long" and related modifiers. stdint.h also defines minimum-size types (int_least32_t, etc), maximum-size types (intmax_t, uintmax_t), and fast types (int_fast32_t, etc), which are intended to be the fastest available types of at least a given size on a platform
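
a small sketch of the difference between pointer and integer semantics for the same address bits. pointer_step and integer_step are hypothetical names, and the conversion to uintptr_t assumes a platform with a flat address space:

```c
#include <stddef.h>
#include <stdint.h>

/* byte distance between p and p + 1: the size of the pointed-to type */
size_t pointer_step(void) {
  uint32_t values[2] = {0, 0};
  uint32_t* p = values;
  return (size_t)((uintptr_t)(void*)(p + 1) - (uintptr_t)(void*)p);
}

/* the same address held as an integer advances by exactly one */
size_t integer_step(void) {
  uint32_t values[2] = {0, 0};
  uintptr_t a = (uintptr_t)(void*)values;
  return (size_t)((a + 1) - a);
}
```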

shorter type names

here are some alternative type names that could be used:

i8 i16 i32 i64
i8_least i16_least i32_least i64_least
i8_fast i16_fast i32_fast i64_fast
u8 u16 u32 u64
u8_least u16_least u32_least u64_least
u8_fast u16_fast u32_fast u64_fast
f32 float f64 double pointer boolean

incidentally, makers of the rust language had the same idea and are using some of these type names
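
such names could be introduced with typedefs on top of stdint.h. the following is a sketch for a subset of the names. mapping f32 and f64 to float and double is an assumption that holds on common platforms but is not guaranteed by the c standard:

```c
#include <stdint.h>

/* exact-width integers */
typedef int8_t i8;
typedef int16_t i16;
typedef int32_t i32;
typedef int64_t i64;
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

/* minimum-width and fast variants, shown for 32 bit only */
typedef int_least32_t i32_least;
typedef uint_least32_t u32_least;
typedef int_fast32_t i32_fast;
typedef uint_fast32_t u32_fast;

/* floating point: the sizes are an assumption, see above */
typedef float f32;
typedef double f64;
```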

data structures

flat vs nested multidimensional arrays

  • flat arrays use one contiguous block

    • simpler allocation
    • fixed subarray size
    • manual index calculation: i = i1*(d2*d3) + i2*d3 + i3, for indices i1, i2, i3 and dimension sizes d1, d2, d3
    • can interleave layouts for performance
  • nested arrays

    • each subarray separately allocated

    • simpler indexing and iteration

    • requires multiple allocations and frees
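
a sketch of both layouts, assuming row-major order for the flat variant. flat_index_3d, nested_new and nested_free are hypothetical helper names:

```c
#include <stddef.h>
#include <stdlib.h>

/* flat index of element (i1, i2, i3) in a row-major array
   whose second and third dimensions have sizes d2 and d3 */
size_t flat_index_3d(size_t i1, size_t i2, size_t i3, size_t d2, size_t d3) {
  return i1 * (d2 * d3) + i2 * d3 + i3;
}

/* nested variant: one allocation for the row pointers plus one per row.
   returns 0 on allocation failure after freeing partial results */
int** nested_new(size_t rows, size_t cols) {
  int** a = calloc(rows, sizeof(int*));
  if (!a) return 0;
  for (size_t i = 0; i < rows; i += 1) {
    a[i] = calloc(cols, sizeof(int));
    if (!a[i]) {
      while (i > 0) {
        i -= 1;
        free(a[i]);
      }
      free(a);
      return 0;
    }
  }
  return a;
}

/* every row needs its own free before the row-pointer array is freed */
void nested_free(int** a, size_t rows) {
  if (!a) return;
  for (size_t i = 0; i < rows; i += 1) free(a[i]);
  free(a);
}
```

the flat variant needs a single malloc(d1 * d2 * d3 * sizeof(int)) and a single free, while the nested variant allows the simpler a[i][j] indexing.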

structs containing arrays and size fields

  • pros

    • eliminates separate size variables
    • simplifies passing arrays to functions
  • cons

    • adds struct setup overhead

    • limits pointer arithmetic flexibility

    • shared size across arrays requires extra handling

    • not always useful if function already accepts a count argument
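
a sketch of the trade-off. f64_array_t and both sum functions are hypothetical names for illustration:

```c
#include <stddef.h>

/* array and size travel together in one struct */
typedef struct {
  double* data;
  size_t size;
} f64_array_t;

/* struct variant: one argument, no separate size variable */
double f64_array_sum(f64_array_t a) {
  double sum = 0;
  for (size_t i = 0; i < a.size; i += 1) sum += a.data[i];
  return sum;
}

/* plain variant: the caller passes pointer and count separately,
   which makes sub-ranges available through pointer arithmetic */
double f64_sum(const double* data, size_t size) {
  double sum = 0;
  for (size_t i = 0; i < size; i += 1) sum += data[i];
  return sum;
}
```

with the plain variant, f64_sum(values + 1, 2) sums a sub-range directly, whereas the struct variant needs a new struct to be set up for the same range.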

linked lists

  • typical implementation allocates one node per element
  • insertion and removal modify neighbor pointers only
  • suitable for frequent insert/delete but cache-inefficient for traversal
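
a minimal singly linked list sketch, one of several possible designs. node_t, list_insert and list_remove_after are hypothetical names:

```c
#include <stdlib.h>

/* one separately allocated node per element */
typedef struct node {
  int value;
  struct node* next;
} node_t;

/* insert a new node after prev, or at the head when prev is null.
   only neighbor pointers change, no elements are moved.
   returns the new node, or 0 on allocation failure */
node_t* list_insert(node_t** head, node_t* prev, int value) {
  node_t* n = malloc(sizeof(node_t));
  if (!n) return 0;
  n->value = value;
  if (prev) {
    n->next = prev->next;
    prev->next = n;
  } else {
    n->next = *head;
    *head = n;
  }
  return n;
}

/* unlink and free the node after prev, or the head when prev is null */
void list_remove_after(node_t** head, node_t* prev) {
  node_t* n = prev ? prev->next : *head;
  if (!n) return;
  if (prev) prev->next = n->next;
  else *head = n->next;
  free(n);
}
```

traversal follows the next pointers from node to node, and because the nodes can be scattered across the heap, this is where the cache inefficiency comes from.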

dynamic arrays

  • track logical length and allocated capacity separately
  • resizing policies

    • grow on each add; simple but may reallocate often
    • ensure capacity in advance; reduces checks during additions
    • adding beyond capacity overflows the buffer unless the array is resized first
  • pros: easy length modification and arbitrary growth
  • cons

    • reallocation requires pointer updates

    • operations can fail on allocation
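
a sketch of the grow-on-add policy with a doubling growth factor. int_array_t and its functions are hypothetical names, and the initial capacity of 4 is an arbitrary choice:

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
  int* data;
  size_t length;   /* elements in use */
  size_t capacity; /* elements allocated */
} int_array_t;

/* append one value, doubling the capacity when the array is full.
   returns 0 on success, 1 on allocation failure (array unchanged) */
int int_array_add(int_array_t* a, int value) {
  if (a->length == a->capacity) {
    size_t new_capacity = a->capacity ? a->capacity * 2 : 4;
    int* new_data = realloc(a->data, new_capacity * sizeof(int));
    if (!new_data) return 1;
    /* realloc may move the block: older pointers into it become invalid */
    a->data = new_data;
    a->capacity = new_capacity;
  }
  a->data[a->length] = value;
  a->length += 1;
  return 0;
}

/* release the storage and reset to the empty state */
void int_array_free(int_array_t* a) {
  free(a->data);
  a->data = 0;
  a->length = 0;
  a->capacity = 0;
}
```

an array starts empty with int_array_t a = {0}. every add can fail, so callers have to check the return value.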

struct padding

  • alignment rules

    • each member is placed at a multiple of its alignment requirement, which for scalar types usually equals its size
    • compiler may add padding between members or at struct end
  • examples

    • small before large adds padding

    • order members from largest to smallest to reduce padding

    • arrays take the alignment of their element type

    • struct size is padded to a multiple of the largest member alignment
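
a sketch that makes the padding measurable. the type and function names are hypothetical. on a typical x86-64 abi the two structs have sizes 24 and 16, but the exact numbers are platform dependent:

```c
#include <stddef.h>

/* small before large: padding after a and after c */
typedef struct {
  char a;
  double b;
  char c;
} mixed_order_t;

/* largest first: only trailing padding remains */
typedef struct {
  double b;
  char a;
  char c;
} large_first_t;

/* padding bytes: total size minus the sum of the member sizes */
size_t mixed_order_padding(void) {
  return sizeof(mixed_order_t) - (sizeof(char) + sizeof(double) + sizeof(char));
}

size_t large_first_padding(void) {
  return sizeof(large_first_t) - (sizeof(double) + sizeof(char) + sizeof(char));
}
```

offsetof(mixed_order_t, b) shows where the compiler placed a member, which helps when checking a layout by hand.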

c vs higher-level languages

  • greater type specificity

    • each type combination requires distinct functions
    • macro templating is verbose
  • explicit memory management: allocation and cleanup interleave with logic
  • low-level control: more flexibility but more performance-sensitive decisions
  • verbosity

    • more declarations and boilerplate

    • fine-grained control often offsets the cost in clarity and performance
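
a sketch of macro templating for the type-specificity point above. define_array_sum is a hypothetical macro that generates one distinct function per element type:

```c
#include <stddef.h>

/* defines a function "name" that sums an array with element type "type" */
#define define_array_sum(name, type) \
  type name(const type* data, size_t size) { \
    type sum = 0; \
    for (size_t i = 0; i < size; i += 1) sum += data[i]; \
    return sum; \
  }

/* each type needs its own instantiation */
define_array_sum(int_array_sum, int)
define_array_sum(double_array_sum, double)
```

the backslash continuations and the explicit instantiation per type are what make this approach verbose compared to generics in higher-level languages.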

shipping custom header versions

  • using code can include the shipped headers during compilation, but both the library and the using code must include the same header versions
  • headers inside the library often include other internal headers through fixed relative or absolute include paths. these internal include references cannot be redirected easily to alternative versions used by external code
  • rewriting internal include paths or renaming symbols would be required to ship multiple interchangeable header versions

code inclusion patterns

  • header-only inclusion

    • declarations and inline definitions only
    • requires no shared library at link time unless external symbols are referenced
  • header plus shared library

    • header provides public declarations
    • compiled library object provides the corresponding definitions
    • user code includes the header and links against the shared object
  • separate header and c file inclusion

    • each compiled into distinct object files and linked together
    • standard pattern for modular builds
  • single translation unit

    • all .c files included into one top-level .c for compact projects
    • avoids linker complexity but requires the compilation of all code on update instead of only changed objects
  • repeated inclusion under different macros

    • same source file re-included with varying macro values to generate variants
    • used for template-style expansion
  • self-include pattern: each .c file includes its own header to ensure declaration-definition consistency

dynamic memory resize patterns

  • resize pattern

    • takes required size n as argument
    • if current capacity is smaller, grow by either

      • allocating the difference n - available, or
      • applying a growth factor (e.g., ×1.5 or ×2)
  • ensure pattern

    • ensures allocation of at least n elements

    • allocates new storage if pointer is null

    • check whether the pointer is null before calling ensure to decide if additional initialization of a new allocation is needed
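
a sketch of the ensure pattern for an int buffer. int_buffer_ensure is a hypothetical name, and growing to the larger of twice the capacity and n is one possible policy:

```c
#include <stddef.h>
#include <stdlib.h>

/* ensure that *data has room for at least n ints.
   a null *data counts as capacity 0 and leads to a new allocation.
   returns 0 on success, 1 on allocation failure
   (*data and *capacity are unchanged on failure) */
int int_buffer_ensure(int** data, size_t* capacity, size_t n) {
  size_t current = *data ? *capacity : 0;
  if (n <= current) return 0;
  /* growth factor 2, but never less than the requested size */
  size_t new_capacity = current * 2;
  if (new_capacity < n) new_capacity = n;
  /* realloc with a null pointer behaves like malloc */
  int* new_data = realloc(*data, new_capacity * sizeof(int));
  if (!new_data) return 1;
  *data = new_data;
  *capacity = new_capacity;
  return 0;
}
```

a caller that needs to initialize fresh storage can record data == 0 before the call and run the initialization only in that case.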

performance and correctness

goals

  • cache locality
  • summation stability

printf brittleness

stdio.h printf is fickle when it comes to matching format tags with arguments. it uses c standard variadic arguments, which carry no type information: printf decides how to read each argument solely from the type that the corresponding format tag implies.

if a format tag does not match the type of its argument, the behavior is undefined and the output is typically corrupted.

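printf does not have dedicated format tags for the fixed-size types from inttypes.h. the PRI macros from inttypes.h (PRIu64, PRId32, etc), which expand to the matching tag for the platform, are currently the only way to format them without casts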

printf("64 bit %" PRIu64 " printed here\n", u64variable);

a more robust printf might:

  • use generic string tags for inttypes.h types
  • automatically cast arguments to match the format type
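
a sketch of the two options that standard c offers today. format_u64 and format_u64_cast are hypothetical helper names. snprintf is used instead of printf so that the result can be inspected:

```c
#include <inttypes.h>
#include <stdio.h>

/* PRIu64 expands to the matching format tag for uint64_t on the platform */
int format_u64(char* out, size_t size, uint64_t value) {
  return snprintf(out, size, "64 bit %" PRIu64 " printed here", value);
}

/* alternative without a PRI macro: cast to uintmax_t
   and use the standard j length modifier */
int format_u64_cast(char* out, size_t size, uint64_t value) {
  return snprintf(out, size, "64 bit %ju printed here", (uintmax_t)value);
}
```

compilers such as gcc and clang can warn about mismatched format tags with -Wformat (part of -Wall), which catches many of these errors at compile time.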