2023-04-02

c programming

subtopics

single compilation unit

for relatively small projects, a single main file that includes all necessary code might be preferable to makefiles, headers and separate objects.

the more common way is to compile parts of the application separately into machine code objects, maintain header files with declarations for each object and then use a linker to connect the code in object files. this may save time in the development of big projects when most objects have previously been compiled and only a few have changed and need to be recompiled.

further information: wikipedia article. sqlite uses a single compilation unit. the style is similar to javascript in html without module systems where all source files and dependencies are included at the beginning of an html file before use.

benefits

  • only one header file is needed for code that links against the result
  • no complicated makefiles have to be written and maintained
  • only one object has to be compiled
  • only a single call to the compiler is needed
  • increased potential for automatic code optimizations

downsides

  • all included bindings are defined for all the following code and conflicts become likely, because c has no namespacing feature
  • all code has to be recompiled if one source file changes, which can take a long time
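
a minimal sketch of what such a main file can look like. the file names vec.c and report.c and all identifiers are hypothetical, and the sections that would normally be pulled in with #include are written inline so that the example stays self-contained. name prefixes and static are one possible way to reduce the naming conflicts mentioned above.

```c
/* sketch of a single compilation unit. in a real project the two sections
   below would be separate files (hypothetical names vec.c and report.c)
   that main.c pulls in with #include "vec.c" and #include "report.c",
   and the build would be a single call such as: cc main.c -o app */
#include <stdio.h>

/* ---- contents of the hypothetical vec.c ---- */
/* the vec_ prefix stands in for a namespace */
typedef struct { int x; int y; } vec_t;

static vec_t vec_add(vec_t a, vec_t b) {
  vec_t result = {a.x + b.x, a.y + b.y};
  return result;
}

/* ---- contents of the hypothetical report.c ---- */
/* code included later can use everything included before it,
   no separate header with declarations is needed */
static int report_vec(char* out, size_t out_size, vec_t v) {
  return snprintf(out, out_size, "(%d, %d)", v.x, v.y);
}
```

main would follow at the end of the same file and could call report_vec(text, sizeof(text), vec_add(a, b)) directly.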

links

tips and tricks, hints

  • using offsetof to get a pointer to a structure given only a pointer to a struct field
  • function calls are expressions, so they can be used to wrap more complicated syntax into an expression
  • consider setting output variables only on success
  • for functions to accept a variable number of arguments, the preferred solution in c is often to define _n suffix variants that take a specific number of arguments. for example, list_3(2, 3, 4)
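  • the characters of string literals have type char, and whether char is signed is implementation-defined. it is signed by default with many compilers and targets. with gcc, -funsigned-char makes char unsigned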
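  • c floor can be slow on some platforms. for values known to be non-negative and within the range of the target type, a cast to an integer type gives the same result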
  • assignment of a large struct copies all content automatically, even with nested array members, as long as the memory is declared as part of the structure itself and is not just pointed to
  • setting a struct to zero may be more complicated because a struct does not have a single value. a {0} initializer or memset can be used
  • with #ifndef, files can give macro variables default values that only get defined if the including code has not defined them before. this can be used for configuration. such files can also be included multiple times, each time with different macro variable values set beforehand
  • pattern: initialize variable to zero, allocate heap memory, final function cleanup checks if allocated and frees if necessary
  • it is not usually possible to pass macro names to macros and construct other macro names from them
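
some of the tips above in one sketch: offsetof to get from a field back to its struct, struct assignment that copies a nested array, zeroing a struct, and the zero-initialize and free-at-exit pattern that sets the output only on success. the type record_t and all function names are hypothetical and only serve as illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical example type with an array stored inside the struct */
typedef struct {
  int id;
  double samples[4];
} record_t;

/* offsetof: step back from a pointer to the samples field to the enclosing
   record. only valid if the pointer really points into a record_t */
static record_t* record_from_samples(double* samples) {
  return (record_t*)((char*)samples - offsetof(record_t, samples));
}

/* struct assignment copies the nested array because it is part of the struct */
static record_t record_copy(const record_t* source) {
  record_t copy = *source;
  return copy;
}

/* a struct has no single zero value, but a zero-initialized
   temporary can be assigned */
static void record_clear(record_t* a) {
  record_t zero = {0};
  *a = zero;
}

/* cleanup pattern: pointers start as zero and one exit path frees them.
   free(0) does nothing, so no separate check is needed before freeing.
   out is only set on success. count is assumed to be greater than zero.
   returns 0 on success and 1 on failure */
static int samples_doubled(const double* in, size_t count, double** out) {
  int status = 1;
  double* temp = 0;
  double* result = 0;
  size_t i;
  temp = malloc(count * sizeof(double));
  if (!temp) goto exit;
  result = malloc(count * sizeof(double));
  if (!result) goto exit;
  memcpy(temp, in, count * sizeof(double)); /* temp is scratch space */
  for (i = 0; i < count; i += 1) result[i] = temp[i] * 2;
  *out = result;
  result = 0; /* ownership has moved to the caller */
  status = 0;
exit:
  free(temp);
  free(result);
  return status;
}
```

the caller of samples_doubled frees the returned array with free after use.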

functions that work on memory buffers can work without change on file content using memory maps

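/* file_descript is an open file descriptor and file_size the size of the file in bytes */
file_buffer = mmap(0, file_size, PROT_READ, MAP_SHARED, file_descript, 0);
if (file_buffer == MAP_FAILED) { /* handle the error, mmap does not return 0 on failure */ }
MD5((unsigned char*) file_buffer, file_size, result);
munmap(file_buffer, file_size);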
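
a more complete sketch of the same idea for posix systems. to avoid a dependency on a hashing library, a simple byte sum stands in for MD5 here. buffer_sum and file_sum are hypothetical names.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* a function that only knows about memory buffers */
static unsigned long buffer_sum(const unsigned char* data, size_t size) {
  unsigned long sum = 0;
  size_t i;
  for (i = 0; i < size; i += 1) sum += data[i];
  return sum;
}

/* maps the file at path read-only and passes its content to buffer_sum.
   returns 0 on success and 1 on failure. result is only set on success */
static int file_sum(const char* path, unsigned long* result) {
  int status = 1;
  struct stat info;
  void* content;
  int file = open(path, O_RDONLY);
  if (file < 0) return 1;
  if (fstat(file, &info) < 0) goto exit;
  if (info.st_size == 0) {
    /* mmap rejects a length of zero, an empty file sums to zero */
    *result = 0;
    status = 0;
    goto exit;
  }
  content = mmap(0, (size_t)info.st_size, PROT_READ, MAP_SHARED, file, 0);
  if (content == MAP_FAILED) goto exit;
  *result = buffer_sum(content, (size_t)info.st_size);
  munmap(content, (size_t)info.st_size);
  status = 0;
exit:
  close(file);
  return status;
}
```

buffer_sum itself contains nothing file-specific and works the same on any other memory region.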

data structures

flat multidimensional array vs array of arrays

[1 2 3 1 2 3] vs [[1 2 3] [1 2 3]]
  • it is possible to allocate one long array in which every n consecutive elements represent one sub array, or alternatively to use an array of pointers to separate arrays of length n
  • flat

    • easier to allocate because only a single memory region is required
    • sub arrays are fixed size
    • the c type[][] arrays are of this format
    • if not using standard type[][] arrays

      • access with [sub_i * sub_size + i]
      • deeper nesting makes the indexing more complicated
    • a variant is to store sub arrays interleaved, for example [1 1 2 2 3 3]. this changes the indexing
  • nested

    • easier to access because sub arrays can be iterated with one incremented index and without having to incorporate the sub array size in the indexing calculation

    • easier to use with generic array operations, for example sorting

    • one array and every sub array has to be allocated separately and later freed
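
the index calculations and the allocation difference can be sketched as follows. all function names are hypothetical, and int is an arbitrary choice of element type.

```c
#include <stddef.h>
#include <stdlib.h>

/* flat: sub arrays are stored one after another, [1 2 3 1 2 3] */
static size_t flat_index(size_t sub_i, size_t sub_size, size_t i) {
  return sub_i * sub_size + i;
}

/* interleaved: element i of every sub array is stored together, [1 1 2 2 3 3] */
static size_t interleaved_index(size_t sub_i, size_t sub_count, size_t i) {
  return i * sub_count + sub_i;
}

/* nested: one allocation for the outer array and one per sub array.
   all elements start as zero. returns 0 on failure after freeing
   everything that had already been allocated */
static int** nested_new(size_t sub_count, size_t sub_size) {
  size_t i;
  int** a = calloc(sub_count, sizeof(int*));
  if (!a) return 0;
  for (i = 0; i < sub_count; i += 1) {
    a[i] = calloc(sub_size, sizeof(int));
    if (!a[i]) {
      while (i) {
        i -= 1;
        free(a[i]);
      }
      free(a);
      return 0;
    }
  }
  return a;
}

/* every sub array has to be freed before the outer array */
static void nested_free(int** a, size_t sub_count) {
  size_t i;
  for (i = 0; i < sub_count; i += 1) free(a[i]);
  free(a);
}
```

with the nested variant, a[sub_i] is an ordinary int array that can be passed to generic functions, while the flat variants need only a single malloc and free.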

structs that store arrays and size

pro

  • no need to declare an additional size variable, as the size is available through the single struct variable
  • no need to pass the array size as an extra argument to functions, which saves the most arguments when a function takes multiple arrays

con

  • overall it is not that useful to save a size argument here and there compared to the complexity of preparing the struct variable, conversions, and type specific implementations
  • a separate size and pointer is easier to use for portions of arrays because of pointer arithmetic (+n when passed to a function for example)
  • a single size variable can be used for multiple arrays of the same size
  • sometimes a count of items that are to be processed is to be passed to functions. the size value in the struct may be unnecessary in this case. for example make_array(count, struct_size_and_data)
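
a small sketch that contrasts the two calling styles. doubles_t, doubles_dot and dot are hypothetical names.

```c
#include <stddef.h>

/* hypothetical struct that bundles the data pointer with the size */
typedef struct {
  double* data;
  size_t size;
} doubles_t;

/* struct variant: one argument per array, each brings its own size.
   the shorter of the two sizes is used */
static double doubles_dot(doubles_t a, doubles_t b) {
  double sum = 0;
  size_t i;
  size_t size = a.size < b.size ? a.size : b.size;
  for (i = 0; i < size; i += 1) sum += a.data[i] * b.data[i];
  return sum;
}

/* separate variant: one size argument is shared by both arrays */
static double dot(const double* a, const double* b, size_t size) {
  double sum = 0;
  size_t i;
  for (i = 0; i < size; i += 1) sum += a[i] * b[i];
  return sum;
}
```

a portion of the arrays can be processed with dot(x + 2, y + 2, 2), whereas the struct variant first needs a new struct value with adjusted data and size fields.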

lists

  • a typical implementation needs one allocation per addition
  • insertion and removal are simple because only the adjacent elements have to be modified, instead of shifting all following elements as may be necessary when inserting into or removing from an array
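
a minimal sketch, assuming that "lists" refers to singly linked lists. node_t and the list_ functions are hypothetical names.

```c
#include <stdlib.h>

typedef struct node {
  int value;
  struct node* next;
} node_t;

/* adds a new node in front of head, which may be 0 for an empty list.
   one allocation per addition. returns the new head, or 0 if the
   allocation failed, in which case the old list is unchanged */
static node_t* list_add(node_t* head, int value) {
  node_t* a = malloc(sizeof(node_t));
  if (!a) return 0;
  a->value = value;
  a->next = head;
  return a;
}

/* inserts after an existing node. only prev is modified, nothing is shifted.
   returns the new node or 0 on failure */
static node_t* list_insert_after(node_t* prev, int value) {
  node_t* a = list_add(prev->next, value);
  if (a) prev->next = a;
  return a;
}

/* removes the node that follows prev, if there is one */
static void list_remove_after(node_t* prev) {
  node_t* a = prev->next;
  if (!a) return;
  prev->next = a->next;
  free(a);
}

/* every node has to be freed separately */
static void list_free(node_t* head) {
  while (head) {
    node_t* next = head->next;
    free(head);
    head = next;
  }
}
```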

dynamic arrays

used size and total allocated size can be tracked separately. the total length of active elements can be reduced but elements can also be added up to the allocated size, and the allocated size can be automatically expanded, for example with realloc.

pro

  • length-variable or length-modifying operations are easier. for example, generating a random number of elements or reducing the number of elements.

there are multiple options for when to do resizing

  • option

    • define add functions and manage allocation size on each addition
    • possibly use add-n or ensure-n functions that reduce the number of necessary free space checks
    • addition can fail because of the possible allocation, so the error status has to be checked
  • option

    • manual ensure-n before usage, which resizes if necessary

    • no free space checks when adding

    • trying to add more than what was allocated for may lead to buffer overflows
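
both options can be sketched with the same struct. dynarray_t and its functions are hypothetical names, the doubling growth strategy is only one possible choice, and overflow of the size calculations is not checked here.

```c
#include <stddef.h>
#include <stdlib.h>

/* used counts the active elements, size the allocated elements.
   an empty array starts as {0, 0, 0} */
typedef struct {
  int* data;
  size_t used;
  size_t size;
} dynarray_t;

/* makes sure that at least n more elements fit, resizing if necessary.
   returns 0 on success and 1 on failure. the array is unchanged on failure */
static int dynarray_ensure(dynarray_t* a, size_t n) {
  size_t needed = a->used + n;
  size_t new_size;
  int* data;
  if (needed <= a->size) return 0;
  new_size = a->size ? a->size * 2 : 8;
  if (new_size < needed) new_size = needed;
  data = realloc(a->data, new_size * sizeof(int));
  if (!data) return 1;
  a->data = data;
  a->size = new_size;
  return 0;
}

/* first option: free space is checked on every addition, which can fail */
static int dynarray_add(dynarray_t* a, int value) {
  if (dynarray_ensure(a, 1)) return 1;
  a->data[a->used] = value;
  a->used += 1;
  return 0;
}

/* second option: no check. the caller must have called dynarray_ensure
   with a sufficient count before, otherwise this writes out of bounds */
static void dynarray_add_unchecked(dynarray_t* a, int value) {
  a->data[a->used] = value;
  a->used += 1;
}

static void dynarray_free(dynarray_t* a) {
  free(a->data);
  a->data = 0;
  a->used = 0;
  a->size = 0;
}
```

reducing the number of active elements only requires lowering used, the allocated size stays as it is.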

c vs higher-level languages

the biggest slowdown i have experienced when programming in c versus other languages comes from having to be more specific:

  • specific with types - each function implementation works only for the exact type combinations it was defined for. functions that process uint32 and uint64 need two full separate definitions, and float and uint functions may not be easy to template with macros because of low-level details. macro templating with preprocessor syntax also needs distracting line escaping
  • distraction by interleaved memory allocation code, as well as complicated ownership and cleanup semantics
  • specific in what is done - more low-level options, more possible variation, more performance implications
  • more has to be done. for example, declaration, allocation, deallocation, initialization, etc, but this fine-grained control may be worth the cost
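
to illustrate the point about types, a sketch of macro templating with the line escaping it requires. define_array_sum and the generated function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* defines a sum function for one element type. every line but the last
   ends with a backslash because a macro definition is one logical line.
   the result has the element type, so unsigned sums wrap around */
#define define_array_sum(name, type) \
  static type name(const type* a, size_t size) { \
    type sum = 0; \
    size_t i; \
    for (i = 0; i < size; i += 1) sum += a[i]; \
    return sum; \
  }

/* one full function definition is generated per type */
define_array_sum(array_sum_u32, uint32_t)
define_array_sum(array_sum_u64, uint64_t)
define_array_sum(array_sum_f64, double)
```

this works here because the body is identical for all three types. as soon as the float and integer versions need different handling, the template has to be split or parameterized further.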