2018-10-29

c programming

the uncommon way

single compilation unit

use a single main file that includes all necessary code

benefits

  • only one source file has to be specified for compilation
  • less complex and easier to manage, because there is no need to compile many separate objects with many c compiler calls and then link them
  • makefiles don't have to be written for many objects and their dependencies
  • few header files are needed
  • potentially improved optimisation because the source code is fully available at once to the compiler

downsides

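  • included files share one scope, so the static and extern modifiers no longer separate them: a static identifier in one included file is visible to all the others and names can collide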
  • if only one source file changes, all code has to be recompiled

the traditional way is to compile parts of the application separately into objects, maintain a header file with declarations for each, and then link the object files together. this may save time when developing big projects, where most objects have already been compiled and only the few that changed need to be recompiled. it is probably of no use to a user who just wants to compile the code once: all of it has to be compiled anyway, and the user is faced with a never-ending log of compiler calls and hard-to-understand makefiles

further information: the wikipedia article. sqlite uses this style. this style is similar to javascript in html without a module system, where all source files and dependencies are included before use at the beginning of the html file

return status and error handling

  • use a status variable holding a struct with a status id and a group id. check it for a failure status and use goto to jump to an exit label at the end of the routine, where all cleanup is done. goto in c is local to the current routine, so this works a bit like local exceptions
  • the status id is the error or general status code, and the group id is a string naming the library the code belongs to. when multiple libraries are used, it would otherwise be almost impossible to coordinate error codes so that they are unambiguous. strings are used because integer group ids would again be hard to keep unambiguous
  • doing all cleanup at the end of the routine avoids code duplication. it often requires variables to be initialised to a null value before they are used, so that the cleanup part can tell, for example, not to free a pointer that was never allocated

example

  • this example uses a reference implementation from sph-sc-lib
  • status_declare declares a local variable status_t status = {0, ""};, the other status_* bindings use that variable
  • status_require checks that status.id is status_id_success, which is zero, and goes to exit if not
#include "sph/status.c"

// always fails with status id 456 in the group "mylib"
status_t test(void) {
  status_declare;
  if (1 < 2) {
    status_set_both_goto("mylib", 456);
  }
exit:
  return status;
}

int main(void) {
  status_declare;
  // code ...
  status_require(test());  // test fails, so this jumps to exit
  // more code ...
exit:
  return status.id;
}

free all current memory allocations at one point

track allocations locally

example

  • this example uses a reference implementation from sph-sc-lib. sph-sc-lib also contains a version with multiple named registers and a register that can be passed between routines
  • memreg_init(4) creates an address register on the stack for at most four pointers
  • memreg_register is the variable and memreg_index is the current index
  • memreg_add(address) adds a pointer to the register
  • memreg_free frees all pointers added so far

#include <stdlib.h>
#include "sph/memreg.c"

int main(void) {
  int is_error = 0;  // placeholder for some real error condition
  memreg_init(2);
  int* data_a = malloc(12 * sizeof(int));
  if(!data_a) goto exit;  // have to free nothing
  memreg_add(data_a);
  // more code ...
  char* data_b = malloc(20 * sizeof(char));
  if(!data_b) goto exit;  // have to free "data_a"
  memreg_add(data_b);
  // ...
  if (is_error) goto exit;  // have to free "data_a" and "data_b"
  // ...
exit:
  memreg_free;
  return 0;
}

memory management in general

memory leaks

  • heap memory is requested when needed and then reserved for the program (allocation). if the reservation is not ended once the memory is no longer needed (deallocation), the memory stays in use and the memory consumption of a process can grow continually with more allocations. this is called a memory leak
  • a leaking program cannot run indefinitely, because it eventually exhausts the available memory
  • each allocation must, at some point in the execution, be followed by a deallocation. on common operating systems, all memory of a process is released implicitly when the process ends
  • tools like valgrind can help to trace and find memory leaks

null pointers

a null pointer can be created by assigning a literal zero (or NULL) to a pointer. calling free on a null pointer is allowed and does nothing

double free and corruption

  • calling free on a pointer whose address has already been freed is undefined behaviour and usually leads to a crash or a corrupted heap
  • corruption can occur when the program carelessly writes into memory outside of the allocated range. this can damage the management structures of the allocator and is a common problem and attack surface for security exploits

heap and stack

the stack is memory that is reserved for the extent of a routine call, for example to store routine arguments and local variables. its size is limited and often fixed. heap memory is the rest of the available system memory, requested dynamically at runtime

lifetime

  • the c compiler has no indication of when memory is not needed anymore. how long a memory area is needed may depend on arbitrary conditions. references to the memory area can be passed through routines and persist across the whole program
  • at allocation, decide when the memory is going to be freed, both in normal execution and with error handling
  • it might help to think in terms of ownership: seeing specific routines as owners of memory that can pass on ownership together with the responsibility of deallocation

example cases

  • routine returns pointer, developer needs to choose place to free the memory (callee delegates the ownership for the reservation to caller)
  • routine receives memory and frees it at some point (callee takes over ownership)
  • with non-local jumps or exceptions the flow of execution moves to other routines in the program with different context. deallocation must happen beforehand or references are lost

call by value

when arguments are copied in a routine call, the routine cannot change state in the caller's scope, and no thought has to be given to whether outside execution changes or depends on the values

output arguments

  • routines can pass values to the caller in two ways: via the return value and via references (pointers)
  • often pointer arguments are given only to receive a result value. this way a routine can return an error status with return and multiple other values through output arguments at the same time

argument order

output arguments last, acted-on arguments first

  • string-append :: a b result
  • list-add :: list value

prefer local variables to a set of globals

globals might save declaration overhead, but access to a local is often faster, because the compiler can better predict where it is modified and keep its value cached, for example in a register

performance example

global

0m10.745s 0m10.739s

local

0m9.931s 0m9.940s

hygienic macros

the c preprocessor doesn't support hygienic macros. this means function-like macros can introduce new identifiers that clash with identifiers at the place of use, and they can use and modify variables from the surrounding scope

information encapsulation

  • c does not have a module system, and routines can generally not be defined inside a limited scope within a single compilation unit. there is a notion of file scope, but using it requires compiling separate objects with headers and linking them
  • the most basic helpful measure is perhaps namespacing through semantic identifier construction: build identifiers from words with prefixes, for example "modulename-routinename", ordered from generic to specific
  • other existing solutions include module systems that preprocess c code to rewrite identifiers, add blocks or similar

routine structure

compilers typically reserve the stack space for a routine's local variables when the routine is entered anyway, and having all declarations at the beginning groups this kind of preparation, so it might make sense to put all declarations at the top

type names

  • types can have a platform dependent size or a fixed size
  • the standard types int, char and others have a platform dependent size with a minimum required size. modifiers (short, long, long long) are used to select types with different minimum sizes. see c data types on wikipedia
  • there are standard fixed size integer types defined in stdint.h (which inttypes.h includes), for example int32_t, uint8_t and more. they don't need the modifiers. stdint.h also defines minimum size types (int_least32_t, etc), the widest available types (intmax_t, uintmax_t), and fast types (int_fast32_t, etc) that are meant to be the fastest type with at least the given size on a platform. inttypes.h adds printf and scanf format macros for all of them

shorter type names

here are some alternative type names that could be used

i8, i16, i32, i64, i8-least, i16-least, i32-least, i64-least, i8-fast, i16-fast, i32-fast, i64-fast, ui8, ui16, ui32, ui64, ui8-least, ui16-least, ui32-least, ui64-least, ui8-fast, ui16-fast, ui32-fast, ui64-fast, f32 (float), f64 (double), pointer, boolean

incidentally, the rust language actually uses some of them

links