2019-06-25

c programming

the uncommon way

subtopics

single compilation unit

for relatively small projects, prefer a single main file that includes necessary code

the traditional way is to compile parts of the application separately into machine code objects, maintain header files with declarations for each object and then use a linker to connect the code in object files. this may save time in the development of big projects when most objects have previously been compiled and only a few have changed and need to be recompiled.

further information: wikipedia article on single compilation unit. sqlite is distributed in this style as a single amalgamation file. this style is similar to javascript in html without module systems, where all source files and dependencies are included at the beginning of an html file before use

benefits

  • only one object has to be compiled and only a single call to the compiler is needed to compile the main file
  • no complicated makefiles have to be written and maintained
  • application parts are not split into many header files
  • potential for more automatic code optimisation

downsides

  • all included bindings are defined for all the following code and conflicts become likely, because c has no namespacing feature
  • all code has to be recompiled if one source file changes, which can take a long time

return status and error handling

  • one can use a status variable with an object that has status id and group id that is checked for a failure status and use goto to an exit label at the end of the routine where all cleanup is done. gotos in c are local to the current routine. it is a bit like local exceptions
  • the status id is the error or general status code, the group id is a string name of the library the code belongs to. when multiple libraries are used it would otherwise be almost impossible to synchronise error codes so that they are not ambiguous. string names are used because integer ids would again be hard to keep unambiguous
  • doing all clean-up at the end of the routine can avoid code duplication. it often requires values to be initialised to a null value before they are used, so that the clean-up part can know, for example, not to free an unallocated pointer

example

  • this example uses a reference implementation from sph-sc-lib
  • status_declare declares a local variable status_t status = {0, ""};, the other status_* bindings use that variable
  • status_require checks that status.id is status_id_success, which is zero, and goes to exit if not
#include "sph/status.c"

status_t test() {
  status_declare;
  if (1 < 2) {
    status_set_both_goto("mylib", 456);
  }
exit:
  return status;
}

int main() {
  status_declare;
  // code ...
  status_require(test());
  // more code ...
exit:
  return status.id;
}
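
for readers without sph-sc-lib at hand, the same pattern can be sketched without any library. everything here is an assumption made for illustration: my_status_t, the my_status_id_* constants, the group name "mylib" and sum_of_squares are hypothetical names and not part of sph-sc-lib. the sketch assumes count is greater than zero

```c
#include <stdlib.h>

/* hypothetical minimal status type, modelled on the description above */
typedef struct {
  int id;            /* 0 means success */
  const char* group; /* name of the library that the id belongs to */
} my_status_t;

#define my_status_id_success 0
#define my_status_id_memory 1

/* writes the sum of the squares of 1..count to "result".
   assumes count > 0 */
my_status_t sum_of_squares(size_t count, long* result) {
  my_status_t status = {my_status_id_success, ""};
  /* null-initialised so that the clean-up can call free unconditionally */
  long* values = 0;
  long* squares = 0;
  long sum = 0;
  size_t i;
  values = malloc(count * sizeof(long));
  if (!values) {
    status.id = my_status_id_memory;
    status.group = "mylib";
    goto exit;
  }
  squares = malloc(count * sizeof(long));
  if (!squares) {
    status.id = my_status_id_memory;
    status.group = "mylib";
    goto exit; /* "values" is freed at the exit label */
  }
  for (i = 0; i < count; i += 1) {
    values[i] = (long)(i + 1);
    squares[i] = values[i] * values[i];
    sum += squares[i];
  }
  *result = sum;
exit:
  /* the only place where clean-up happens. free on a null pointer does nothing */
  free(squares);
  free(values);
  return status;
}
```

a caller checks status.id after the call and only uses the result if it equals my_status_id_success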

important features missing in c

  • namespaces: controlling scope of bindings

    • bindings declared on the top-level of a file exist for all following code, even if the file was included
    • included library code cannot hide helper code
    • cant rename generically named bindings on import
    • only way to namespace is prefixing, which leads to long identifiers
    • cant bind namespace identifiers to short local names
    • the only way to get function definitions into a scope is compiling them to another compatible format - linked machine code in shared libraries
    • cant safely have generically named exports in libraries
    • any code before inclusion may affect any following included library
    • tiny libraries are impractical because of the shared library overhead
  • names for values and shadowing: variables are associated with memory space, but there is no good way to just associate a name with an expression to avoid repeating it. there is the preprocessor, but it has a different syntax from c, its directives have to start on their own line, and it doesnt shadow variables, so conflicts cant be controlled adequately
  • keyword arguments: particularly useful for optional arguments. scheme has lambda* and javascript uses its object notation to the same effect
  • anonymous functions: to pass procedural information as an argument, like abstracting the inside of a for loop, or for temporary functions generally. currently one has to define a separate global function and use a function pointer. returning anonymous function pointers might not be desired as it might require closures
  • memory ownership semantics
  • the preprocessor cant generate a variable length of expressions. for example, it cant generate multiple expressions from variable length arguments
  • to prevent a file from being included more than once, the file content has to be enclosed in a preprocessor if-expression (an include guard) or alternatively preceded by the non-standard but widely supported #pragma once
  • symbols: literal character based identifiers. string literals need string comparison and number variables need declaration. enums are probably the next best thing
  • the c preprocessor doesnt support hygienic macros. that means macro functions can introduce newly bound identifiers and use and modify variables from the current scope. macros could be more useful if they could use temporary variables that cant conflict with the surrounding code they are used at
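
the last point can be made concrete with a small sketch. add_twice and the two routines are hypothetical names chosen for illustration. the macro needs a temporary, and because the preprocessor only substitutes text, that temporary silently captures a caller variable of the same name

```c
/* unhygienic macro: the temporary "t" is declared in the scope where the macro is used */
#define add_twice(target, value) do { int t = (value); target += t; target += t; } while (0)

/* works as intended because the caller has no variable named "t" */
int add_twice_to_x(void) {
  int x = 1;
  add_twice(x, 5);
  return x; /* 1 + 5 + 5 = 11 */
}

/* silently wrong: "target" now refers to the temporary inside the macro */
int add_twice_to_t(void) {
  int t = 1;
  add_twice(t, 5); /* expands to: int t = (5); t += t; t += t; */
  return t; /* still 1, the outer "t" was never modified */
}
```

the usual workaround is to give macro temporaries long, unlikely names, which reduces the risk but does not remove it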

namespaces

renaming bindings is not possible, as any alias requires a reference to the original in scope. if c had namespaces, there might be less need for binary modules, as the c code could be included without conflict. this would be similar to other languages like javascript, where modules are just included code
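
a minimal sketch of what prefixing and aliasing look like in practice. the mylib_ prefix and all names are hypothetical. the aliases are plain preprocessor macros: the long original names stay in scope and the short names are taken for all code that follows

```c
/* "library" code: every exported name carries the prefix "mylib_" */
typedef struct {
  int x;
  int y;
} mylib_point_t;

mylib_point_t mylib_point_add(mylib_point_t a, mylib_point_t b) {
  mylib_point_t result;
  result.x = a.x + b.x;
  result.y = a.y + b.y;
  return result;
}

/* "user" code: short aliases via the preprocessor.
   this is not a real rename, the originals remain visible
   and the aliases apply to everything after this point */
#define point_t mylib_point_t
#define point_add mylib_point_add

int sum_x(void) {
  point_t a = {1, 2};
  point_t b = {3, 4};
  return point_add(a, b).x;
}
```

if another included file also defines point_t, the alias conflicts with it, which is the problem namespaces would solve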

current options

  • wait till one day it is added to the c standard
  • compile as cpp and use its namespace syntax. see also dotc
  • parse c including its preprocessor and add namespace syntax with rewriting identifier names at definition and use - hide unexported bindings, eventually rename exported bindings
  • compile a shared library binary object and use a header file and linker to use it in other code. clang modules work like this and it is common practice for modularising c code. limiting exports from a shared library needs an extra exports file or code annotations. all exports, including exposed types, have to be declared in the header file and users cant rename them without changing the source

free current memory allocations at point

track allocations locally

example

  • this example uses a reference implementation from sph-sc-lib. sph-sc-lib also contains a version with multiple named registers and a register to be passed between routines
  • memreg_init(4) creates an address register on the stack for at most four pointers
  • memreg_register is the variable and memreg_index is the current index
  • memreg_add(address) adds a pointer to the register
  • memreg_free frees all pointers added so far

#include <stdlib.h>
#include "sph/memreg.c"

int main() {
  int is_error = 0;  // placeholder for an error condition set by later code
  memreg_init(2);
  int* data_a = malloc(12 * sizeof(int));
  if(!data_a) goto exit;  // have to free nothing
  memreg_add(data_a);
  // more code ...
  char* data_b = malloc(20 * sizeof(char));
  if(!data_b) goto exit;  // have to free "data_a"
  memreg_add(data_b);
  // ...
  if (is_error) goto exit;  // have to free "data_a" and "data_b"
  // ...
exit:
  memreg_free;
  return(0);
}
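
the idea can also be sketched without the library. this is not how memreg is implemented, only an illustration of the technique. allocations_t, allocations_add, allocations_free and build_tables are hypothetical names, and the sketch does not check the register bounds

```c
#include <stdlib.h>

#define allocations_max 4

/* a fixed-size register of addresses that lives on the stack of the routine */
typedef struct {
  void* addresses[allocations_max];
  unsigned int count;
} allocations_t;

/* remembers an address. no bounds check: the caller must not add more than allocations_max */
static void allocations_add(allocations_t* a, void* address) {
  a->addresses[a->count] = address;
  a->count += 1;
}

/* frees everything added so far, in reverse order */
static void allocations_free(allocations_t* a) {
  while (a->count) {
    a->count -= 1;
    free(a->addresses[a->count]);
  }
}

/* returns 0 on success and 1 on failure. all memory is freed in both cases */
int build_tables(size_t size_a, size_t size_b) {
  allocations_t allocations = {{0}, 0};
  int result = 1;
  int* data_a;
  char* data_b;
  data_a = malloc(size_a * sizeof(int));
  if (!data_a) goto exit; /* nothing registered yet */
  allocations_add(&allocations, data_a);
  data_b = malloc(size_b * sizeof(char));
  if (!data_b) goto exit; /* "data_a" is registered and will be freed */
  allocations_add(&allocations, data_b);
  data_a[0] = 1;
  data_b[0] = 'x';
  result = 0;
exit:
  allocations_free(&allocations);
  return result;
}
```

the exit label does not need to know which allocations succeeded, it frees whatever has been registered up to that point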

memory management in general

memory leaks

  • heap memory is requested when needed and then gets reserved for the program (allocation). if the reservation is not ended when the memory is not needed anymore (deallocation), the memory stays reserved but unused, and the memory consumption of a process can grow continually with more allocations. this is called a memory leak
  • leaks prevent programs from running for an indefinite amount of time, because the available memory eventually runs out
  • each allocation must, at some point in the execution, be followed by a deallocation. all memory is released implicitly with the end of the process
  • tools like valgrind can help to trace and find memory leaks

null pointers

a null pointer can be created by setting a pointer to the literal zero or to NULL. calling free on a null pointer is allowed and has no effect

double free and corruption

  • calling free on a pointer whose address has previously been freed is an error
  • memory corruption can occur when the program writes into memory outside of the allocated range. this can damage the management structures of the allocator and is a common problem and attack area for security exploits
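
one common way to make an accidental second free harmless is to reset the pointer to null right after freeing, because free on a null pointer does nothing. a minimal sketch, where free_and_reset and release_twice are hypothetical names

```c
#include <stdlib.h>

/* frees the memory and resets the pointer of the caller */
void free_and_reset(int** pointer) {
  free(*pointer); /* has no effect if *pointer is null */
  *pointer = 0;
}

/* returns 1 if the pointer is null after two releases */
int release_twice(void) {
  int* data = malloc(4 * sizeof(int));
  free_and_reset(&data);
  free_and_reset(&data); /* frees a null pointer, which is allowed */
  return data == 0;
}
```

this only protects the one pointer variable that is reset. other copies of the same address still refer to freed memory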

heap and stack

the stack is memory space that is reserved for the extent of a routine call, for example to store routine arguments and local variables. the space a routine needs is in most cases calculated at compile time, and the stack as a whole has a limited size. heap memory is requested from the allocator at run time and is limited only by the memory the system makes available. variables with stack memory only need to be declared, variables with heap memory need to be declared as pointers and the heap memory has to be allocated separately
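
a minimal sketch of the difference, with hypothetical routine names

```c
#include <stdlib.h>

/* stack: the array only needs to be declared and is gone when the routine returns */
int sum_on_stack(void) {
  int values[3] = {1, 2, 3};
  return values[0] + values[1] + values[2];
}

/* heap: a pointer is declared, the memory is allocated separately and has to be freed.
   returns -1 if the allocation fails */
int sum_on_heap(void) {
  int sum;
  int* values = malloc(3 * sizeof(int));
  if (!values) return -1;
  values[0] = 1;
  values[1] = 2;
  values[2] = 3;
  sum = values[0] + values[1] + values[2];
  free(values);
  return sum;
}
```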

life time

  • the c compiler has no indication of when memory is not needed anymore. how long a memory area is needed may depend on arbitrary conditions. references to the memory area can be passed through routines and persist between routine calls and across the whole program
  • at allocation, decide when the memory is going to be freed in normal execution and with error handling
  • it might be helpful to think in terms of ownership - seeing specific routines as owner of memory and passing on ownership and the responsibility of deallocation

example cases

  • routine returns pointer, developer needs to choose place to free the memory (callee delegates the ownership for the reservation to caller)
  • routine receives memory and frees it at some point (callee takes over ownership)
  • with non-local jumps or exceptions the flow of execution moves to other routines in the program with different context. deallocation must happen beforehand or references are lost
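
the first two cases can be sketched as follows. text_copy and text_length_and_free are hypothetical names, and the ownership rules are stated in comments because c has no syntax for them

```c
#include <stdlib.h>
#include <string.h>

/* the callee allocates, the caller becomes the owner and has to free the result.
   returns a null pointer if the allocation fails */
char* text_copy(const char* text) {
  size_t size = strlen(text) + 1;
  char* copy = malloc(size);
  if (copy) memcpy(copy, text, size);
  return copy;
}

/* the callee takes over ownership: "text" must not be used by the caller afterwards.
   accepts a null pointer and returns 0 for it */
size_t text_length_and_free(char* text) {
  size_t length = text ? strlen(text) : 0;
  free(text);
  return length;
}
```

in text_length_and_free(text_copy("abc")) the memory is allocated by one routine and released by another, and only the comments say who is responsible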

call by value

when arguments are copied for a routine call, the routine cannot change state in the caller scope, and no thought has to be given to whether outside execution changes or depends on the values. this only applies to the copies themselves: data that a copied pointer refers to can still be changed
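
a small sketch of the difference between a copied value and a copied pointer. all names are hypothetical

```c
/* receives a copy: the change is not visible to the caller */
void increment_copy(int value) {
  value += 1;
}

/* receives a copy of an address: the data at that address can be changed */
void increment_referenced(int* value) {
  *value += 1;
}

int after_increment_copy(int value) {
  increment_copy(value);
  return value; /* unchanged */
}

int after_increment_referenced(int value) {
  increment_referenced(&value);
  return value; /* one more than before */
}
```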

output arguments

  • routines can pass values to the caller in two ways: via return and via references
  • often pointer arguments are given that are only used to take the result value. in this way a routine can return the error status with the return and multiple other values with the output arguments at once
  • with output arguments and error handling via return, as it is usually done, functions are quite different from lambdas in functional programming

argument order

output arguments last, acted-on arguments first

  • string-append :: a b result
  • list-add :: list value
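
a sketch of string-append with this argument order, the result as the last argument and the error status as the return value. string_append and its status codes are assumptions made for illustration

```c
#include <stdlib.h>
#include <string.h>

/* joins "a" and "b" into newly allocated memory.
   returns 0 on success and 1 if the allocation failed.
   "result" is only written on success and is owned by the caller */
int string_append(const char* a, const char* b, char** result) {
  size_t a_size = strlen(a);
  size_t b_size = strlen(b);
  char* joined = malloc(a_size + b_size + 1);
  if (!joined) return 1;
  memcpy(joined, a, a_size);
  memcpy(joined + a_size, b, b_size + 1); /* includes the terminating zero */
  *result = joined;
  return 0;
}
```

because the return value is taken by the status, calls cannot be nested like string_append(string_append(a, b), c), which is one way such routines differ from functions in functional programming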

prefer local variables to a set of globals

using globals might save declaration overhead, but access to a local is often faster because the compiler can better predict where it is modified and keep its value in a register

performance example

global

0m10.745s 0m10.739s

local

0m9.931s 0m9.940s
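
the code behind these timings is not shown here. a comparison of this kind might look like the following sketch, which is an assumption and not the original benchmark. actual timings depend on the compiler, optimisation level and platform

```c
/* accumulator with file scope */
static unsigned long global_sum;

/* every addition goes through the global variable */
unsigned long sum_with_global(unsigned long count) {
  unsigned long i;
  global_sum = 0;
  for (i = 0; i < count; i += 1) {
    global_sum += i;
  }
  return global_sum;
}

/* the accumulator is local, which makes it easy for the compiler to keep it in a register */
unsigned long sum_with_local(unsigned long count) {
  unsigned long i;
  unsigned long sum = 0;
  for (i = 0; i < count; i += 1) {
    sum += i;
  }
  return sum;
}
```

both routines compute the same value, so they can be called with a large count from two otherwise identical programs and compared with the time command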

routine structure

stack space for all local variables of a routine is typically reserved at the beginning of the routine anyway, and having all declarations at the beginning groups this type of preparation, so it might make sense to have all declarations at the top

type names

  • types can be of platform dependent variable size or fixed size
  • the standard types int, char and others have a platform dependent size with a required minimum. type prefixes (short, long, long long) are used to specify different minimum size requirements. see also: c data types on wikipedia
  • there are standard fixed size data types that are defined in stdint.h, which inttypes.h includes. for example int32_t, uint8_t and more. they dont take strange type prefixes. stdint.h also defines types with a minimum size (int_least32_t), the largest integer type (intmax_t) and fast types (int_fast32_t), which are meant to be the fastest available types on a platform with at least the given size. inttypes.h adds format macros like PRId32 for use with printf
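
one practical reason to include inttypes.h instead of only stdint.h is that it provides printf format macros for the fixed size types. a minimal sketch, where format_i32 is a hypothetical helper

```c
#include <inttypes.h>
#include <stdio.h>

/* writes "value" as decimal text into "buffer".
   PRId32 expands to the correct printf conversion for int32_t on the platform.
   returns the number of characters that the full text needs, like snprintf */
int format_i32(char* buffer, size_t size, int32_t value) {
  return snprintf(buffer, size, "%" PRId32, value);
}
```

with a plain "%d" the code would only be correct on platforms where int32_t happens to be int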

shorter type names

here are some alternative type names that could be used

i8, i16, i32, i64, i8-least, i16-least, i32-least, i64-least, i8-fast, i16-fast, i32-fast, i64-fast, u8, u16, u32, u64, u8-least, u16-least, u32-least, u64-least, u8-fast, u16-fast, u32-fast, u64-fast, f32 float, f64 double, pointer, boolean

incidentally, the rust language actually uses some of them
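
in c such names can be introduced with typedef. a partial sketch: hyphens are not valid in c identifiers, so underscores are used here, and float and double are assumed to be 32 and 64 bit, which the c standard does not guarantee. multiply_wide is a hypothetical example routine

```c
#include <stdint.h>

typedef int8_t i8;
typedef int16_t i16;
typedef int32_t i32;
typedef int64_t i64;
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;
typedef int_least32_t i32_least;
typedef int_fast32_t i32_fast;
typedef uint_least32_t u32_least;
typedef uint_fast32_t u32_fast;
typedef float f32;  /* assumption: 32 bit on the target platform */
typedef double f64; /* assumption: 64 bit on the target platform */

/* multiplies two 32 bit values without overflow by widening first */
u64 multiply_wide(u32 a, u32 b) {
  return (u64)a * (u64)b;
}
```

in a single compilation unit these short names are visible to all following code, so they can conflict with included libraries that define the same names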

links

c tricks

  • using offsetof to get a pointer to the structure from only a pointer to the struct field
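
a minimal sketch of the offsetof trick. struct_from_field, item_t and id_from_weight_pointer are hypothetical names. the macro is similar in spirit to container_of in the linux kernel, without its type checks

```c
#include <stddef.h>

typedef struct {
  int id;
  double weight;
} item_t;

/* subtracts the byte offset of the field from the field address
   to get the address of the enclosing structure.
   only valid if "pointer" really points to that field of a "type" object */
#define struct_from_field(pointer, type, field) \
  ((type*)((char*)(pointer) - offsetof(type, field)))

/* receives only a pointer to the weight field and still reaches the id */
int id_from_weight_pointer(double* weight) {
  item_t* item = struct_from_field(weight, item_t, weight);
  return item->id;
}
```

for an item_t with id 7, passing the address of its weight field returns 7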