2018-06-01

c programming

the uncommon way

single compilation unit

use a single main file that includes all necessary code

benefits

only one source file has to be compiled

easier to manage, because there are not many separate objects that each need their own compiler call and a final link step

makefiles don't have to configure many objects and their dependencies

few header files needed

potentially improved optimisation because the source code is fully available at once to the compiler

downsides

included files share one file scope. static no longer keeps identifiers private to a single source file, so names from different files can collide

if only one source file changes, all code has to be recompiled
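
a minimal sketch of the layout and of the shared scope downside. to keep the example in one piece, the parts that would normally be separate files are only marked by comments. the file names counter.c and report.c and all identifiers are made up for illustration

```c
/* main.c: the only file passed to the compiler, for example "cc main.c".
   in a real project the marked sections would be separate files that are
   pulled in with #include "counter.c" and #include "report.c" */

/* --- would be counter.c --- */
static int counter_value = 0; /* static: intended to be private to counter.c */

static int counter_next(void) {
  counter_value += 1;
  return counter_value;
}

/* --- would be report.c --- */
static int report_total(void) {
  /* this compiles because after inclusion both files form one compilation
     unit, so counter_value is in scope here as well */
  return counter_value * 10;
}
```

with separate compilation, report.c could not see counter_value at all. here only naming discipline keeps the two parts apart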

the traditional way is to compile parts of the application separately, maintain header files with declarations for each part, and then link the resulting object files where needed. this may save time in the development of big projects when most objects have previously been compiled and only a few have changed and need to be recompiled. it is probably of no use to a user who just wants to compile the code, has to compile all of it anyway, and is faced with a never-ending log of compiler calls and difficult-to-understand makefiles

further information: wikipedia article. sqlite uses this style. this style is similar to javascript without modules where all source files and dependencies are included before use at the beginning of an html file

return status and error handling

use a status variable that holds an object with a status id and a group id. the variable is repeatedly updated and checked. on error, use goto to jump to an exit label at the end of the routine, where all clean-up is done

about the status object: a program may use other libraries, each library can have its own error code range, and the ranges can overlap. compared to just a status integer, the additional group id, or source library id, makes it possible to distinguish between libraries and, for example, get the correct error message

moving clean-up to the end, after a goto label, avoids duplicating clean-up code at every place where an error can occur. the places where errors occur then need no clean-up actions of their own. gotos in c are local to the current routine

general overview of error handling strategies: error handling

example

this example uses a reference implementation from sph-sc-lib

status_init declares a local variable "status_t status = {0, 0};", the other status_ bindings use that variable

status_require checks that status.id is status_success (0) and goes to exit if not

#include "sph/status.c"

status_t test() {
  status_init;
  if (1 < 2) {
    int group_id = 123;
    int error_id = 456;
    status_set_both_goto(group_id, error_id);
  }
exit:
  return status;
}

int main() {
  status_init;
  // code ...
  status_require(test());
  // more code ...
exit:
  return status.id;
}
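
for comparison, a sketch of the same pattern without the library, in plain c. all names with the example_ prefix are hypothetical and are not the sph-sc-lib definitions. it shows why the group id matters: both groups use id 1, and only the group selects the right message

```c
#include <stdlib.h>

typedef struct {
  int id;    /* 0 means success */
  int group; /* identifies the library that the id belongs to */
} example_status_t;

enum { example_group_app = 1, example_group_parser = 2 };

/* two libraries may both use id 1, the group selects the right message */
const char* example_status_message(example_status_t a) {
  if (0 == a.id) return "success";
  if (example_group_app == a.group && 1 == a.id) return "app: out of memory";
  if (example_group_parser == a.group && 1 == a.id) return "parser: unexpected end of input";
  return "unknown error";
}

example_status_t example_parse(const char* text) {
  example_status_t status = {0, 0};
  char* buffer = 0;
  if (!text || !text[0]) {
    status.id = 1;
    status.group = example_group_parser;
    goto exit;
  }
  buffer = malloc(64);
  if (!buffer) {
    status.id = 1;
    status.group = example_group_app;
    goto exit;
  }
  /* ... work with buffer ... */
exit:
  free(buffer); /* the single clean-up place. free(0) does nothing */
  return status;
}
```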

memory management

use locally tracked memory

makes it easier to free all allocations made up to a given point

example

this example uses a reference implementation from sph-sc-lib

local_memory_init(size) allocates an address register on the stack

local_memory_add(address) adds a pointer to the register

local_memory_free() frees all pointers added so far

#include <stdlib.h>
#include "sph/local-memory.c"

int main() {
  int is_error = 0;  // placeholder for an error condition set by the elided code
  local_memory_init(2);
  int* data_a = malloc(12 * sizeof(int));
  if(!data_a) goto exit;  // have to free nothing
  local_memory_add(data_a);
  // more code ...
  char* data_b = malloc(20 * sizeof(char));
  if(!data_b) goto exit;  // have to free "data_a"
  local_memory_add(data_b);
  // ...
  if (is_error) goto exit;  // have to free "data_a" and "data_b"
  // ...
exit:
  local_memory_free();
  return(0);
}
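
the idea behind the register can be sketched without the library. this is an assumption about the general technique, not the sph-sc-lib implementation. example_process and the register variables are hypothetical names

```c
#include <stdlib.h>

#define EXAMPLE_REGISTER_SIZE 2

/* returns 0 on success and 1 on failure */
int example_process(int fail_midway) {
  void* reg[EXAMPLE_REGISTER_SIZE]; /* address register on the stack */
  size_t reg_count = 0;
  int result = 1;
  int* data_a;
  char* data_b;
  data_a = malloc(12 * sizeof(int));
  if (!data_a) goto exit; /* nothing registered yet */
  reg[reg_count++] = data_a;
  data_b = malloc(20 * sizeof(char));
  if (!data_b) goto exit; /* data_a is registered and will be freed */
  reg[reg_count++] = data_b;
  if (fail_midway) goto exit; /* both are registered and will be freed */
  result = 0;
exit:
  /* free everything that has been registered so far, newest first */
  while (reg_count) free(reg[--reg_count]);
  return result;
}
```

the clean-up code does not need to know which allocations succeeded, it only walks the register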

general

heap memory is requested when needed and gets reserved for the program (allocation). if the reservation is not ended when the memory is not needed anymore (deallocation), then the memory consumption of a process can grow continually over time. this is called a memory leak.

a leak prevents a program from running indefinitely: at some point all the available memory is reserved and the program cannot request more

each allocation must be followed by an explicit deallocation at some point in the execution of the program or implicitly with the end of the process, where all process memory is freed automatically

tools like valgrind can help to trace and find memory leaks

calling free on a null pointer is not a problem, it does nothing. calling free on an address that has already been freed is an error (undefined behavior, a so-called double free)
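
one common way to make use of this, sketched with a hypothetical helper: reset the pointer to null right after freeing it, so that a repeated free receives a null pointer instead of the stale address

```c
#include <stdlib.h>

/* frees *address and resets it, so that a repeated call is harmless */
void example_free_and_reset(int** address) {
  free(*address); /* free(NULL) is defined to do nothing */
  *address = NULL;
}

/* returns 1 if the pointer is NULL after two free calls */
int example_double_free_safe(void) {
  int* data = malloc(4 * sizeof(int));
  example_free_and_reset(&data);
  example_free_and_reset(&data); /* the second call passes NULL to free */
  return NULL == data;
}
```

this only protects the one pointer variable that is reset. other copies of the address still point to freed memory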

heap and stack

the stack is memory space that is reserved automatically for the extent of a routine call, for example to store routine arguments and local variables. it has a pre-calculated, limited or fixed size. heap memory is requested explicitly at run time, for example with malloc, from the other memory available to the process

life time

the c compiler has no indication of when memory is not needed anymore. how long a memory area is needed may depend on arbitrary conditions. references to the memory area can be passed through routines and persist across the whole program

at allocation, decide in which routine and when the memory is going to be freed in normal execution and with error handling

it might be helpful to think in terms of ownership: seeing specific routines as owners of memory that can pass on ownership and, with it, the responsibility for deallocation

example cases

routine returns a pointer and the caller needs to choose a place to eventually free the memory manually (the routine delegates the responsibility for the reservation)

routine receives memory and frees it at some point (takes over the responsibility for the reservation)

non-local jumps or exceptions occur and the flow of execution moves to other routines in the program with different context. deallocation must happen beforehand, or the allocations must be tracked in some other way
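
the first two cases as a small sketch. the routine names are hypothetical. the comments state who owns the memory, which is the part that c itself does not check

```c
#include <stdlib.h>
#include <string.h>

/* returns newly allocated memory, or NULL if the allocation failed.
   the caller becomes the owner and has to free the result */
char* example_copy_string(const char* source) {
  size_t size = strlen(source) + 1;
  char* copy = malloc(size);
  if (copy) memcpy(copy, source, size);
  return copy;
}

/* takes over ownership of "text" and frees it.
   the caller must not use "text" afterwards.
   returns the length that the text had */
size_t example_consume_string(char* text) {
  size_t length = text ? strlen(text) : 0;
  free(text);
  return length;
}
```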

output arguments

routines can pass values to the caller in two ways: via return and via references (output arguments)

with output arguments, memory addresses (of stack or heap memory) are passed via pointer parameters to a routine. the routine then saves results at the referenced locations. in this way multiple result values can be created
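
a short sketch with a hypothetical routine. the return value is used for the status, and the two results go to the locations that the caller passed

```c
/* returns 0 on success and 1 on division by zero.
   on success, the results are written to *quotient and *remainder */
int example_divide(int a, int b, int* quotient, int* remainder) {
  if (0 == b) return 1;
  *quotient = a / b;
  *remainder = a % b;
  return 0;
}
```

a caller declares "int q, r;" and calls "example_divide(17, 5, &q, &r)", after which q is 3 and r is 2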

project directory structure

examples here

prefer local variables to a set of globals

globals might save declaration overhead, but access to a local is often faster because the compiler can better predict where it is modified and keep it cached, for example in a register

performance example

global: 0m10.745s, 0m10.739s

local: 0m9.931s, 0m9.940s
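
the code that was measured is not part of this text. the following is only a sketch of the kind of comparison meant, with hypothetical names. whether there is a measurable difference depends on the compiler and the optimisation level

```c
unsigned long example_global_sum = 0;

/* accumulates in a global. the compiler has to assume that other code
   can observe or change the variable */
unsigned long example_sum_global(unsigned long count) {
  unsigned long i;
  example_global_sum = 0;
  for (i = 0; i < count; i += 1) example_global_sum += i;
  return example_global_sum;
}

/* accumulates in a local. nothing outside the routine can access it */
unsigned long example_sum_local(unsigned long count) {
  unsigned long i;
  unsigned long sum = 0;
  for (i = 0; i < count; i += 1) sum += i;
  return sum;
}
```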

hygienic macros

the c preprocessor doesn't support hygienic macros. that means function-like macros can introduce new bound identifiers into the scope where they are used, and can use and modify variables from that scope

information encapsulation
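
a small sketch with hypothetical macros. the first one introduces an identifier, the second one modifies it, and neither receives it as an argument

```c
/* introduces the identifier "counter" into the scope where it is used */
#define example_counter_init int counter = 0
/* refers to "counter" although it is not a macro parameter */
#define example_counter_increment counter += 1

int example_count_twice(void) {
  example_counter_init;
  example_counter_increment;
  example_counter_increment;
  return counter; /* "counter" exists here only because of the macro */
}
```

this can be used on purpose, but it also means that a macro can silently clash with a variable of the same name in the calling code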

c does not have a module system, and routines generally cannot be defined inside a limited scope within a compilation unit. there is a notion of file scope, but to make use of it separate objects have to be compiled and linked

the most basic helpful thing is perhaps namespacing with semantic identifier construction. construct identifiers from words with prefixes, for example "modulename_routinename", to group them from the generic to the specific

other solutions in existence include module systems that pre-process c-code to rewrite identifiers or add blocks or similar
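
a sketch of prefix based namespacing, with underscore as the separator because hyphens are not valid in c identifiers. "intstack" is a hypothetical module name and every identifier of the module starts with it

```c
#define intstack_capacity 8

typedef struct {
  int data[intstack_capacity];
  int size;
} intstack_t;

void intstack_init(intstack_t* a) { a->size = 0; }

/* returns 0 on success and 1 if the stack is full */
int intstack_push(intstack_t* a, int value) {
  if (a->size >= intstack_capacity) return 1;
  a->data[a->size] = value;
  a->size += 1;
  return 0;
}

/* returns 0 on success and 1 if the stack is empty */
int intstack_pop(intstack_t* a, int* value) {
  if (0 == a->size) return 1;
  a->size -= 1;
  *value = a->data[a->size];
  return 0;
}
```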

routine structure

all declarations

set values / make calls / etc

exit label

cleanup

return

type names

types can be of platform dependent variable size (with or without explicit minimum/maximum size) or fixed size

the standard types int, char, and others have a platform dependent size with a required minimum. type specifiers (short, long, long long) are used to select different minimum size requirements. c data types on wikipedia

there are standard fixed size data types, defined in stdint.h, which is also included by inttypes.h. examples are int32_t and uint8_t. they don't need obscure type prefixes. stdint.h also defines minimum size types (int_least32_t, etc), the greatest-width type intmax_t, and fast types (int_fast32_t, etc), which are meant to be the fastest available types on a platform with at least the given size
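
a short sketch of using the fixed size types together with the format macros that inttypes.h adds for printing them. example_format is a hypothetical routine

```c
#include <inttypes.h> /* includes stdint.h and adds the PRI* format macros */
#include <stdio.h>

/* writes both values as text to "out".
   returns the number of characters that the full text needs */
int example_format(char* out, size_t size, int32_t a, uint8_t b) {
  return snprintf(out, size, "%" PRId32 " %" PRIu8, a, b);
}
```

the format macros are needed because the underlying standard type of int32_t, and with it the matching printf conversion, can differ between platforms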

shorter type names

in sph code, shortened fixed type name aliases have been used

b0, b8, b32, b64, b8_s, b32_s, b64_s, f32_s, f64_s, pointer

description: lowercase b is the unit symbol for bit. the names put size first, then extended content properties like signedness. one downside is that there is no mention of integers, which isn't ideal considering that bits do not just have to be usable like integer values
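
one way such aliases could be defined. the mapping is an assumption derived from the names and the description and is not copied from sph code. in particular the types chosen for b0 and pointer are guesses

```c
#include <stdint.h>

/* assumed mapping: size first, then the "_s" suffix for signed */
typedef void b0;
typedef uint8_t b8;
typedef uint32_t b32;
typedef uint64_t b64;
typedef int8_t b8_s;
typedef int32_t b32_s;
typedef int64_t b64_s;
typedef float f32_s;       /* assumes that float is 32 bit on the platform */
typedef double f64_s;      /* assumes that double is 64 bit on the platform */
typedef uintptr_t pointer; /* assumption: an integer that can hold an address */
```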

c type definition

links


tags: programming start q1 document guide pattern computer c design