2024-07-03

possibly useful features missing in c

part of c programming

miscellaneous

  • literal assignment of multiple values to heap-allocated arrays, for example: int* a = malloc(3 * sizeof(int)); a = {1, 2, 3};
  • keyword arguments. particularly useful for optional arguments. scheme has lambda* and javascript uses its object notation to the same effect
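  • defining arrays as types using simple identifiers. a typedef like "typedef int a3_t[3]" exists, but such arrays still can not be assigned, passed, or returned by value. currently, the only option for that is to use structs with a single array field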
  • names for values and shadowing: there is no really simple way to associate a name with an expression just to avoid repeating it. preprocessor definitions require a markedly different syntax, do not shadow variables, and ignore block scope: they stay defined until an #undef or the end of the translation unit
  • symbols: literal, character-based identifiers. string literals need string comparison and variables with numeric values need extra declaration. enums are probably the next best thing but still require prior declaration and possibly usage of a dedicated type
  • a fractional number type for exact representations like 14/3
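  • reflection features to infer pointer target types to allow macros like this: "allocate_memory(my_t_pointer)" -> "my_t_pointer = malloc(sizeof(my_t))". for the size alone, "sizeof(*my_t_pointer)" already works. naming the target type needs typeof, which is only standard since c23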
  • a portable and robust include guard. currently, the whole file content has to be enclosed in a preprocessor #if or #ifndef block, or alternatively be preceded by the less portable "#pragma once", to prevent multiple inclusion. one reason for the current limitations is the difficulty compilers have in determining if two included files are actually the same file
  • anonymous functions: to pass procedural information and possibly create it dynamically. for example, to abstract the inside of a for-loop, or as an argument to functions that accept function pointers, for example a sorting function. currently, it is only possible to use named functions defined at the top level for this, or less portable compiler extensions such as gcc nested functions and clang blocks. it is not so simple to create such functions dynamically
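
as a rough sketch of what current c offers instead for the first two items: a compound literal can be copied into heap memory with memcpy, and a struct with designated initializers behind a variadic macro can imitate keyword arguments. all names here (array_set, box_options_t, box_area) are made up for illustration, and treating zero as "not given" is an assumption of this sketch, not a general solution.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* copies a list of values into already allocated memory.
   the extra parentheses protect the commas inside the compound literal */
#define array_set(target, type, ...) \
  memcpy((target), ((type[]){__VA_ARGS__}), sizeof((type[]){__VA_ARGS__}))

/* hypothetical option struct. fields not named at the call site are zero */
typedef struct {
  int reserved; /* only exists so that the initializer list is never empty */
  int width;
  int height;
  int border;
} box_options_t;

/* receives all optional arguments as one struct */
static int box_area_with_options(box_options_t o) {
  int width = o.width ? o.width : 10;    /* zero means "not given" */
  int height = o.height ? o.height : 10; /* zero means "not given" */
  return (width + 2 * o.border) * (height + 2 * o.border);
}

/* wraps the arguments into a compound literal with designated initializers */
#define box_area(...) \
  box_area_with_options((box_options_t){.reserved = 0, __VA_ARGS__})

/* allocates three integers, sets them with one statement, returns their sum */
static int array_set_demo(void) {
  int* a = malloc(3 * sizeof(int));
  if (!a) return -1;
  array_set(a, int, 1, 2, 3);
  int sum = a[0] + a[1] + a[2];
  free(a);
  return sum;
}
```

a call then looks like "box_area(.width = 2, .height = 3)" or just "box_area()". the syntax is still noticeably heavier than a built-in feature would be, and each such function needs its own struct and macro.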

namespaces: to control scope of bindings

currently, every declaration on the top-level of a file exists for all following code, even when the declaring file was only included. included library code can not hide helpers and internal declarations, type definitions, and macros. not only can this lead to naming conflicts when the following code tries to use an already declared identifier, it can also change behavior unintentionally by redeclaring or modifying bindings that are in use.

this makes it difficult to share libraries because some bindings, even if they are only used internally, may conflict with other libraries or the including codebase.

c can not rename bindings on inclusion, for example to add a prefix to names that are too generic. redefinitions and aliases are no solution either, because an alias requires the aliased name to be visible in the same scope.

currently, a common solution is to build shared library objects with header files and additional compilation configuration. this adds overhead which makes small libraries less practical to use. another approach is to use prefixes in identifiers to reduce the likelihood of the same name existing in including code.
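
a minimal sketch of the prefix approach, assuming a hypothetical small library "vec2" that is meant to be included directly. the names vec2_t, vec2_internal_square, and vec2_length_squared are invented for this example.

```c
#include <assert.h>

/* --- content of the hypothetical included file vec2.c --- */

/* every top-level name carries the vec2_ prefix,
   because the including file sees all of them */
typedef struct {
  int x;
  int y;
} vec2_t;

/* internal helper. static gives it internal linkage, which prevents
   conflicts at link time, but the name stays visible to all code
   that follows the include */
static int vec2_internal_square(int a) { return a * a; }

static int vec2_length_squared(vec2_t a) {
  return vec2_internal_square(a.x) + vec2_internal_square(a.y);
}

/* --- content of the including file --- */

/* without the prefix above, this definition would conflict with the helper */
static int square(int a) { return a * a; }
```

the prefix only lowers the probability of a conflict. nothing prevents the including code from calling vec2_internal_square or from defining the same name itself.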

if c had namespaces, there might be less need for configuring and compiling separate binary objects like modules because small libraries could be included without conflict.

current options

  • wait till it is added to the c standard. it is unknown if that will ever happen
  • compile as c++ and use its namespace syntax. see also dotc
  • parse c and its preprocessor, modify the syntax tree by rewriting identifier names at definition and places of use to hide unexported bindings and possibly rename exported bindings, then convert back to c code. needs an easily usable c parser
  • compile shared library binary objects for each separate namespace and use header files and a linker to use them in other code. this hides only code that is not part of the header. this is currently the common approach. clang modules add syntax for this pattern

memory ownership semantics

for many types of resources (for example, heap memory, file handles, database and network connections) deallocation must not be missed, because memory leaks, inefficient resource usage, and resource exhaustion would otherwise be the consequence. tracking the lifetime of such resources inside, outside, and across functions tends to place a considerable burden on the developer.

many related patterns emerge, for example:

  • callee allocates and deallocates
  • callee receives and deallocates or reallocates
  • callee allocates and the continuation deallocates (at some point that is to be defined)
  • detection and deallocation of relevant resources acquired so far at early function returns, for example on error
  • deallocation by the operating system when the process ends

automatic deallocation and more explicit syntax for passing resource handles between functions could help make programming easier and programs considerably more robust.
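
for comparison, a sketch of how two of the patterns above are commonly written by hand in current c: "callee allocates and the continuation deallocates" combined with cleanup at early returns through a single exit label. table_t, table_new, and table_free are hypothetical names, and rejecting a size of zero is an arbitrary choice for this example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int* keys;
  int* values;
  size_t size;
} table_t;

/* allocates both arrays. on success returns 0 and the caller owns "out"
   and has to call table_free. on failure returns -1 and nothing is left
   allocated */
static int table_new(size_t size, table_t* out) {
  int status = -1;
  int* keys = NULL;
  int* values = NULL;
  if (size == 0) goto exit;
  keys = calloc(size, sizeof(int));
  if (!keys) goto exit;
  values = calloc(size, sizeof(int));
  if (!values) goto exit;
  out->keys = keys;
  out->values = values;
  out->size = size;
  /* ownership has moved to "out", so the cleanup below must not free these */
  keys = NULL;
  values = NULL;
  status = 0;
exit:
  /* free(NULL) does nothing, so every path can share this cleanup */
  free(keys);
  free(values);
  return status;
}

/* the continuation calls this when it is done with the table */
static void table_free(table_t* a) {
  free(a->keys);
  free(a->values);
  a->keys = NULL;
  a->values = NULL;
  a->size = 0;
}
```

who owns what is only expressed in comments and conventions here. the compiler checks none of it, which is the gap that ownership semantics would close.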

see also ownership in the rust language.

preprocessor features

  • macros that can generate multiple expressions from variable length arguments
  • macros that do not need line escaping
  • option to declare variable names that are guaranteed to not conflict with the surrounding code they are used in. one option currently is to use unusual variable names and hope that they won't ever conflict and that the macro is not used twice in the same scope
  • #if that can be used inside #define
  • macros that define macros
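
a sketch of the usual workaround for the variable name item, which also shows the line escaping: the temporary gets a name derived from the standard __LINE__ macro. unique_name and swap_int are hypothetical names. the non-standard __COUNTER__ extension offered by some compilers could be used in place of __LINE__.

```c
#include <assert.h>

/* two steps are needed so that __LINE__ is expanded before the names
   are pasted together */
#define concat_expanded(a, b) a##b
#define concat(a, b) concat_expanded(a, b)
/* creates a name like swap_temp_42 when used on line 42 */
#define unique_name(prefix) concat(prefix, __LINE__)

/* every line of a multi-line macro body needs a trailing backslash */
#define swap_int_with_name(a, b, temp) \
  do { \
    int temp = (a); \
    (a) = (b); \
    (b) = temp; \
  } while (0)
#define swap_int(a, b) swap_int_with_name(a, b, unique_name(swap_temp_))

static int swap_demo(void) {
  /* a macro that used the fixed name "temp" internally would break here */
  int temp = 1;
  int other = 2;
  swap_int(temp, other);
  return temp * 10 + other;
}
```

this is still only a reduction of risk. an argument that happens to be named like the generated identifier would conflict, and the name is the same for all expansions on one source line.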