design of a general purpose database

v2018 currently work in progress

wip code on github


i want an embeddable database to store records without having to construct high-level query language strings for each query. the closest existing option is probably sqlite, but there i would waste resources on frequently building sql strings and having them parsed

graph-like relations in relational databases are usually modelled with tables (and custom naming schemes). it seems there is a more efficient way to store and query relations and it could be simpler to use with specialised built-in features

basic concept

the original idea

each distinct binary pattern to be stored gets an identifier and becomes a node

labelled relations between nodes are the basis of all data models

data changes are relation changes

downside: wasteful if most data patterns are distinct (because of binary pattern deduplication with a reverse index)

the common database design

tables for typed sets, dictionaries or records. one btree per table

indices for table columns in extra btrees

need to choose appropriate keys

extra tables for relations between specific types. one table for each type combination

table cells are modifiable

the v2018 design


every node has a unique identifier

nodes are typed to be strings, numbers, arrays or other custom record types

indices can be created and used for fields and field combinations

virtual type nodes only exist in relations and carry the data in the identifier


relations are labelled to give them meaning

relations have a direction

targets are optionally ordered by weight-like values

general features

direct programming language interface, no high-level query language

shared library written in c (embeddable database)

excluded features: re-order on read, grouping

lmdb for btrees (read optimised and among the fastest available)

core btrees usable with all lmdb features (for example cursor prev/next/last/first/set-range), custom btrees possible



node identifiers include type information

for cheap filtering of relation targets by specific types


new identifiers are created by monotonic increment

counter initialised on start from max identifier

type independent. maybe separate sequences for standard types. 64b ids are almost too small for this. bigger native c datatype needed


a special "null" node of type id must be available. for example the identifier zero

use cases for this are unspecified relation labels, sources or targets

node types

fields are accessed by column index and indirectly by name

fixed size columns stored first to make use of predictable offsets

node data format: fixed-size-data ... variable-size-len variable-size-data ...
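reading a record in this layout could be sketched as follows. the per-field sizes and the single-byte length prefix for variable fields are assumptions for illustration; the real encoding may differ

```c
#include <stddef.h>
#include <stdint.h>

/* sketch of reading one record in the assumed node data format:
   fixed-size-data ... variable-size-len variable-size-data ...
   fixed field sizes would come from the type schema */

typedef struct {
  const uint8_t* data;
  size_t size;
} record_t;

/* offset of fixed field i: sum of the sizes of the fixed fields before it.
   offsets are predictable because fixed size columns are stored first */
static size_t fixed_field_offset(const size_t* fixed_sizes, size_t i) {
  size_t offset = 0;
  for (size_t j = 0; j < i; j += 1) offset += fixed_sizes[j];
  return offset;
}

/* variable fields are found by walking the length prefixes
   (assumed here to be single bytes) after the fixed size part */
static const uint8_t* variable_field(record_t r, size_t fixed_total, size_t i, size_t* out_len) {
  const uint8_t* p = r.data + fixed_total;
  for (size_t j = 0; j < i; j += 1) p += 1 + *p;
  *out_len = *p;
  return p + 1;
}
```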

type info cache

contains information about types and associated indices

max type id size limited to 16 bit with initial implementation

so that the type id can be used directly as an array index ("arrays are the fastest hash tables")
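with type ids limited to 16 bit, the whole cache fits in a flat array and a lookup is a single array access. a minimal sketch, where the structure fields are assumptions:

```c
#include <stddef.h>

/* sketch of the type info cache as a flat array indexed by type id.
   max 16 bit type ids keep the table small enough to hold in memory */
#define DG_TYPE_ID_LIMIT 65536

typedef struct {
  unsigned char in_use;
  size_t field_count;
  /* associated index information would also live here */
} dg_type_info_t;

static dg_type_info_t dg_type_cache[DG_TYPE_ID_LIMIT];

/* o(1) lookup: the type id is the array index */
static const dg_type_info_t* dg_type_get(unsigned type_id) {
  return dg_type_cache[type_id].in_use ? &dg_type_cache[type_id] : 0;
}
```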



field types

have integer ids

fixed length field type ids are even numbers, variable length field type ids are odd numbers

there are many possible fixed length integer and string types with ids and data size calculated from a formula

fixed length types


3b:size-exponent 1b:signed 4b:id-prefix:0000

data size in bits: 2 ** (size-exponent + 3)


id: 4b:size-exponent 4b:id-prefix:0010

data size in bits: 2 ** (size-exponent + 4)
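the two formulas above could be implemented as follows. the exact bit layout is an assumption based on the notes: the 4 bit id-prefix occupies the low bits, which keeps fixed length type ids even (lowest bit zero)

```c
#include <stdint.h>

/* integer types: 3b:size-exponent 1b:signed 4b:id-prefix 0000 */
static uint8_t dg_int_type_id(uint8_t size_exponent, uint8_t is_signed) {
  return (uint8_t)((size_exponent << 5) | (is_signed << 4));
}

/* data size in bits: 2 ** (size-exponent + 3) */
static unsigned dg_int_type_bits(uint8_t type_id) {
  return 1u << (((type_id >> 5) & 7) + 3);
}

/* id fields: 4b:size-exponent 4b:id-prefix 0010 */
static uint8_t dg_id_type_id(uint8_t size_exponent) {
  return (uint8_t)((size_exponent << 4) | 2u);
}

/* data size in bits: 2 ** (size-exponent + 4) */
static unsigned dg_id_type_bits(uint8_t type_id) {
  return 1u << (((type_id >> 4) & 15) + 4);
}
```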



variable length types




system: config-label [id/custom ...] -> data

format-label -> id-size id-type-size ordinal-size

type-label id -> name-len name field-count (field-type name-len name) ...

index-label type-id field-index ... -> ()


when a node is deleted then all relations that contain its id must be deleted

there must only be one relation for every combination of left, label and right. cyclic relations may be allowed. relational integrity may be ignored when creating relations to avoid existence checks

targets are stored ordered by ordinal value. ordinal values are stored only for the left to right direction to save space

by convention left to right corresponds to general to specific


left:node label:node -> ordinal:float right:node
right:node label:node -> left:node
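packing the keys of the two relation btrees could look like this. 64 bit ids and 32 bit ordinals are example choices; big-endian packing is assumed so that lmdb's default lexicographic key comparison matches numeric order

```c
#include <stdint.h>

typedef uint64_t dg_id_t;
typedef uint32_t dg_ordinal_t;

static void dg_pack_u64(uint8_t* out, uint64_t x) {
  for (int i = 0; i < 8; i += 1) out[i] = (uint8_t)(x >> (56 - 8 * i));
}

/* left label -> ordinal right: key of the left-to-right btree */
static void dg_relation_lr_key(uint8_t out[16], dg_id_t left, dg_id_t label) {
  dg_pack_u64(out, left);
  dg_pack_u64(out + 8, label);
}

/* right label -> left: key of the right-to-left btree.
   no ordinal here, it is only stored in the left-to-right direction */
static void dg_relation_rl_key(uint8_t out[16], dg_id_t right, dg_id_t label) {
  dg_pack_u64(out, right);
  dg_pack_u64(out + 8, label);
}
```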


separate btrees with data as the key and node id as the value

update on each insert of the associated node type using the same transaction

might use a hook mechanism available for extensions

programming interface (roughly)








evaluation of other databases


sqlite pros

multi platform, well tested, well documented, query optimiser, views, order on select, group, filtering sub-selects, triggers, sql-functions, integer primary keys are automatically auto increment,

sqlite cons

sqlite can use transactions only for insert/update/delete, not for example for schema changes. lmdb based sph-db wouldnt have that limitation

building sql strings requires quoting, escaping and generally protection against sql-injection. this adds another step where strings are analysed and possibly rewritten

a table where all fields make up the key might not be possible as efficiently. in lmdb btrees one could use duplicate keys or store all fields in the key without values

cant predict or guarantee the order in which values are stored

all types are variable length and all data fields are therefore prefixed with type information

checking if an index exists is not supported

no duplicate keys

might require copying of selected data for reading instead of offering pointers to it directly

there is always an integer row id, even when using for example strings as the primary key

shared features

create indices selectively for columns

filter table rows whose columns match column content and logical conditions

automatically find and use index

sph-db pros

better performance (fewer code paths, more minimalistic design, lmdb)

uses c data structures for queries instead of sql strings which need to be constructed by concatenating strings then parsed and evaluated

sph-db gives access to data without copying. preparator (select) -> row-positioner (next) -> column-accessor (get)

direct lmdb btree access support. custom key-value btrees can be added to the environment

no relation tables with specific layouts and indices need to be managed

automatic deletion of dependent relations when deleting nodes instead of having to create triggers

low-level filter functions that can make range searches on compound key parts and dont have to analyse table schemas or find indices
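the preparator -> row-positioner -> column-accessor pattern can be illustrated in miniature over an in-memory array; in sph-db the state would wrap an lmdb cursor and the accessor would return pointers into the mapped file. all names here are illustrative, not the actual api

```c
#include <stddef.h>

/* miniature illustration of select -> next -> get without copying */
typedef struct {
  const int* rows;
  size_t count;
  size_t index;
} demo_select_t;

/* preparator: sets up the selection state */
static demo_select_t demo_select(const int* rows, size_t count) {
  demo_select_t s = {rows, count, 0};
  return s;
}

/* row-positioner: advances the state and returns a pointer into
   the original data instead of a copy; null when exhausted */
static const int* demo_next(demo_select_t* s) {
  return (s->index < s->count) ? &s->rows[s->index++] : 0;
}
```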

graph databases

graph databases typically store relations in an adjacency list, which tends to be efficient for traversing long paths from node to node. but adjacency lists alone arent particularly well suited for frequent random access to nodes as if they were records, or for finding all relations a node is in. sph-db uses b+trees instead because its focus is on mapping data to identifiers (random access) and filtering on the set of all relations (optimised for queries such as "is x related to y" and "which nodes are adjacent to x" but not for long path traversals)


schema changes require cache updates

cache update should only be made after the mdb transaction committed

option - selected

make schema changes have their own transaction each time


less complex


lower performance for a high frequency of schema changes. but is a high frequency likely at all


track schema updates with transaction and apply after successful mdb commit

inserts during type deletion


delete/insert might both happen on the same or concurrent transactions

data for a deleted type could be inserted


data becomes inaccessible and wastes space

if type id is ever reused, previously existing data becomes accessible again. this can be confusing in the new data model. sensitive data can become re-accessible with inadequate treatment

option - first choice

as long as type id re-use is not implemented, ignore the issue for initial versions

option - second choice

relieve integrity issues and delete excess data on demand


delete all data with type id when a new type is created

old data might still waste space but can never be confused with new data


add a garbage collection helper that deletes all data for type ids that are unused


low complexity, performant solution


prevent excess type data from ever existing


register inserted types in transaction

acquire schema lock before each commit with inserts or type delete

before commit with inserts check if all inserted types still exist


mutex locking/unlocking on every insert

extra complexity to track cross-transaction conflicts

index access after deletion

index open -> index delete -> index read -> error

all index handles become invalid with delete

how to choose and use indices


either selecting only node ids or nodes

cursor to make custom btree searches

database handles or one open database per process

option - selected


can use multiple databases in one process

almost all api routines require a transaction object which encapsulates the environment



dont have to pass handle argument to any api routine

must use separate processes and ipc to work with multiple databases

custom extra fields for relations

changing the default btree would add considerable complexity

possible with labels that are compound data structure nodes

use as key-value database

possible with lmdb features

use as traditional record/table database

possible with typed nodes

should partial indices be supported

not yet

clustering same-type nodes

achieved by having the type at the beginning of the key and having sorted keys

complex filter conditions for node selection

for example conditions in a nestable linked list with comparison operators. and/or/not/begins/ends/contains/equals/etc

or custom tests and branches

more complex searchers could be an extension

con custom

index selection cant come from conditions

elevating the process to higher-level languages exacerbates costs

pro custom

hardcoded checks. no interpretation of a condition query structure
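a nestable condition list as described above might look like the following. the operator set and the restriction to string fields are assumptions for illustration

```c
#include <string.h>

/* sketch of a nestable condition structure for node selection */
typedef enum { op_and, op_or, op_not, op_equals, op_begins } cond_op_t;

typedef struct cond {
  cond_op_t op;
  const char* value;  /* for leaf operators like equals/begins */
  struct cond* sub;   /* first child for and/or/not */
  struct cond* next;  /* next sibling in a condition list */
} cond_t;

/* interprets the condition tree against one field value */
static int cond_match(const cond_t* c, const char* field) {
  switch (c->op) {
  case op_equals: return 0 == strcmp(field, c->value);
  case op_begins: return 0 == strncmp(field, c->value, strlen(c->value));
  case op_not: return !cond_match(c->sub, field);
  case op_and:
    for (const cond_t* s = c->sub; s; s = s->next)
      if (!cond_match(s, field)) return 0;
    return 1;
  case op_or:
    for (const cond_t* s = c->sub; s; s = s->next)
      if (cond_match(s, field)) return 1;
    return 0;
  }
  return 0;
}
```

this is the interpreted variant; the "pro custom" argument above is that hardcoded test functions avoid walking such a structure for every row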

where to encode the type of each individual node


store type in identifier


increased node identifier length. affects every node. affects every relation times four (source and target in two directions)


the node id is enough to know the type. selecting just relations makes it possible to filter by node type before accessing the nodes

cheap type filtering

relations between all types possible in one btree. no type combination-specific btree selection/creation process or need to access multiple btrees for a query


store type with the data


store type in data. makes get-all-of-type-x more difficult. possibly needs additional index

duplicated with each node

have to read data to find type


smaller relations


group nodes types by using separate btrees


types need to be carried with node references to find relevant grouping btrees and read record fields

one btree for each type combination is necessary. it has to be found for queries, and possibly multiple have to be accessed for different types


space saving: not every row needs to store type information

informal column types

the database only needs a fixed and a variable length binary record field datatype to store data

informal datatypes are string, integer and similar, anything beyond binary

informal datatypes are stored with the schema to avoid having to store them somewhere else for reifying useful datatypes

unidirectional relations to save space

maybe later

issue: how to garbage collect unidirectional relations

option (favorite): go by label. label must be connected

option: maybe decay factor and regular refresh but it should completely be an application issue at this point

database network distribution

i have started thinking about it, but it takes more resources to implement

rudimentary implementation (read scaling): a hook for each modification that gets passed the current write transaction and an extension that manages a commit log and applies it on another host

see database distribution

the common database design

tables for typed sets, dictionaries or records. one btree per table

indices for table columns in extra btrees

extra tables for relations between rows of different tables/types. one table for each type combination

table cells are modifiable

custom id sizes per table and sequencing. could load current max value per table on startup

have to choose the relation btrees to use for each relation query

type has to be carried anyway with ids

one btree for all tables or one for each table

one for each table because space saving. acts like a prefix to a data collection


should label/ordinal be optional

should graph relations of both directions be kept in the sys-info cache

are extra relation fields needed

example situation: "when every ingredient can be added in different amounts to different recipes"

custom primary keys

a dictionary, a one to one association, could be considered a subset of plain tables. yet rows might have row identifiers identifying a pairing even if that is not needed. this points to custom primary keys

ambiguity of auto-increment and provided keys: each insert requires additional checks for whether auto-increment applies and whether a key has been provided. an id key is potentially quicker to join as primary and foreign key because of its efficient data type

sequences per table or one global


fast load on startup


can theoretically be shorter

need to query every table on startup

good for many fragmented types with local cluster relations

variable per-table

size loaded from schema variable

what happens on concurrent delete and update for a row


update is reinsert

last committed transaction wins

that is why updates are problematic

"update might prevent deletion"

how to create custom virtual types

set the corresponding flag with db-type-new

how to select type ids by name

option - current choice

cache data scan

no extra work

relatively fast operation


index btree


system data scan

how to alter a type

initially not supported

planned to require a complete rewrite of all rows unless fields are only appended

reason is maximum space saving and performance for the row encoding

an option might be to create new types for the changed versions,

use multiple as if one and merge at some point (might never become fully merged)

one sequence for all types or one per type

64 bit keys seem desirable because it is the biggest native c datatype. but with that size and a reasonable 16 bit type identifier, only 48 bit are left for row ids

initialise counters in array. type id equals index

at startup, for each type, find max id

selecting from multiple indices

intersection of multiple index search results

are more node operations needed

node first/last/prev/range etc seem not needed because of the data model (identifiers point to data that is, from the users perspective, unsorted)

but should allow first/last forward/reverse searches

multiple rows or one result at a time with node-next

difference to graph read: fewer filters


read returns next one - selected

add to state current match and return

set state to current row data (id, size, pointer)


copy only id and data pointer to result

custom row content matching can stop at find (main reason for choice)


repeated call cost when reading multiple rows

repeated call initialisation (declare status/val-null/val-id/val-data, return status)

may need to repeatedly extract and cache values from state


read returns list of multiple

create list of result data references

:: state count -> row-list


malloc for each row result struct

when filtering rows one by one more results than needed might have been loaded


fewer function calls, possibly less preparation to continue the search (no re-allocation of cursor, status and mdb-val local variables)

works the same as for relations. less likely to be changed for relations at this point

a data read step more separated in time from the use step batches repeated processes and might improve low-level caching

how sqlite does it

to read all fields: col-count, index, type, one of multiple type getters

state contains pointer to current data

state can be advanced without copying or accessing data

select syntax can vary col count


get-type(state, col-index)

choose get-value function then get-value(state, col-index)

how to filter with node-select

options: by id, by type, by matcher

type filter

option - selected

equal only

one type to match

multiple calls possible for searching different types

matcher function will differ anyway per type

easily extendable to any matcher if needed


most needed option



list of types to match


needs new list type

row data filter

option - selected

optional supplied matcher function

call of matcher function for each existing row

vs returning and allowing re-call


saves node-next call costs


built-in matchers for fields

probably special match structure/config needed


considerable extra complexity

combinations (ids, type):

0 0: all nodes of any type. unsupported

ids type: all nodes of ids and type. unsupported

ids 0: nodes of ids

0 type: all nodes of type

i dont see a purpose for the (0 0) and (ids type) filters, so i tend towards leaving them unsupported for now

what would one be searching for with 0 0? to match algorithmically you need shared properties

how could it be useful to filter by ids and type, since ids contain the type. it is just filtering the plain list and checking existence

allow mixed-type node-select

for example with an ids list read from graph relations

usually databases cant select across types, and they also cant read fields across types

always possible to go type by type with separate searches

also it would add a feature that might be lost with a slightly different base design

checking for type after match is cheaper than calling node-select for each type

option - selected

call db select once with mixed-type ids then check type of results


malloc new lists separated by type

call db select for each


call db select multiple times with mixed-type ids and specify a type filter for each

in effect filtering the list multiple times but not reallocating

node values datatype

option - selected

array of field count length


easy to check if data set and overwrite


types with many fields and few fields set waste memory


linked-list with field offset in element


saves space by storing only the fields that are set


costly to check if data already set, but perhaps not necessary

conversion to btree data needs to check if field set or not

malloc per element
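the selected array representation could be sketched like this, with unset fields marked by null data pointers. the structure fields and names are assumptions

```c
#include <stddef.h>

/* node values as an array of field-count length.
   values[i] corresponds to field index i of the node type */
typedef struct {
  const void* data; /* null when the field is not set */
  size_t size;
} dg_node_value_t;

/* checking and overwriting is a single array access */
static int dg_value_is_set(const dg_node_value_t* values, size_t field_index) {
  return 0 != values[field_index].data;
}
```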

should graph-read be changed to read one-by-one and not lists


seems everything is already matched at read


saves malloc per element

why implement db-node-exists

use case: externally retrieved ids list, for example page id or file ids, quick check if even exists before loading details

field value get error handling

error cases

requested field index out of bounds

result memory allocation error

option - selected

on error default value for type

sqlite does it this way


return status-t

copy or reference selected strings

option - selected

reference or copy using different functions



might allow for writes into the database

invalid on transaction end


no copy if just searching

how to identify indices

option - selected

type-id and field-offsets


more data required

longer btree string name


id numbers would have to be found by specifying type and fields anyway


monotonically increasing id


less complex btree name creation (uint->string)


exhausts system sequence

schema changes and synchronisation

other active transactions might be trying to use the schema and for example insert data of types that are deleted

option - selected

make schema changes have their own transaction each time


less complex


lower performance for many schema changes. but schema changes arent predicted to be frequent enough


allow schema changes and data changes with the same transaction

add schema updates to be done to transaction handle and apply after successful mdb commit

implementation search performance

nodes and relations are stored in b+trees with a basic o(log n) time complexity for search/insert/delete (average and worst case)

node data

translating between node identifier and data requires only one basic b+tree search. it is as fast as it gets. data fields can be indexed


relations are filtered by giving lists of ids to match as arguments. 16 different filter combinations for left/label/ordinal/right, the parts of relations, are theoretically possible with dg_relation_select. 2 are unsupported: all combinations that contain a filter for right and ordinal but not left. because the number of these possible combinations is relatively low, pre-compiled code optimised for the exact combination is used when executing queries. the sph-db relation functionality is specifically designed for queries with these combinations

here is a list of estimated relative dg_relation_read performance for each filter combination from best (fastest execution per element matched) to worst (slowest), some are equal:

* * * *
left * * *
* * * right
left label * *
* label * right
* * ordinal *
left * ordinal *
left * ordinal right
left label ordinal *
left label ordinal right
left label * right
left * * right
* label * *
* label ordinal *

not supported

* * ordinal right
* label ordinal right

implementation space usage

nodes and relations are stored in b+trees with basic o(n) space usage (average and worst case)

the required space depends on the chosen dg_id_t and dg_ordinal_t types. the maximum possible size is restricted by the largest available c type supported by the compiler that is used to compile sph-db. the commonly available range is 0 to 64 bit. the minimum possible size is 8 bit for identifiers and 0 bit for ordinals (untested)


type id: id-size

other types: id-size + data-size

virtual: 0 (only exist in relations)


6 * id-size + ordinal-size

example with 64 32

old calculation with bi-directional ordinal

id-size = 64, ordinal-size = 32

relation-size = 6 * 64 + 2 * 32 = 448

448 bit per relation

3e9 relations: ~1.3 terabit of storage

3e6 relations: ~1.3 gigabit of storage

example with 32 0

old calculation with bi-directional ordinal

id-size = 32, ordinal-size = 0

relation-size = 6 * 32 + 2 * 0 = 192

192 bit per relation

3e9 relations: ~576 gigabit

3e6 relations: ~576 megabit
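the arithmetic with the current single-direction ordinal (6 * id-size + ordinal-size, so 416 bit for the 64/32 example) can be expressed as a small helper:

```c
/* bits per relation with the current design: six node ids plus one
   ordinal spread over the two relation btree entries */
static unsigned long dg_relation_bits(unsigned id_bits, unsigned ordinal_bits) {
  return 6ul * id_bits + ordinal_bits;
}

/* total storage in bits for a given relation count */
static unsigned long long dg_relation_storage_bits(unsigned long long relations,
  unsigned id_bits, unsigned ordinal_bits) {
  return relations * (unsigned long long)dg_relation_bits(id_bits, ordinal_bits);
}
```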



after exploring designs for a more traditional table based database, the codebase of sph-db became relevant again after it was found that table based databases lead to complexities with node/row types and relations between them. particularly the fact that the node/row type always has to be known anyway, and the complexity of managing and selecting between multiple relation tables, made it convincing to come back to encoding the type in the node id and storing everything in the same btree again, even if that costs significantly more space, especially for relations


relations were relatively efficient to store but it became clear that structured data is not. low-complexity records with selective indexing and full table searches were not possible. only unnecessarily costly records could be created, for example using relations with nothing less than maximum normalisation and full indexes for the complete row data. tables, column indexes and custom key-value associations seemed missing

insights: separate btrees are like prefixing a data area and more space saving than for example prefixing each element with a type. when using one big btree to relate any type with any other, relations need to store the type of related nodes when the datatype is required to address elements. the row is the more fundamental abstraction, not the relations/graph. rows are like arrays with possibly variable-length and typed fields

custom node/row/record types and limiting which nodes can be linked has interesting implications for data retrieval and storage space


earlier versions of sph-db did not support labelled relations, but instead had relations that themselves have ids and identifier aliases. this was found to be overcomplicated and particularly too costly in terms of space usage and processing. one insight was that relations without labels are like relations with all having the same label. these relations are on their own not distinguishable semantically, especially for partitioning data and for deciding if specific relations are not needed anymore. relations between numbers posed another problem: since only a node with data type was specified, creating relations to numbers would potentially lead to a very high number of nodes. to solve this, a new node type that contains a number in the identifier and only exists in relations was introduced. earlier versions also used classifications like "shape" or "connectors" for nodes, but these classifications turned out to be useless at least in this context

applied knowledge

i previously wrote sph-dg, which is an embeddable, lmdb based database focused on graph-like relations. it is used on this website, works reliably and relation queries are fast. i am a long-time user of database systems like sqlite, mysql, lmdb (interface similar to berkeley db), oracle databases and postgresql and have used almost all sql features professionally. ive been contemplating for years about more simplistic and more efficient multi-use information storage and retrieval for contexts like web applications, generative art and ai knowledge storage


about databases

database distribution

graph data structures



sph-dg data modeling


tags: start q1 specification computer sph-dg graph database design software