2018-04-14

sph-dg data modeling

wip. how to store data with sph-dg

sph-dg is planned to be extended with custom node record types, which should make data modelling much more practical. the following applies to the current/past version without them

pros and cons

sph-dg is good for data that should be stored with an identifier that can be used as a reference and to load the data again when necessary. the data is automatically deduplicated and one identifier corresponds to one distinct binary pattern. better support for key-value associations, as in a key-value database, and for records is work in progress (as of 2017-10-19)

sph-dg is also good for storing and filtering nested sets and filtering entries in hierarchies like filesystem structures, particularly if there are many files per directory

general

intern data is distinct: the same data will always have the same id. a node can not be related to the same data multiple times, neither under different ids nor under the same id
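a minimal sketch of the "same data, same id" rule as an in-memory model in plain guile scheme. it does not use sph-dg; model-intern-ensure, the hash table and the id counter are made-up names for illustration only

```scheme
; in-memory model of intern nodes, not the sph-dg api.
; model-intern-ensure, data->id and next-id are hypothetical names
(define data->id (make-hash-table))
(define next-id 1)

(define (model-intern-ensure data)
  "return the existing id for data, or assign and return a new one"
  (or (hash-ref data->id data)
    (let ((id next-id))
      (set! next-id (+ next-id 1))
      (hash-set! data->id data id)
      id)))

(define a (model-intern-ensure "tester"))
(define b (model-intern-ensure "tester"))
(define c (model-intern-ensure "other"))

; a and b are the same id because the data is equal, c is a different id
(display (list (= a b) (= a c)))
(newline)
```

storing the same data twice returns the id that already exists, which is why a relation to that data can only exist once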

extern data is not distinct: a node can be related to multiple extern nodes that all reference identical copies of the same data. as with interns, there can not be multiple relations to the same node

caveat: the data for string nodes might overlap with the data of other nodes. this can be problematic if multiple data types are allowed for labels. to guarantee that there is no overlap, data must be consistently typed, for example with type bits in front of the data. this does not apply to sph-dg-guile, which types all data by default from scheme data types
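a sketch of the overlap and of one possible fix, in plain guile scheme without sph-dg. the type ids and add-type are assumptions made up for this example; sph-dg itself does not prescribe a type format

```scheme
(use-modules (rnrs bytevectors))

; hypothetical type ids, not defined by sph-dg
(define type-string 1)
(define type-integer 2)

(define (add-type type-id data)
  "prepend one type byte to a bytevector"
  (u8-list->bytevector (cons type-id (bytevector->u8-list data))))

; the string "A" as utf-8 and the integer 65 as a single byte
(define string-data (string->utf8 "A"))
(define integer-data (u8-list->bytevector (list 65)))

; untyped, both are the same binary pattern and would share one id
(define untyped-equal (equal? string-data integer-data))

; typed, the patterns differ and would get separate ids
(define typed-equal
  (equal? (add-type type-string string-data) (add-type type-integer integer-data)))

(display (list untyped-equal typed-equal))
(newline)
```

a single leading byte is only one way to do it. what matters is that every stored value is typed the same way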

relations always include both sides: if there is a relation from x to y then the relation from y to x also exists, with the opposite direction

notation

nodes

words in place of ids

"content" in place of ids of nodes with string data

relations

left right
left label right
left label ordinal right

direction

> for a left to right relation

< for a right to left relation

relations

> id element

since there can not be multiple relations between nodes, the elements related to a node in one direction form a set relative to that node

for practical purposes, when thinking about what relations to create, the relations in one direction can be considered to be stored sorted by the numerical value of their parts. this corresponds to how they are actually stored

the order priority for relations to the right is as follows from left (highest order priority) to right (lowest order priority)

left label ordinal right

for relations to the left

right label ordinal left

relations with the same node to the left or right are stored together. multiple elements that are near each other are faster to read in sequence than elements that are sorted further away from each other
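the sort order described above can be sketched in plain guile scheme. this is only a model of the ordering, not of the storage; tuple<? is a made-up helper and the ids are arbitrary example numbers

```scheme
(define (tuple<? a b)
  "compare two lists of integers part by part, the first difference decides"
  (cond
    ((null? a) (not (null? b)))
    ((null? b) #f)
    ((< (car a) (car b)) #t)
    ((> (car a) (car b)) #f)
    (else (tuple<? (cdr a) (cdr b)))))

; relations to the right as (left label ordinal right)
(define relations
  (list (list 2 7 0 5) (list 1 8 0 4) (list 1 7 1 3) (list 1 7 0 9)))

; left has the highest order priority, then label, then ordinal, then right
(define sorted (sort relations tuple<?))

(display sorted)
(newline)
```

after sorting, all relations of left node 1 are adjacent, and within them all relations with label 7. selections that fix the left node and the label therefore read neighbouring entries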

numbers, strings, arrays, binary data

can be stored using nodes of type intern. sph-dg will persist the data and return an id for it that can be used to load the data again. intern nodes can also be used to store an array of node ids as well as any custom record format

dictionary

this is the main data structure to emulate most kinds of collections, sets and dictionaries with. dictionaries have a set of keys and each key has a set of values

dictionary-id keys values
id "username" value-1
id "email" value-2

code example

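(let
  ( (dictionary-id (dg-id-create 1))
    (key (dg-intern-ensure (list "username")))
    (value (dg-intern-ensure (list "tester"))))

  ; add an entry to the dictionary.
  ; the arguments are left, right, label: the value is the right side and the key is the label
  (dg-relation-ensure dictionary-id value key)

  ; get all values for a key from the dictionary.
  ; #f in place of the right side means the values are not restricted
  (dg-relation-select-read dictionary-id #f key))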

global dictionary

dg-null keys values
dg-null "list" id
dg-null "item" id

can be used to categorise nodes without adding them to a specific collection

dg-null is exported by sph-dg as a variable and has the value 0

examples

list with list items (actually a set because there can be no duplicates)

list-id element-type element
list-id "item" item-id-1
list-id "item" item-id-2
list-id "item" item-id-3
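a minimal in-memory model of this layout in plain guile scheme, to show why the list behaves like a set. it does not use sph-dg and ignores ordinals and direction; model-relation-ensure and model-relation-select are made-up names

```scheme
(use-modules (srfi srfi-1))

; in-memory model, not the sph-dg api.
; relations are (left label right) with strings in place of ids
(define relations (list))

(define (model-relation-ensure left label right)
  "add the relation unless an equal one already exists"
  (let ((relation (list left label right)))
    (if (not (member relation relations))
      (set! relations (cons relation relations)))))

(define (model-relation-select left label right)
  "return the relations that match every argument that is not #f"
  (filter
    (lambda (relation)
      (and (or (not left) (equal? left (first relation)))
        (or (not label) (equal? label (second relation)))
        (or (not right) (equal? right (third relation)))))
    relations))

(model-relation-ensure "list-id" "item" "item-id-1")
(model-relation-ensure "list-id" "item" "item-id-2")
; adding the same item again changes nothing
(model-relation-ensure "list-id" "item" "item-id-1")

; all items of the list
(define items (map third (model-relation-select "list-id" "item" #f)))

(display (length items))
(newline)
```

the same selection with only the right side given would answer the reverse question, which lists contain a given item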

records of type "item"

dg-null "item" item-n

tagging

word "tag" content
"tag-1" "tag" content-1
"tag-2" "tag" content-2
"tag-5" "tag" content-8

tags: computer data-structure sph-dg graph database model