general overview and design considerations
a database stores data in a way that enables efficient access, manipulation, and organization. while data can be stored naively (for example, by writing arrays to disk), retrieving or modifying it then requires scanning or rewriting large parts of the data. databases avoid this inefficiency by structuring and indexing data, which reduces processing time and storage overhead for common operations like search, insertion, update, and deletion.
naive storage requires scanning an entire array to locate elements with a desired property. a database improves on this by structuring the data and maintaining indexes, so a query touches only the relevant elements instead of the whole dataset.
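a minimal sketch of the contrast above: a full scan versus a lookup through an index. the data and the index layout (a dict from value to keys) are illustrative assumptions, not a prescribed design.

```python
# hypothetical sketch: naive full scan vs. indexed lookup.
records = [("a", 3), ("b", 7), ("c", 3)]

# naive storage: find every key whose value is 3 by scanning everything, O(n).
scan_hits = [k for k, v in records if v == 3]

# database approach: maintain an index from value to keys up front,
# so the same query becomes a single dictionary lookup.
index = {}
for k, v in records:
    index.setdefault(v, []).append(k)

indexed_hits = index.get(3, [])

print(scan_hits)     # ['a', 'c']
print(indexed_hits)  # ['a', 'c']
```

the index costs extra storage and must be updated on every insert or delete; that trade-off is why databases index only the columns queried often.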
databases running on a single host are limited by cpu, memory, storage capacity, and the risk of single-point failure. scaling addresses these limits by distributing data and work across multiple nodes.
coordinator nodes maintain metadata about the cluster, such as which nodes exist and which data each node holds.
data identifiers may include an origin-node prefix. nodes often group elements by origin, which simplifies routing and allows partial routing tables that list which nodes hold data from which origins.
databases rely on data structures that optimize for their access patterns.
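one way to see access patterns driving structure choice, sketched with assumed example data: a hash map serves point lookups in O(1), while a sorted sequence with binary search serves range queries in O(log n), something a hash map cannot do efficiently.

```python
import bisect

# hypothetical sketch: pick the data structure that matches the access pattern.
by_key = {10: "x", 20: "y", 30: "z"}  # hash map: O(1) point lookups
sorted_keys = sorted(by_key)          # sorted keys: O(log n) range queries

def range_query(lo, hi):
    # binary-search the boundaries, then slice out the matching keys.
    i = bisect.bisect_left(sorted_keys, lo)
    j = bisect.bisect_right(sorted_keys, hi)
    return [by_key[k] for k in sorted_keys[i:j]]

print(by_key[20])           # point lookup: 'y'
print(range_query(15, 30))  # range query: ['y', 'z']
```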
databases organize data into tables with typed columns.
- a table with one column: useful for lists or logs.
- a table with two columns, commonly representing key-value pairs: indexing the second column allows reverse lookup.
- a table with multiple columns: selective indexing enables fast queries on relevant fields while saving space.
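the two-column case can be sketched as a key-value mapping plus a secondary index on the value column for reverse lookup; the example data is assumed.

```python
# hypothetical sketch: a two-column key-value table with a reverse index.
table = {"alice": "admin", "bob": "user", "carol": "admin"}

# secondary index on the second column: value -> set of keys.
reverse = {}
for key, value in table.items():
    reverse.setdefault(value, set()).add(key)

print(table["bob"])              # forward lookup: 'user'
print(sorted(reverse["admin"]))  # reverse lookup: ['alice', 'carol']
```

without the reverse index, answering "which keys map to admin" would require scanning the whole table.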
databases differ in how they are deployed and accessed.
structured query languages (e.g. sql) are commonly used to express database operations.
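a minimal sql example using python's built-in sqlite3 module; the table and column names are illustrative. it ties together the earlier points: typed columns, an index on the queried field, and a declarative query instead of a manual scan.

```python
import sqlite3

# in-memory database; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])
conn.execute("CREATE INDEX idx_role ON users (role)")  # index the queried column

# declarative query: the engine uses the index rather than scanning every row.
rows = conn.execute("SELECT name FROM users WHERE role = ?",
                    ("admin",)).fetchall()
print(rows)  # [('alice',)]
conn.close()
```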