this document outlines the design principles and implementation strategies for a distributed database system capable of automatically utilizing the combined storage space and processing power of multiple hosts. the primary goals of this system are:
the intended setup involves installing the software package on each host, specifying the address of at least one seed host, and starting the network daemon application.
the commit log serves as a critical component for maintaining data consistency across the distributed system.
mtime
) and the last commit id.timestamp | commit_id | change_type | element_id | data mtime -> last_commit_id
change types:
00
: create
01
: delete
10
: update
this structure enables efficient change tracking and synchronization between hosts.
managing host identifiers is crucial for coordinating operations and maintaining system integrity.
host lifecycle management:
offline hosts: hosts that remain offline beyond a certain threshold lose their identifier range.
rejoining hosts: reappearing hosts must advertise any changes, such as new ip addresses, and gain network consensus for reacceptance.
effective data distribution is essential for performance optimization and resource utilization.
partitioning:
replication:
deletion protocol:
consensus deletion: deleting an item requires agreement from all hosts responsible for the relevant partitions to ensure data integrity.
background processing:
state management:
remote states: handling of remote transaction states and cursors is necessary to track query progress and results.
socket management:
network messaging format:
standardization: defining a consistent messaging protocol for inter-host communication, possibly using formats like protocol buffers or grpc.
the ni is a shared data structure that maintains network topology and host status information.
states:
write
: host is available for write operations.
read
: host is available for read operations.
inactive
: host is temporarily unavailable.
proposal
: host is proposing changes or updates to the network configuration.
voting mechanism:
messaging attributes:
decay: a counter that decreases over time to manage the lifespan of messages in the network.
resource information:
partition ids: identifiers of data partitions managed by the host.
data ranges: ranges of data identifiers allocated to the host.
free space and cpu rating: metrics for load balancing and task allocation.
role determination:
commit log alignment:
status advertising: hosts with out-of-sync commit logs advertise their inactive
status.
full resynchronization: hosts request a complete data resync to realign with the network state.
directional metrics:
direction(a, c) = arccos((ac^2 + ab^2 * bc^2) / (2 * ac * ab))
metrics:
unavailable
.dynamic adjustments:
score reduction: decreasing unreliability scores after successful communications.
status updates: regularly updating host statuses based on current unreiability scores.
retry logic:
historical data:
since-time tracking: monitoring unreliability scores over time to detect patterns.
considerations:
inter-host relations: elements may need to reference others stored on different hosts.
identifier size: balancing uniqueness with efficiency; large identifiers like uuids may be impractical.
record type uniqueness: determining whether record type ids need to be globally unique or can be localized.
identifier mapping:
partitioned sequences:
type-specific b-trees:
separate structures: using individual b-trees for each record type to improve organization and search efficiency.
parallel processing:
data isolation:
separate lmdb instances: using multiple lightning memory-mapped database (lmdb) instances to isolate workloads.
hierarchical structure:
operational dynamics:
task delegation: leaders assign tasks and manage data distribution among followers.
fault tolerance: hierarchical clustering enhances resilience against host failures.
sequential resource utilization:
simplified management:
ease of implementation: linear structure simplifies routing logic and host management.
b-tree limitations:
data prefixing strategies:
application-level partitioning:
client-server architecture:
data locality:
optimal distribution: while keeping related data together is beneficial, distribution is sometimes necessary for scalability.
partition assignments:
data clustering:
mandatory distribution:
network mesh formation:
neighbor awareness: hosts need to be aware of neighboring hosts to form a resilient network mesh.
concept:
drawbacks:
single point of failure: the unavailability of the registry cluster halts identifier allocation.
state loss: if the cluster is lost, the entire identifier range and allocation state may be unrecoverable.
scalability issues: centralization conflicts with the distributed nature of the system.
reasons for rejection: