Authored and Versioned Content on Holochain
First — What is Holochain?
Holochain is not a block chain; it’s an open source p2p application data framework. It puts agents (users) at the center of its design philosophy instead of data. This means that it isn’t opinionated towards consensus, which makes it multiple orders of magnitude faster and more energy efficient than blockchain, among other benefits like users having more control over their data. You can read so much more about it on its website:
Now, while some applications really don’t need consensus at all, with some outside the box thinking it starts to become clear that most applications don’t actually need consensus. In this article I’m going to present the sketch of a design for a content data model that has both authors and versioning built in, like git (which is a great example of a highly used, complex software use case that is built to work without consensus).
Without explaining the whole framework, the basic things you need to know to make some sense of this design are:
- In Holochain, each “client” is also a node in the network that provides storage and compute for everyone. A good design makes sure hashes are spread out evenly to make sure all nodes are equally shouldering that load.
- Holochain itself handles all of the work necessary to connect and gossip with peers; we are just concerned with creating a data model that uses as little storage and network bandwidth as possible (because there’s no data harvesting corporation footing the bill; the users are all self hosting!)
- The Distributed Hash Table (DHT) that serves as the database for Holochain is content addressable. That means if two people post “Hello world” to the DHT, they would both get back hashes that point to the same data. I rely on this for determinism/monotonicity in some parts of the design, and to reduce storage use (especially around content).
- Holochain has single-directional links that can be attached to any element on the DHT, pointing to another element using its hash. These can also be annotated with tags that are just bytes. I imply the usage of these heavily in this design.
The Problem
For my project Library (a 3D knowledge graph browser), I need the nodes in the knowledge graph to be editable and versioned, with the author of each edit tracked as well. This seems simple enough, but with multiple users and no centralized server to coordinate, we quickly run into problems with conflicts and branches.
The Solution
This design solution allows for individuals to make local changes to content (stored as deltas with their fingerprint), and then propose their changes to be recognized as latest in a version chain, along with a simple governance structure for deciding on which changes a group wants to adopt as latest.
There are a handful of sub-patterns involved in this solution. Some patterns may have more widely used terms that I don’t know, so feel free to point me towards them when applicable.
The components are:
Users and Content
Agent
This one is very simple, just a user’s public key. Whenever I mention an agent or user, I just mean their public key. There’s no additional data we need other than that identifier.
Content
This is just raw content. In a basic use case this could be json, or an image file, but it could be anything in this pattern. The important part is that there is no unnecessary metadata, since we want the hash of “the same content” to always be the same on the distributed hash table (DHT) for storage concerns.
Delta
The contents of a Delta is an author along with a list of changes. What constitutes a “change” is up to your implementation; in a basic use case, this would probably some json schema for insertions and deletions. (The Delta might also be able to support a list of authors if a single Delta has multiple authors, but I sketch it out here with one author per Delta.) The Delta has a link to the previous delta in the chain that it is appended to, and has an optional link to a Content object that represents the state of the Content after the changes in that Delta are applied to the previous state. This means that every change does not necessarily result in a full new Content object being committed to the DHT; it can be up to program logic (or even the user) when to actually create that “compiled” Content object.
If you ever have a Delta that does not have a linked Content object, you can follow the chain backwards until you find a Delta with a compiled object and then construct the current state by the chain of changes to the last complied Content. In the diagram above, the circles of “implied content” mean that there is a deterministic Content block that can exist there, but that it hasn’t been committed to the DHT. The Content being deterministic means that if we make a Delta that “reverts” to a previous state, it would essentially just create a pointer back to an existing Content object. This structure makes it so that we store as little data as needed on the DHT: only the small changes made between versions, and as few full “compiled” objects as is necessary for our app.
Versioning
Governance
The Governance object will come into play in the Snapshot object. Essentially, a Governance object is composed of a list of Agents, along with monotonic “voting” rules. The monotonic requirement is that resolving the results of a “vote” cannot change based on limited information, and that the results can not be ambiguous. For example, let’s say we have two agents in a Governance object: they can not each be given 50% voting power, because then a vote can be tied. (However, a more complex Governance object may be able to be designed for failed votes, or even ranked choice; as long as it is still monotonic.) Another limitation would be that votes cannot be deleted or changed; if a vote was deleted and not yet saturated in the DHT, another user who casts a vote may determine a decision was reached even though one of the votes is now invalid. A user making a vote can use their local source chain to enforce they are only voting once.
So in a “happy path scenario”, we have a simple 3 member system with equal votes. As soon as someone makes a vote that meets the acceptance criteria, we can be sure that a decision was deterministically reached. In that scenario, agent 1 votes for A, and agent 2 also votes for A, agent 2 doesn’t have to wait to see what agent 3 voted for, because we already know that A must be the winner.
Snapshot
Since in a distributed system we can never be sure that two agents don’t simultaneously make a new Delta off of an existing Delta, branches are impossible to prevent (and also a good thing for the user). But unfortunately, this creates a technical problem, in that branches can occur. Without designing for this, branching content would quickly silo individuals into a non-collaborative environment. A Snapshot is where we really get into the weeds of solving this branching problem.
The core concept is that an individual Snapshot contains a hash of a Delta object, and a hash of a Governance object (and also optionally a link to the Content it represents). The Governance object acts as permissions for the Snapshot. Potentially, there may be different permissions for different actions (like different permissions for proposals vs approvals). Any qualified agent can propose a candidate for succession to the current Snapshot. Any qualified agent can also cast a single vote for one succession candidate. The votes can be stored as links to the candidates, with tags that mark it as a vote and also identifying the agent’s index from the Governance object’s agent list.
The cool part here is that since the Snapshot contains a Delta and also a Governance object, the governance of this chain of
As part of the validation rules for proposing a candidate, we can also ensure that the Delta contained in a succession candidate contains the current Delta in its chain, by navigating backwards through the new Delta’s chain. (This could be optional, depending on how strict your use case is on what constitutes a “version”.) And finally, once a succession candidate is chosen, we can use that opportunity to ensure that a Delta that is contained in a Snapshot has a concrete Content object (since we just validated that the chain is valid, we should also have all of those Deltas in memory to quickly generate the Content object if it doesn’t exist). We can also then link the Snapshot to that Content object.
So now, every Snapshot points to a Delta that create an unbroken chain, and each Snapshot points to a concrete Content object (though this could be optional). And since all of the votes are stored as links off of each Snapshot, these can be re-validated by anyone who wishes to do so. In fact, since voting may happen asynchronously, any individual vote may not reveal the winner of a decision at voting time, but might be discovered later on by someone who visited the node and saw an unresolved vote, and then checked its status. And if multiple people were to do this at the same time, the results would always be deterministic.
My hope is that these “Snapshot chains” would be used as a way for groups and individuals to “endorse” a certain perspective on information. This way, users can use good old-fashioned “human trust”, and institutions can perform curation services on those perspectives. (This is another difference between Holochain and blockchain; there isn’t always a need to seek “trustless” solutions to human problems—trust is built in to humanity, so use it!)
With this Snapshot design complete, we also now support branching! Anyone can take any Delta and create a new Snapshot from it with their own Governance. This makes it possible to branch when necessary, while other mechanisms (like social ones) can encourage collaboration in the majority use cases.
Version Chain-Tree
(Please help me find the right name for this)
Now we have one last problem. Let’s say you’re looking at a particular Snapshot, and you see there is a new version (by the presence of a link to a new version on it). You can go to the next version, but you’d also really like to be able to jump to the latest version. We could traverse those links until we get to the end, but what if it’s really long? We could also make it so that every time there is a new version, we go back through the chain and create a new link to the latest (and delete the old link to the latest), but this would scale even worse, both with link storage and also network traversal. What if an object had 100 changes? or 1,000,000 changes? Neither of these solutions scale well.
To solve this, I came up with this chain-tree structure that grows from the bottom up. There would be a “root” node of the tree, which has a width and a unique hash (I do not yet know what I should use as the source of this hash… perhaps just the hash of the first object in the chain?). Each “node” is composed of that root hash, along with its height and index in the tree. In the diagram, the tree has a “width” of 3, so each node has up to three children. Once a node is full, it creates a new parent by incrementing the height, and then creates a neighbor linked from the parent by incrementing the index. Every time a new leaf-object is added to a node at height zero, it creates bidirectional links for next/previous between the last leaf and the new leaf. The root node also creates a link to the “top” of the tree (using its index) every time a new top is created.
With this structure, any node in the tree can jump to the root, then to the top, and then follow the highest index downward to find the latest. So in the example diagram, with a height and width of 3, there are 27 possible versions in the tree/chain (I only used 3 to make it plausible to draw; in practice the width would be much higher). Any given leaf object can traverse locally through the chain with next/previous links, but also, any leaf can get to the latest leaf with height + 2
link traversals. With this pattern, we can choose a width that works best with a “maximum amount of links desirable for an element on the DHT”, which I would imagine is much higher than 3, and then this makes a lot more sense. For example, with a width of 64, we could store ~260,000 versions with the same height of 3 in the diagram, giving us height (3) + 2 = 5
link hops from any node to the latest.
An alternative approach that doesn’t include this root node with a top link would be to just recurse up to a parent node until there isn’t one, and then follow the highest index down; the result would be the latest leaf node. This would require a maximum of 2 * height
link traversals to get to the latest, but it would be simpler. Either way would be fine I think. Both solutions have localized forward/backward traversal for any version in the chain, as well as an easy way to jump to the first and last leaf from any leaf.
When this Version Chain-Tree is used with the other components above, the “leaf objects” in this tree would be the Snapshots from above. The tree would grow from the lower left, starting with the green parent node of “L1”. Once that node is full, it will create its parent node, and then a new sibling which is the parent of “L4”. This pattern works recursively as the tree grows. This means the tree can support an unlimited number of items, and that as it gets exponentially large, the height of the tree doesn’t get much higher. With a sufficient width, you could fit reasonably almost any use case within a height of 5 or less.
Wrap Up
The result of combining all of these components gives us a lot of great functionality that respects users and their data, while also striking a balance between minimizing network resources, storage resources, and processing resources.
- Because we store the Deltas between Content versions, we have rich history of who made what changes to each piece of Content.
- Because not every Delta generates a new compiled set of Content on the DHT, we are conserving storage space for all of the users of the network, and trading it for small amounts of network usage of hopping the Delta links.
- Application logic or users themselves can choose how frequently it makes sense to generate a compiled Content object and store it on the DHT.
- Since we store all of the “versioning and authoring” metadata in the Delta element as opposed to the Content itself, we also take advantage of Holochain’s content-addressability to minimize duplicated data. The “same” Content will always have the same hash.
- With each Snapshot containing a pointer to a Delta and a Governance object, the rules that govern the Snapshot are able to evolve over time, along with the Content.
- By traversing backwards through a Delta chain, we can ensure that a new version of a Snapshot is actually a descendent of the previous version.
- By using the chain-tree structure, we can always navigate to local previous/next versions from any version, while also having relatively quick hops to first/latest from any version. It also grows logarithmically in terms of storage overhead, and link traversal overhead, and only grows as big as it needs.
With all of these benefits, we have a system that should be capable of supporting potentially billions of changes from any number of agents with minimal storage needs, minimal network traffic, and no need for consensus or definitive ordering of events.
The main drawback is that the Governance structure makes it difficult for massively managed Snapshots; it would be nearly impossible for millions of agents to vote for new candidates without getting stalled… though not completely impossible. This also might not be a problem in the real world at all: in a world of unenclosable carriers and endless forkability, we may end up managing our own knowledge graphs with small groups of friends or colleagues, or entrusting that stewardship to those with expertise on a certain topic.
If you’re new to Holochain or the decentralized web, I hope you found some of these ideas exciting and are ready to jump into more. If you’re not new to the space… let me know what you think! I reasoned through a lot of this on my own from first principles, but I’m certain that I’ve probably reinvented some wheels here; let me know who has already designed this better.