Graph access pattern are mostly random. To achieve best performance that was possible, it is wise to put everything in memory rather than disk-oriented store. Because DRAMs is much more faster and expensive than hard drives and SSDs, storage efficiency is the essence of the matter. One of my colleges from my previous employer used to developed his own graph database as well. He told me that his design hits memory limitation quickly which has low memory efficiency. His system was written in C++, the relation between nodes was represented in pointer and each node is an object. I cannot get more details on his implementation, but represent node and relations in native form like objects in object-oriented programming languages and maps is bad use case for efficiency, although it is easy for development. Especially for java, each allocated object took 12 bytes for header and additional bytes for alignment. If the object contains other objects, the memory overhead consumption stacks by the number of total objects allocated, which is not acceptable for in-memory applications.
Backend store for Morpheus is a in-memory distributed key-value store. It took some ideas from RAMCloud. Each instance contains numerous of trunks, each trunk contains one constant size of byte array, all of the data record need to be serialize to write and deserialize to read on the byte array for all the time. There is also a hash map as index for record id to the location of each record in the byte array in each trunk. Because the location of the record may change due to defragmentation, value in the index hash map may also change correspondly.
Trinity and most of the relational database require to define schema before insert new data record. It may not flexible compare to the modern database systems like MongoDB and CouchBase, but it is possible to deliver much more compact storage without wasting precious spaces on describing data structure for each records. Morpheus took the same idea for it's backend store, user need to define schema with name and data type for each fields. Fields need to in a sequence for each schema, the system will write fields data according to the schema leaves no gaps and any overhead on data types or names to the fields. The system can also reconstruct the record from bytes according to schemas.
Users may delete or update records in the store. There are certain circumstances that may leaves gaps or we called fragment in the bytes array:
- Delete: It is obvious that deleting records will erase the sections that the record took in the byte array. That will leave a gap start with the location of the record and it's size is the same as the record took. In this case, the system will mark the whole space as fragment.
- Update: There are three possible circumstances of updating a record in the trunk. The most easy one is the replacement has the same size of the original one, the system only need to replace the bytes with the new serialized record. Another is the replacement needs less space than the original. Then the system can still write the data to the original place but the rest spaces need to be record in the trunk as fragment. When the new record took more space than the original one, the system will write the new record at the append head and update the index like inserting other new records and mark the whole section of the original one as fragment just like delete the original record.
The backend store took care of both of the circumstances above which user can use it as a fully functional key-value store.
When the system was running for a long time, frequent delete and update operation may leave lots of fragments in the trunk, which leaves less spaces for new records to store. A defragmentation mechanism is required to resolve this issue by moving record sections to fill in the gaps and keep data integrity. After on turn of defragmentation, the append head should move backward and more records can fit in. Implementing non-pause defragmentation is difficult, I will address this issue later in another article.
Each record has a integer id, the backend store distributes the record to servers and trunks in servers according to the id and fetch the record by the id. I can still use distributed hashing table for this case, which is easy to accomplish shard-nothing architecture. I have already done this part in another of my project.