Building a Graph Engine -- Modeling Graph: Vertexes, Edges and their relations

The graph engine was designed to build graph data models based on key-value schema, because the engine use identifier list in the vertex to make connections to other vertex or edges. It is efficient to fetch data according to the identifier of a key-value pair from a hash map than do any traditional search in an array. Every unit, including edges and vertex are store as key value pair in the memory store in a very compact way (see this article). In a data explore process, it should be deliver much more better performance than traditional relational databases. It is also flexible and able to model hyper graph, that one edge may have multiply vertex inbound and outbound.

For performance concern, how to present edges in vertex is an interesting topic. The problem is there may be many different kind on edges. For example, when we model a school roster, one student should have multiply kinds of relation to other matters like his class, his friends, his teachers, this scores. Usually we only need to know one kind of his relation like how many friends he has. If we put every edges in one list, it will be too much overhead to extract all of the edges that was linked with the vertex to find only one type of edges. One solution is to define the possible edge types in the schema like what we see how Trinity models movie and actor graph.Snip20160228_2But when the datasets increasing, one entity of a vertex may have more relations and edges to other vertexes over time, model edges like this will leads to alter the cell schema for the vertex frequently, which also leads to rebuild the whole data sets of the the schema in the store. I have studied neo4j and OrientDB, finally decided to develop my own approach, to store array map in vertex for each type of edges to a identifier to edge list cell. The edge list cell contains all of the edges of this type related to the vertex. A vertex contains 3 such array map for inbound, outbound and indirected edges. It can also contains more fields for other data that was defined in the vertex schema.
IMG_0053  Edge list cell has only one field, that is the list of the identifiers to other vertex or edges. For best performance, the engine allows both vertex id (for simple edges) or edge/hyper edges with additional document id for more complex models. The engine can determinate the type of the cell based on it's schema id from the store.IMG_0054_

After designed such structure for graph model, I found that the store still have limitations in custom types like array map. I need to dedicate more time on improving the key-value store, supporting nested schema will suit my needs.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.