Technical specification for writing RedisGraph client libraries
By design, there is not a full standard for RedisGraph clients to adhere to. Areas such as pretty-print formatting, query validation, and transactional and multithreaded capabilities have no canonically correct behavior, and the implementer is free to choose the approach and complexity that suits them best. RedisGraph does, however, provide a compact result set format for clients that minimizes the amount of redundant data transmitted from the server. Implementers are encouraged to take advantage of this format, as it provides better performance and removes ambiguity from decoding certain data. This approach requires clients to be capable of issuing procedure calls to the server and performing a small amount of client-side caching.
Retrieving the compact result set
Appending the flag --compact
to any query issued to the GRAPH.QUERY endpoint will cause the server to issue results in the compact format. Because we don't store connection-specific configurations, all queries should be issued with this flag.
GRAPH.QUERY demo "MATCH (a) RETURN a" --compact
Formatting differences in the compact result set
Certain values are emitted as integer IDs rather than strings:
- Node labels
- Relationship types
- Property keys
Instructions on how to efficiently convert these IDs in the Procedure Calls section below.
Additionally, two enums are exposed:
ColumnType, which as of RedisGraph v2.1.0 will always be COLUMN_SCALAR
. This enum is retained for backwards compatibility, and may be ignored by the client unless RedisGraph versions older than v2.1.0 must be supported.
ValueType indicates the data type (such as Node, integer, or string) of each returned value. Each value is emitted as a 2-array, with this enum in the first position and the actual value in the second. Each property on a graph entity also has a scalar as its value, so this construction is nested in each value of the properties array when a column contains a node or relationship.
Decoding the result set
Given the graph created by the query:
GRAPH.QUERY demo "CREATE (:plant {name: 'Tree'})-[:GROWS {season: 'Autumn'}]->(:fruit {name: 'Apple'})"
Let's formulate a query that returns 3 columns: nodes, relationships, and scalars, in that order.
Verbose (default):
127.0.0.1:6379> GRAPH.QUERY demo "MATCH (a)-[e]->(b) RETURN a, e, b.name"
1) 1) "a"
2) "e"
3) "b.name"
2) 1) 1) 1) 1) "id"
2) (integer) 0
2) 1) "labels"
2) 1) "plant"
3) 1) "properties"
2) 1) 1) "name"
2) "Tree"
2) 1) 1) "id"
2) (integer) 0
2) 1) "type"
2) "GROWS"
3) 1) "src_node"
2) (integer) 0
4) 1) "dest_node"
2) (integer) 1
5) 1) "properties"
2) 1) 1) "season"
2) "Autumn"
3) "Apple"
3) 1) "Query internal execution time: 1.326905 milliseconds"
Compact:
127.0.0.1:6379> GRAPH.QUERY demo "MATCH (a)-[e]->(b) RETURN a, e, b.name" --compact
1) 1) 1) (integer) 1
2) "a"
2) 1) (integer) 1
2) "e"
3) 1) (integer) 1
2) "b.name"
2) 1) 1) 1) (integer) 8
2) 1) (integer) 0
2) 1) (integer) 0
3) 1) 1) (integer) 0
2) (integer) 2
3) "Tree"
2) 1) (integer) 7
2) 1) (integer) 0
2) (integer) 0
3) (integer) 0
4) (integer) 1
5) 1) 1) (integer) 1
2) (integer) 2
3) "Autumn"
3) 1) (integer) 2
2) "Apple"
3) 1) "Query internal execution time: 1.085412 milliseconds"
These results are being parsed by redis-cli
, which adds such visual cues as array indexing and indentation, as well as type hints like (integer)
. The actual data transmitted is formatted using the RESP protocol. All of the current RedisGraph clients rely upon a stable Redis client in the same language (such as redis-rb for Ruby) which handles RESP decoding.
Top-level array results
The result set above had 3 members in its top-level array:
1) Header row
2) Result rows
3) Query statistics
All queries that have a RETURN
clause will have these 3 members. Queries that don't return results have only one member in the outermost array, the query statistics:
127.0.0.1:6379> GRAPH.QUERY demo "CREATE (:plant {name: 'Tree'})-[:GROWS {season: 'Autumn'}]->(:fruit {name: 'Apple'})" --compact
1) 1) "Labels added: 2"
2) "Nodes created: 2"
3) "Properties set: 3"
4) "Relationships created: 1"
5) "Query internal execution time: 1.972868 milliseconds"
Rather than introspecting on the query being emitted, the client implementation can check whether this array contains 1 or 3 elements to choose how to format data.
Reading the header row
Our sample query MATCH (a)-[e]->(b) RETURN a, e, b.name
generated the header:
1) 1) (integer) 1
2) "a"
3) 1) (integer) 1
3) "e"
4) 1) (integer) 1
3) "b.name"
The 4 array members correspond, in order, to the 3 entities described in the RETURN clause.
Each is emitted as a 2-array:
1) ColumnType (enum)
2) column name (string)
The first element is the ColumnType enum, which as of RedisGraph v2.1.0 will always be COLUMN_SCALAR
. This element is retained for backwards compatibility, and may be ignored by the client unless RedisGraph versions older than v2.1.0 must be supported.
Reading result rows
The entity representations in this section will closely resemble those found in Result Set Graph Entities.
Our query produced one row of results with 3 columns (as described by the header):
1) 1) 1) (integer) 8
2) 1) (integer) 0
2) 1) (integer) 0
3) 1) 1) (integer) 0
2) (integer) 2
3) "Tree"
2) 1) (integer) 7
2) 1) (integer) 0
2) (integer) 0
3) (integer) 0
4) (integer) 1
5) 1) 1) (integer) 1
2) (integer) 2
3) "Autumn"
3) 1) (integer) 2
2) "Apple"
Each element is emitted as a 2-array - [ValueType
, value].
It is the client's responsibility to store the ValueType enum. RedisGraph guarantees that this enum may be extended in the future, but the existing values will not be altered.
The ValueType
for the first entry is VALUE_NODE
. The node representation contains 3 top-level elements:
- The node's internal ID.
- An array of all label IDs associated with the node (currently, each node can have either 0 or 1 labels, though this restriction may be lifted in the future).
- An array of all properties the node contains. Properties are represented as 3-arrays - [property key ID,
ValueType
, value].
[
Node ID (integer),
[label ID (integer) X label count]
[[property key ID (integer), ValueType (enum), value (scalar)] X property count]
]
The ValueType
for the first entry is VALUE_EDGE
. The edge representation differs from the node representation in two respects:
- Each relation has exactly one type, rather than the 0+ labels a node may have.
- A relation is emitted with the IDs of its source and destination nodes.
As such, the complete representation is as follows:
- The relation's internal ID.
- The relationship type ID.
- The source node's internal ID.
- The destination node's internal ID.
- The key-value pairs of all properties the relation possesses.
[
Relation ID (integer),
type ID (integer),
source node ID (integer),
destination node ID (integer),
[[property key ID (integer), ValueType (enum), value (scalar)] X property count]
]
The ValueType
for the third entry is VALUE_STRING
, and the other element in the array is the actual value, "Apple".
Reading statistics
The final top-level member of the GRAPH.QUERY reply is the execution statistics. This element is identical between the compact and standard response formats.
The statistics always include query execution time, while any combination of the other elements may be included depending on how the graph was modified.
- "Labels added: (integer)"
- "Labels removed: (integer)" (since RedisGraph 2.10)
- "Nodes created: (integer)"
- "Nodes deleted: (integer)"
- "Properties set: (integer)"
- "Properties removed: (integer)" (since RedisGraph 2.10)
- "Relationships created: (integer)"
- "Relationships deleted: (integer)"
- "Indices created: (integer)"
- "Indices deleted: (integer)"
- "Query internal execution time: (float) milliseconds"
Procedure Calls
Property keys, node labels, and relationship types are all returned as IDs rather than strings in the compact format. For each of these 3 string-ID mappings, IDs start at 0 and increase monotonically.
As such, the client should store a string array for each of these 3 mappings, and print the appropriate string for the user by checking an array at position ID. If an ID greater than the array length is encountered, the local array should be updated with a procedure call.
These calls are described generally in the Procedures documentation.
To retrieve each full mapping, the appropriate calls are:
db.labels()
127.0.0.1:6379> GRAPH.QUERY demo "CALL db.labels()"
1) 1) "label"
2) 1) 1) "plant"
2) 1) "fruit"
3) 1) "Query internal execution time: 0.321513 milliseconds"
db.relationshipTypes()
127.0.0.1:6379> GRAPH.QUERY demo "CALL db.relationshipTypes()"
1) 1) "relationshipType"
2) 1) 1) "GROWS"
3) 1) "Query internal execution time: 0.429677 milliseconds"
db.propertyKeys()
127.0.0.1:6379> GRAPH.QUERY demo "CALL db.propertyKeys()"
1) 1) "propertyKey"
2) 1) 1) "name"
2) 1) "season"
3) 1) "Query internal execution time: 0.318940 milliseconds"
Because the cached values never become outdated, it is possible to just retrieve new values with slightly more complex constructions:
CALL db.propertyKeys() YIELD propertyKey RETURN propertyKey SKIP [cached_array_length]
Though the property calls are quite efficient regardless of whether this optimization is used.
As an example, the Python client checks its local array of labels to resolve every label ID as seen here.
In the case of an IndexError, it issues a procedure call to fully refresh its label cache as seen here.
Reference clients
All the logic described in this document has been implemented in most of the clients listed in Client Libraries. Among these, node-redis
, redis-py
and jedis
are currently the most sophisticated.