Thursday, 26 March 2026

Understanding the Property Graph Model

Before diving into the mechanics of traversing graphs with Gremlin, it is essential to establish a clear understanding of what a graph is and how data is represented in Apache TinkerPop.

What Is a Graph?

In the context of TinkerPop, a graph is a structured collection of vertices (also called nodes) and edges (also called relationships).

 

·      Vertices represent domain entities or objects. Examples include a person, a location, a product, or a device. Each vertex can have a label to categorize the type of entity it represents (e.g., Person, Bot) and can also hold properties, which are key-value pairs storing metadata about the vertex (e.g., a name, role, or type).

 

·      Edges represent the relationships or connections between vertices. Each edge also has a label that defines the nature of the relationship (e.g., CREATED, FRIENDS_WITH) and may contain properties describing the relationship itself (e.g., created_year, since). Edges are directed, meaning they flow from an outgoing vertex to an incoming vertex, which allows TinkerPop to model asymmetric relationships naturally.

 

This structure is known as the property graph model, and it is highly flexible because it allows the storage of metadata not just on the vertices, but also on the relationships connecting them. This is particularly useful when modeling real-world scenarios where relationships have attributes or context of their own.

 

Creating a Graph in TinkerPop

TinkerPop provides multiple implementations of the property graph. For experimentation or learning purposes, an in-memory graph such as TinkerGraph is ideal. The following example demonstrates how to create vertices and edges using Gremlin:

// Initialize the in-memory TinkerGraph
graph = TinkerGraph.open()
g = graph.traversal()

// Add vertices with meaningful labels and properties
personAlice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
botChat = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()

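// Add an edge connecting the vertices with a label and property.
// The Gremlin Console iterates traversals automatically; in a script or application, end with next() or iterate().
g.addE('CREATED').from(personAlice).to(botChat).property('created_year', 2026).next()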

 

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin> 
gremlin> personAlice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
==>v[0]
gremlin> botChat = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
==>v[3]
gremlin> 
gremlin> g.addE('CREATED').from(personAlice).to(botChat).property('created_year', 2026)
==>e[6][0-CREATED->3]

   

In this example:

·      Two vertices are created, one labeled Person representing Alice, and one labeled Bot representing ChatBot.

·      Each vertex contains properties that describe its attributes (name, role, type).

·      An edge labeled CREATED connects Alice to ChatBot. The edge is directional, flowing from Alice to ChatBot, and contains a property created_year indicating when the relationship was established.
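As a quick check of the points above, the following sketch rebuilds the same two-vertex graph and asks it what it contains. It is written for the Gremlin Console, where TinkerGraph is already imported; the variable names (alice, bot, vertexCount, and so on) are illustrative and not part of the original example.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

vertexCount = g.V().count().next()     // number of vertices in the graph
edgeCount = g.E().count().next()       // number of edges in the graph
labels = g.V().label().toSet()         // distinct vertex labels
year = g.E().hasLabel('CREATED').values('created_year').next()   // property stored on the edge
```

The last line is the part that distinguishes a property graph from a plain graph: the value 2026 lives on the relationship itself, not on either vertex.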

 

Visualizing the Graph

If visualized as a diagram, this graph would consist of:

 

·      A Person vertex named Alice

·      A Bot vertex named ChatBot

·      A directed edge labeled CREATED pointing from Alice to ChatBot

 


   

This simple structure illustrates the essential elements of a property graph: labeled vertices, labeled edges, properties, and directional relationships.

 

Vertex and Edge Identifiers

Most TinkerPop graph implementations do not allow explicit assignment of vertex or edge identifiers. Instead, the system automatically generates a unique identifier for every graph element. If an identifier is provided during creation, it is generally ignored or rejected. Leaving identifier generation to the graph system keeps identifiers unique without any coordination by the caller.
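The difference is easy to observe in TinkerGraph, which in its default configuration keeps an identifier supplied through the T.id token. The sketch below is meant for the Gremlin Console; the vertex Bob and the value 100L are made up for illustration, and other providers may ignore or reject the supplied identifier.

```groovy
graph = TinkerGraph.open()
g = graph.traversal()

// No id supplied: TinkerGraph generates one
generated = g.addV('Person').property('name', 'Alice').next()

// Id supplied through the T.id token (hypothetical value 100L);
// TinkerGraph's default configuration keeps it, many other providers will not
supplied = g.addV('Person').property(T.id, 100L).property('name', 'Bob').next()

suppliedId = supplied.id()                    // the id TinkerGraph stored for Bob
found = g.V(100L).values('name').next()       // look the vertex up by that id
```

Code that must run against more than one provider should not rely on this behaviour and should treat identifiers as system-assigned.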

 

Accessing Vertex and Edge Identifiers in Gremlin

Although most TinkerPop implementations manage identifiers internally, those identifiers are still fully accessible at query time. Being able to inspect vertex and edge IDs is useful for debugging, logging, result correlation, and understanding how a graph evolves during experimentation.

 

Retrieving Vertex Identifiers

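Every vertex in a TinkerPop graph has a unique identifier. On a vertex object it is exposed by the id() method; the console session below reads it with Groovy's property-style shorthand, .id. For example, after creating the vertices above in the Gremlin Console: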

gremlin> g.V().toList().get(0).id
==>0

gremlin> g.V().toList().get(1).id
==>3

In this interaction:

 

·      g.V() returns a traversal over all vertices in the graph.

·      toList() materializes the traversal results into a list.

·      get(0) and get(1) retrieve individual vertex instances from that list.

·      .id accesses the internally assigned identifier of the vertex.

 

The numeric values (0, 3) are implementation-specific and should not be interpreted as sequential or predictable. Identifier allocation may vary based on the graph implementation, internal storage layout, or prior graph mutations.
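In TinkerGraph specifically, the gap between 0 and 3 appears to come from vertex properties, which are elements with identifiers of their own. The console sketch below lists both sets of identifiers side by side. This is an observation about TinkerGraph's default id allocation, not something other providers are required to do, and the variable names are illustrative.

```groovy
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('Person').property('name', 'Alice').property('role', 'Developer').iterate()
g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').iterate()

vertexIds = g.V().id().toSet()                  // ids of the two vertices
propertyIds = g.V().properties().id().toSet()   // ids of the four vertex properties

// Vertices and vertex properties do not share any id
overlap = vertexIds.intersect(propertyIds)
```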

 

A more idiomatic Gremlin approach is to retrieve identifiers directly as part of a traversal:

g.V().id()

 

This returns the identifiers of all vertices without materializing the full vertex objects.

gremlin> g.V().id()
==>0
==>3

   

Retrieving Edge Identifiers

Edges also have unique identifiers that can be accessed in the same manner. To retrieve all edge IDs:

 

g.E().id()

gremlin> g.E().id()
==>6

   

To inspect a specific edge and its identifier:

 

g.E().toList().get(0).id

gremlin> g.E().toList().get(0).id
==>6

   

Just like vertices, edge identifiers are generated and managed by the graph system and should be treated as opaque identifiers, not business keys.
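One way to respect that advice is to locate elements by a domain property and capture the identifier at run time instead of hard-coding it. The following Gremlin Console sketch assumes the example graph from earlier; aliceId, edgeId and the other variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// Find the vertex by a business key, then capture whatever id the graph assigned
aliceId = g.V().has('Person', 'name', 'Alice').id().next()

// The captured id can be handed straight back to V() for a direct lookup
sameVertexName = g.V(aliceId).values('name').next()

// The same pattern works for edges
edgeId = g.E().hasLabel('CREATED').id().next()
edgeLabel = g.E(edgeId).label().next()
```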

 

Graph Traversal

A graph traversal is the primary mechanism through which Gremlin answers questions about graph data. Once Gremlin knows where the graph is stored, it can execute a traversal, a well-defined sequence of steps that operate over the structure of the graph. Each step transforms or filters the current set of elements, gradually refining the result until the desired answer is produced. Traversals describe how to move through the graph, not just what data to retrieve, which is a key distinction from relational query languages.

 

You can think of a traversal as a process flowing through the graph. Gremlin starts from a set of vertices or edges, applies one step at a time, and passes intermediate results forward until the final output is reached.
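The path() step makes this flow visible by recording every element a traverser passed through. The sketch below, written for the Gremlin Console against the example graph, uses steps that are introduced one by one in the sections that follow; the variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// path() returns the ordered history of one traverser:
// the Alice vertex, then the CREATED edge, then the ChatBot vertex
visited = g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().path().next()
```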

 

From English Questions to Gremlin Traversals

Gremlin traversals are often easiest to design by first expressing the question in natural language. Consider the question: “What AI software has Alice created?”

 

This question can be decomposed into a sequence of graph-oriented actions:

 

·      Locate the vertex representing Alice

·      Follow outgoing edges labeled CREATED

·      Reach the vertices representing AI software

·      Extract the name of that software

 

Each of these actions maps directly to a Gremlin step. When these steps are chained together, they form a traversal that precisely describes how Gremlin should navigate the graph to compute the answer.

 

Locating the Starting Vertex

Every traversal begins with a starting point. In this case, the traversal starts by finding the vertex that represents Alice. This is a filtering operation over the set of all vertices in the graph.

 

The has step filters elements based on property key–value pairs:

gremlin> g.V().has('name', 'Alice')
==>v[0]

While this query is valid, it can be made more precise. Property keys such as name may exist on multiple vertex types, and adding a label constraint clarifies intent and improves readability. A more idiomatic version includes the vertex label:

gremlin> g.V().has('Person', 'name', 'Alice')
==>v[0]

At this point, Gremlin is positioned on the vertex representing Alice.
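The three-argument form of has is shorthand for a label filter followed by a property filter. A minimal Gremlin Console sketch of that equivalence, using illustrative variable names:

```groovy
graph = TinkerGraph.open()
g = graph.traversal()
g.addV('Person').property('name', 'Alice').property('role', 'Developer').iterate()
g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').iterate()

// Label and property in one step
shortForm = g.V().has('Person', 'name', 'Alice').toList()

// The same filter spelled out as two steps
longForm = g.V().hasLabel('Person').has('name', 'Alice').toList()

// A label that does not match filters the vertex out, even though the name matches
wrongLabel = g.V().has('Bot', 'name', 'Alice').toList()
```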

 

Accessing Properties on Vertices and Edges

When Gremlin is positioned on a vertex or an edge, it has access to all properties associated with that element. The properties() step allows inspection of those properties:

gremlin> g.V().has('Person', 'name', 'Alice').properties()
==>vp[role->Developer]
==>vp[name->Alice]

   

This highlights an important aspect of Gremlin: traversals are not limited to navigation. They can also introspect elements, extract metadata, and project specific values at any point in the traversal.
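A few variations on property access may help here. The sketch below, for the Gremlin Console and the example graph, narrows properties() to one key, reads raw values with values(), and inspects the properties of the edge; all variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// Restrict properties() to a single key and unpack the property object
nameProp = g.V().has('Person', 'name', 'Alice').properties('name').next()
key = nameProp.key()
value = nameProp.value()

// values() skips the property wrapper and returns the raw value
role = g.V().has('Person', 'name', 'Alice').values('role').next()

// Edges carry properties too; valueMap() returns them as a map
edgeProps = g.E().hasLabel('CREATED').valueMap().next()
```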

 

Traversal Performance and Index Usage

The traversal shown so far conceptually scans all vertices to locate Alice. This is acceptable for small graphs, but it becomes prohibitively expensive in large-scale graphs with millions or billions of elements.

 

Apache TinkerPop does not define a universal API for creating or managing indices. Indexing is handled by the underlying graph database implementation. To optimize lookups like this, indices should be created using the graph system’s native APIs.

 

A key point is that the traversal itself does not change. Once indices exist, they are applied transparently at execution time, allowing Gremlin to evaluate the traversal efficiently without altering its structure.
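TinkerGraph offers a small native indexing call that illustrates the idea. In the Gremlin Console sketch below, createIndex and getIndexedKeys are TinkerGraph methods, not part of the general TinkerPop API; other graph systems expose their own, different index management interfaces.

```groovy
graph = TinkerGraph.open()

// TinkerGraph-specific: index vertices by their 'name' property
graph.createIndex('name', Vertex.class)

g = graph.traversal()
g.addV('Person').property('name', 'Alice').property('role', 'Developer').iterate()
g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').iterate()

indexedKeys = graph.getIndexedKeys(Vertex.class)   // keys currently indexed for vertices

// The traversal is written exactly as before; only its execution can benefit from the index
found = g.V().has('Person', 'name', 'Alice').values('role').next()
```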

 

Traversing Directed Edges

After locating Alice, the traversal continues by following relationships from that vertex. In this example, Alice has created software, represented by edges labeled CREATED.

 

Edges in a property graph are directional. Gremlin must be explicitly instructed which direction to traverse. Since Alice is the creator, the traversal follows outgoing CREATED edges using outE:

gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED')
==>e[6][0-CREATED->3]

   

At this stage, Gremlin has moved from the Alice vertex to the CREATED edge. Traversals frequently alternate between vertices and edges, and understanding this alternation is essential when composing more advanced queries.
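Stopping on the edge is what makes edge-level work possible. The following Gremlin Console sketch reads and filters on the created_year property while positioned on the edge; the threshold 2025 and the variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// Read a property of the edge itself
year = g.V().has('Person', 'name', 'Alice').outE('CREATED').values('created_year').next()

// Keep only edges whose created_year is at least 2025 (hypothetical threshold)
recent = g.V().has('Person', 'name', 'Alice').outE('CREATED').has('created_year', P.gte(2025)).toList()

// inE looks in the opposite direction: Alice has no incoming CREATED edges
incoming = g.V().has('Person', 'name', 'Alice').inE('CREATED').toList()
```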

 

Moving from Edges to Vertices

An edge connects two vertices: an outgoing vertex and an incoming vertex. To continue the traversal from the edge to the software vertex, the inV step is used:

 

gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED').inV()
==>v[3]

Gremlin is now positioned on the vertex representing the software that Alice created.
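The two ends of an edge can be checked directly. In this Gremlin Console sketch against the example graph, outV returns the vertex the edge leaves and inV the vertex it arrives at; the last traversal walks against the arrow, starting from ChatBot. Variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// Starting on the edge: the outgoing vertex is the creator, the incoming vertex is the creation
creator = g.E().hasLabel('CREATED').outV().values('name').next()
created = g.E().hasLabel('CREATED').inV().values('name').next()

// Starting on ChatBot and moving against the edge direction
back = g.V().has('Bot', 'name', 'ChatBot').inE('CREATED').outV().values('name').next()
```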

 

Projecting Results from the Traversal

Once the traversal reaches the desired vertices, their properties can be projected. To retrieve only the software name, use the values step:

g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().values('name')

   

In situations where more context is required, all properties along with the vertex identifier and label can be retrieved using valueMap with the true flag:

 

gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().valueMap(true)
==>[id:3,label:Bot,name:[ChatBot],type:[AI]]
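In recent TinkerPop releases the boolean form valueMap(true) is deprecated. The sketch below shows two replacements, assuming TinkerPop 3.4.4 or later in the Gremlin Console: elementMap(), which returns single values unwrapped, and valueMap() with tokens requested through with(). Variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// elementMap(): id and label are included, property values are not wrapped in lists
flat = g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().elementMap().next()

// valueMap() with tokens: id and label are included, property values stay wrapped in lists
withTokens = g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().valueMap().with(WithOptions.tokens).next()
```

In both maps the identifier and label are keyed by the T.id and T.label tokens rather than by plain strings.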

   

Using out() as a Traversal Shortcut

The traversal shown earlier explicitly moves from a vertex to an edge and then from that edge to the adjacent vertex:

 

g.V().has('Person', 'name', 'Alice')
.outE('CREATED')
.inV()
.valueMap(true)

   

This form is deliberately explicit. It makes the traversal mechanics visible by showing the transition from a vertex to an edge (outE) and then from the edge to the incoming vertex (inV). This style is especially useful when learning Gremlin or when edge-level details such as edge properties, identifiers, or labels are required as part of the traversal.

 

However, Gremlin also provides higher-level traversal steps that combine these low-level movements into a single operation. In this case, the pair outE('CREATED').inV() can be replaced with the out('CREATED') step.

 

The equivalent traversal becomes:

 

g.V().has('Person', 'name', 'Alice')
.out('CREATED')
.valueMap(true)

   

What out() Really Does

The out() step is a convenience step. Internally, it performs two actions in sequence:

 

·      Traverse outgoing edges with the specified label

·      Move to the vertex at the incoming end of those edges

 

In other words, out('CREATED') is functionally equivalent to:

 

 

outE('CREATED').inV()

   

Gremlin provides this abstraction because many traversals are primarily interested in vertices rather than edges. By collapsing the edge transition into a single step, out() makes the traversal shorter, more readable, and closer to the way graph relationships are typically described conceptually.
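The equivalence can be checked directly. This Gremlin Console sketch runs both forms against the example graph and also shows in(), the mirror-image shortcut for inE().outV(); the variable names are illustrative.

```groovy
// Rebuild the article's example graph (variable names are illustrative)
graph = TinkerGraph.open()
g = graph.traversal()
alice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
bot = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
g.addE('CREATED').from(alice).to(bot).property('created_year', 2026).iterate()

// Shortcut form
viaOut = g.V().has('Person', 'name', 'Alice').out('CREATED').toList()

// Explicit form: vertex -> edge -> vertex
viaEdges = g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().toList()

// in() walks against the edge direction: who created ChatBot?
creators = g.V().has('Bot', 'name', 'ChatBot').in('CREATED').values('name').toList()
```

When a traversal needs the edge itself, for example to filter on created_year, the explicit outE and inV form remains the one to use.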

 

Mastery of Gremlin comes from understanding this flow. Once the mechanics of vertices, edges, direction, and step composition are clear, more complex traversals such as multi-hop paths, conditional logic, and aggregations can be expressed naturally and precisely.

  
