Before diving into the mechanics of traversing graphs with Gremlin, it is essential to establish a clear understanding of what a graph is and how data is represented in Apache TinkerPop.
What Is a Graph?
In the context of TinkerPop, a graph is a structured collection of vertices (also called nodes) and edges (also called relationships).
· Vertices represent domain entities or objects. Examples include a person, a location, a product, or a device. Each vertex can have a label to categorize the type of entity it represents (e.g., Person, Bot) and can also hold properties, which are key-value pairs storing metadata about the vertex (e.g., a name, role, or type).
· Edges represent the relationships or connections between vertices. Each edge also has a label that defines the nature of the relationship (e.g., CREATED, FRIENDS_WITH) and may contain properties describing the relationship itself (e.g., created_year, since). Edges are directed, meaning they flow from an outgoing vertex to an incoming vertex, which allows TinkerPop to model asymmetric relationships naturally.
This structure is known as the property graph model, and it is highly flexible because it allows the storage of metadata not just on the vertices, but also on the relationships connecting them. This is particularly useful when modeling real-world scenarios where relationships have attributes or context of their own.
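To make the model concrete, the sketch below represents vertices and edges as plain Python dataclasses. It is a hypothetical illustration of the data model only. The class and field names (Vertex, Edge, out_v, in_v) are assumptions and not part of any TinkerPop API. The identifiers 0, 3 and 6 simply mirror the ones TinkerGraph assigns in the example that follows.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Vertex:
    id: int
    label: str  # category of entity, e.g. "Person"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    id: int
    label: str  # nature of the relationship, e.g. "CREATED"
    out_v: Vertex  # tail: the vertex the edge leaves from
    in_v: Vertex  # head: the vertex the edge points to
    properties: Dict[str, Any] = field(default_factory=dict)


alice = Vertex(0, "Person", {"name": "Alice", "role": "Developer"})
bot = Vertex(3, "Bot", {"name": "ChatBot", "type": "AI"})
created = Edge(6, "CREATED", out_v=alice, in_v=bot, properties={"created_year": 2026})

# Metadata lives on the relationship itself, not only on its endpoints.
print(created.label, created.properties["created_year"])  # CREATED 2026
# Direction is explicit: Alice is the out-vertex, ChatBot the in-vertex.
print(created.out_v.properties["name"], "->", created.in_v.properties["name"])
```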
Creating a Graph in TinkerPop
The property graph model is implemented by many TinkerPop-enabled graph systems. For experimentation or learning, TinkerPop's in-memory reference implementation, TinkerGraph, is ideal. The following example demonstrates how to create vertices and edges using Gremlin:
// Initialize the in-memory TinkerGraph
graph = TinkerGraph.open()
g = graph.traversal()
// Add vertices with meaningful labels and properties
personAlice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
botChat = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
// Add an edge connecting the vertices with a label and property.
// The Gremlin Console iterates this traversal automatically; in application
// code, terminate it explicitly with next() or iterate().
g.addE('CREATED').from(personAlice).to(botChat).property('created_year', 2026)
Running these statements in the Gremlin Console produces the following output:
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]
gremlin>
gremlin> personAlice = g.addV('Person').property('name', 'Alice').property('role', 'Developer').next()
==>v[0]
gremlin> botChat = g.addV('Bot').property('name', 'ChatBot').property('type', 'AI').next()
==>v[3]
gremlin>
gremlin> g.addE('CREATED').from(personAlice).to(botChat).property('created_year', 2026)
==>e[6][0-CREATED->3]
In this example:
· Two vertices are created: one labeled Person representing Alice, and one labeled Bot representing ChatBot.
· Each vertex contains properties that describe its attributes (name, role, type).
· An edge labeled CREATED connects Alice to ChatBot. The edge is directional, flowing from Alice to ChatBot, and contains a property created_year indicating when the relationship was established.
Visualizing the Graph
If visualized as a diagram, this graph would consist of:
· A Person vertex whose name property is Alice
· A Bot vertex whose name property is ChatBot
· A directed edge labeled CREATED pointing from Alice to ChatBot
This simple structure illustrates the essential elements of a property graph: labeled vertices, labeled edges, properties, and directional relationships.
Vertex and Edge Identifiers
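Graph implementations typically generate unique identifiers for vertices and edges automatically, which keeps identifiers consistent and unique within the graph. Support for user-supplied identifiers varies by implementation. TinkerGraph, for example, accepts an explicit identifier set with T.id, while implementations that do not support the feature generally reject such an identifier with an error rather than silently ignoring it. Code that must be portable across graph systems should therefore not depend on a particular identifier scheme.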
Accessing Vertex and Edge Identifiers in Gremlin
Although most TinkerPop implementations manage identifiers internally, those identifiers are still fully accessible at query time. Being able to inspect vertex and edge IDs is useful for debugging, logging, result correlation, and understanding how a graph evolves during experimentation.
Retrieving Vertex Identifiers
Every vertex in a TinkerPop graph has a unique identifier, available through the id() method (in the Groovy-based Gremlin Console, the property-style shorthand .id works as well). For example, after creating the vertices above in the Gremlin Console:
gremlin> g.V().toList().get(0).id
==>0
gremlin> g.V().toList().get(1).id
==>3
In this interaction:
· g.V() returns a traversal over all vertices in the graph.
· toList() materializes the traversal results into a list.
· get(0) and get(1) retrieve individual vertex instances from that list.
· .id accesses the internally assigned identifier of the vertex.
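The numeric values (0, 3) are implementation-specific and should not be interpreted as sequential or predictable. Identifier allocation may vary based on the graph implementation, internal storage layout, or prior graph mutations. Likewise, the order in which g.V() returns vertices is not guaranteed in general, so get(0) is not a dependable way to fetch a particular vertex.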
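One likely explanation for the gap between 0 and 3 in TinkerGraph's default configuration is that vertex properties are themselves elements that receive identifiers from the same counter as vertices and edges. The toy allocator below is a hypothetical sketch of that scheme, not TinkerGraph's actual code. It assumes that each vertex property consumes one identifier and that edge properties consume none, and under those assumptions it reproduces the 0, 3, 6 sequence seen above.

```python
from itertools import count


class ToyIdAllocator:
    """Hypothetical allocator: one shared counter for all identified elements."""

    def __init__(self) -> None:
        self._next = count(0)

    def add_vertex(self, label: str, **props: object) -> int:
        vertex_id = next(self._next)
        # Assumption: each vertex property is an element with its own id.
        for _ in props:
            next(self._next)
        return vertex_id

    def add_edge(self, label: str, **props: object) -> int:
        # Assumption: edge properties are plain key/value pairs without ids.
        return next(self._next)


ids = ToyIdAllocator()
alice = ids.add_vertex("Person", name="Alice", role="Developer")
bot = ids.add_vertex("Bot", name="ChatBot", type="AI")
created = ids.add_edge("CREATED", created_year=2026)
print(alice, bot, created)  # 0 3 6
```

In this model, giving Alice one more property before creating ChatBot would shift every later identifier, which illustrates why such values should not be treated as predictable.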
A more idiomatic Gremlin approach is to retrieve identifiers directly as part of a traversal:
g.V().id()
This returns only the identifiers of the vertices rather than the full vertex objects, which keeps the result compact.
gremlin> g.V().id()
==>0
==>3
Retrieving Edge Identifiers
Edges also have unique identifiers that can be accessed in the same manner. To retrieve all edge IDs:
g.E().id()
gremlin> g.E().id()
==>6
To inspect a specific edge and its identifier:
g.E().toList().get(0).id
gremlin> g.E().toList().get(0).id
==>6
Just like vertices, edge identifiers are generated and managed by the graph system and should be treated as opaque identifiers, not business keys.
Graph Traversal
A graph traversal is the primary mechanism through which Gremlin answers questions about graph data. Once a traversal source (g in the examples above) is bound to a graph, Gremlin can execute a traversal: a well-defined sequence of steps that operate over the structure of the graph. Each step transforms or filters the current set of elements, gradually refining the result until the desired answer is produced. Traversals describe how to move through the graph, not just what data to retrieve, which is a key distinction from relational query languages.
You can think of a traversal as a process flowing through the graph. Gremlin starts from a set of vertices or edges, applies one step at a time, and passes intermediate results forward until the final output is reached.
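The idea of elements flowing through a chain of steps can be sketched in plain Python with generators. This is a hypothetical model for intuition only. traverse, has_label and values are made-up helpers over a tiny dictionary graph, and a real Gremlin engine is considerably more sophisticated. Each step takes the incoming stream and lazily yields the filtered or transformed stream that feeds the next step.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical toy graph: vertex id -> (label, properties)
VERTICES = {
    0: ("Person", {"name": "Alice", "role": "Developer"}),
    3: ("Bot", {"name": "ChatBot", "type": "AI"}),
}

Step = Callable[[Iterator], Iterator]


def traverse(start: Iterable, *steps: Step) -> list:
    """Feed the start elements through each step in order, lazily."""
    stream = iter(start)
    for step in steps:
        stream = step(stream)
    return list(stream)  # terminal step: materialize the results


def has_label(label: str) -> Step:
    """Filter step: keep only vertices with the given label."""
    return lambda vs: (v for v in vs if VERTICES[v][0] == label)


def values(key: str) -> Step:
    """Map step: replace each vertex with the value of one property."""
    return lambda vs: (VERTICES[v][1][key] for v in vs if key in VERTICES[v][1])


print(traverse(VERTICES, has_label("Bot"), values("name")))  # ['ChatBot']
```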
From English Questions to Gremlin Traversals
Gremlin traversals are often easiest to design by first expressing the question in natural language. Consider the question: “What AI software has Alice created?”
This question can be decomposed into a sequence of graph-oriented actions:
· Locate the vertex representing Alice
· Follow outgoing edges labeled CREATED
· Reach the vertices representing AI software
· Extract the name of that software
Each of these actions maps directly to a Gremlin step. When these steps are chained together, they form a traversal that precisely describes how Gremlin should navigate the graph to compute the answer.
Locating the Starting Vertex
Every traversal begins with a starting point. In this case, the traversal starts by finding the vertex that represents Alice. This is a filtering operation over the set of all vertices in the graph.
The has step filters elements based on property key–value pairs:
gremlin> g.V().has('name', 'Alice')
==>v[0]
While this query is valid, it can be made more precise. Property keys such as name may exist on multiple vertex types, and adding a label constraint clarifies intent and improves readability. A more idiomatic version includes the vertex label:
gremlin> g.V().has('Person', 'name', 'Alice')
==>v[0]
At this point, Gremlin is positioned on the vertex representing Alice.
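The effect of the label constraint can be shown in a small toy model. The sketch below is hypothetical. It adds an imaginary Bot vertex also named Alice (id 7) that is not part of the example graph, and its has helper only imitates the two- and three-argument forms of Gremlin's has step. The two-argument form matches both vertices, while the labeled form keeps only the Person.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical toy graph: vertex id -> (label, properties)
VERTICES: Dict[int, Tuple[str, Dict[str, Any]]] = {
    0: ("Person", {"name": "Alice", "role": "Developer"}),
    3: ("Bot", {"name": "ChatBot", "type": "AI"}),
    7: ("Bot", {"name": "Alice"}),  # imaginary vertex sharing the name
}


def has(*args: Any) -> List[int]:
    """Toy analogue of has(key, value) and has(label, key, value)."""
    label = args[0] if len(args) == 3 else None
    key, value = args[-2:]
    return [
        vid
        for vid, (vlabel, props) in VERTICES.items()
        if (label is None or vlabel == label) and props.get(key) == value
    ]


print(has("name", "Alice"))            # [0, 7]
print(has("Person", "name", "Alice"))  # [0]
```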
Accessing Properties on Vertices and Edges
When Gremlin is positioned on a vertex or an edge, it has access to all properties associated with that element. The properties() step returns those properties for inspection:
gremlin> g.V().has('Person', 'name', 'Alice').properties()
==>vp[role->Developer]
==>vp[name->Alice]
This highlights an important aspect of Gremlin: traversals are not limited to navigation. They can also introspect elements, extract metadata, and project specific values at any point in the traversal.
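As a rough analogy, properties() behaves like listing the key/value pairs stored on the current element, optionally restricted to some keys. The sketch below is a hypothetical toy, not the TinkerPop API. It also should not be read as saying anything about ordering: the console output above listed role before name, and property order is best treated as unspecified.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical toy graph: vertex id -> properties
PROPS: Dict[int, Dict[str, Any]] = {
    0: {"name": "Alice", "role": "Developer"},
    3: {"name": "ChatBot", "type": "AI"},
}


def properties(vid: int, *keys: str) -> List[Tuple[str, Any]]:
    """Toy analogue of properties(): all pairs, or only the requested keys."""
    return [(k, v) for k, v in PROPS[vid].items() if not keys or k in keys]


print(properties(0))          # [('name', 'Alice'), ('role', 'Developer')]
print(properties(0, "role"))  # [('role', 'Developer')]
```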
Traversal Performance and Index Usage
The traversal shown so far conceptually scans all vertices to locate Alice. This is acceptable for small graphs, but it becomes prohibitively expensive in large-scale graphs with millions or billions of elements.
Apache TinkerPop does not define a universal API for creating or managing indices. Indexing is handled by the underlying graph database implementation, so indices for lookups like this should be created using the graph system’s native APIs. TinkerGraph, for example, provides graph.createIndex('name', Vertex.class) for indexing a vertex property key.
A key point is that the traversal itself does not change. Once indices exist, they are applied transparently at execution time, allowing Gremlin to evaluate the traversal efficiently without altering its structure.
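The point that the question stays the same while the access path changes can be illustrated with a toy index. In the hypothetical sketch below, scan, build_index and lookup are made-up helpers. lookup answers the same (label, key, value) question as a full scan but uses a prebuilt index when one exists for the key. Real databases must also keep indices up to date on every write, which this sketch ignores.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical toy graph: vertex id -> (label, properties)
VERTICES: Dict[int, Tuple[str, Dict[str, Any]]] = {
    0: ("Person", {"name": "Alice", "role": "Developer"}),
    3: ("Bot", {"name": "ChatBot", "type": "AI"}),
}


def scan(label: str, key: str, value: Any) -> List[int]:
    """Full scan: examines every vertex, cost grows with graph size."""
    return [vid for vid, (lbl, p) in VERTICES.items() if lbl == label and p.get(key) == value]


def build_index(key: str) -> Dict[Tuple[str, Any], List[int]]:
    """Map (label, property value) -> vertex ids for one property key."""
    index: Dict[Tuple[str, Any], List[int]] = defaultdict(list)
    for vid, (lbl, p) in VERTICES.items():
        if key in p:
            index[(lbl, p[key])].append(vid)
    return index


name_index = build_index("name")


def lookup(label: str, key: str, value: Any) -> List[int]:
    """Same question; the index is used transparently when available."""
    if key == "name":
        return list(name_index.get((label, value), []))
    return scan(label, key, value)


print(lookup("Person", "name", "Alice"))  # [0]
```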
Traversing Directed Edges
After locating Alice, the traversal continues by following relationships from that vertex. In this example, Alice has created software, represented by edges labeled CREATED.
Edges in a property graph are directional. Gremlin must be explicitly instructed which direction to traverse. Since Alice is the creator, the traversal follows outgoing CREATED edges using outE:
gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED')
==>e[6][0-CREATED->3]
At this stage, Gremlin has moved from the Alice vertex to the CREATED edge. Traversals frequently alternate between vertices and edges, and understanding this alternation is essential when composing more advanced queries.
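Why direction matters can be seen in a toy edge table that records each edge's tail and head. The helpers out_e and in_e below are hypothetical stand-ins for Gremlin's outE and inE steps and are not the TinkerPop API. Asking for CREATED edges leaving Alice finds edge 6, while asking for CREATED edges arriving at Alice finds nothing.

```python
from typing import Dict, List, Tuple

# Hypothetical toy edge table: edge id -> (label, out-vertex id, in-vertex id)
EDGES: Dict[int, Tuple[str, int, int]] = {6: ("CREATED", 0, 3)}


def out_e(vid: int, label: str) -> List[int]:
    """Toy analogue of outE(label): edges whose tail is vid."""
    return [eid for eid, (lbl, out_v, _) in EDGES.items() if lbl == label and out_v == vid]


def in_e(vid: int, label: str) -> List[int]:
    """Toy analogue of inE(label): edges whose head is vid."""
    return [eid for eid, (lbl, _, in_v) in EDGES.items() if lbl == label and in_v == vid]


print(out_e(0, "CREATED"))  # [6]  Alice created something
print(in_e(0, "CREATED"))   # []   nothing created Alice
print(in_e(3, "CREATED"))   # [6]  ChatBot was created by someone
```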
Moving from Edges to Vertices
An edge connects two vertices: an outgoing vertex and an incoming vertex. To continue the traversal from the edge to the software vertex, the inV step is used:
gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED').inV()
==>v[3]
Gremlin is now positioned on the vertex representing the software that Alice created.
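From an edge, inV() selects the head vertex, and the companion outV() step selects the tail. The toy helpers in_v and out_v below are hypothetical stand-ins for those steps over the same kind of edge table. They show that edge 6 leads forward to ChatBot (3) and back to Alice (0).

```python
from typing import Dict, List, Tuple

# Hypothetical toy edge table: edge id -> (label, out-vertex id, in-vertex id)
EDGES: Dict[int, Tuple[str, int, int]] = {6: ("CREATED", 0, 3)}


def in_v(edge_ids: List[int]) -> List[int]:
    """Toy analogue of inV(): the head (incoming) vertex of each edge."""
    return [EDGES[eid][2] for eid in edge_ids]


def out_v(edge_ids: List[int]) -> List[int]:
    """Toy analogue of outV(): the tail (outgoing) vertex of each edge."""
    return [EDGES[eid][1] for eid in edge_ids]


print(in_v([6]))   # [3]  ChatBot
print(out_v([6]))  # [0]  back to Alice
```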
Projecting Results from the Traversal
Once the traversal reaches the desired vertices, their properties can be projected. To retrieve only the software's name, use the values() step:
g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().values('name')
gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().values('name')
==>ChatBot
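In situations where more context is required, all properties along with the vertex identifier and label can be retrieved using valueMap with the true flag. In TinkerPop 3.4 and later, valueMap(true) is deprecated in favor of valueMap().with(WithOptions.tokens), and the elementMap() step offers a similar projection that returns single values instead of lists: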
gremlin> g.V().has('Person', 'name', 'Alice').outE('CREATED').inV().valueMap(true)
==>[id:3,label:Bot,name:[ChatBot],type:[AI]]
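The values in the valueMap output are wrapped in lists (name:[ChatBot]) because a TinkerPop vertex property key may hold several values (list or set cardinality), so valueMap reports every key as a list even when it holds one value. values(), by contrast, emits the individual values. The toy below is a hypothetical model of those two result shapes. values and value_map are made-up helpers, and the with_tokens flag only imitates the id and label entries added by valueMap(true).

```python
from typing import Any, Dict, List

# Hypothetical toy vertex: property values stored as lists, since a vertex
# property key may hold several values (multi-properties).
CHATBOT = {"id": 3, "label": "Bot", "props": {"name": ["ChatBot"], "type": ["AI"]}}


def values(vertex: Dict[str, Any], key: str) -> List[Any]:
    """Toy analogue of values(key): emits the individual stored values."""
    return list(vertex["props"].get(key, []))


def value_map(vertex: Dict[str, Any], with_tokens: bool = False) -> Dict[str, Any]:
    """Toy analogue of valueMap(true): a list per key, plus id and label."""
    result: Dict[str, Any] = {"id": vertex["id"], "label": vertex["label"]} if with_tokens else {}
    result.update({k: list(v) for k, v in vertex["props"].items()})
    return result


print(values(CHATBOT, "name"))  # ['ChatBot']
print(value_map(CHATBOT, with_tokens=True))
# {'id': 3, 'label': 'Bot', 'name': ['ChatBot'], 'type': ['AI']}
```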
Using out() as a Traversal Shortcut
The traversal shown earlier explicitly moves from a vertex to an edge and then from that edge to the adjacent vertex:
g.V().has('Person', 'name', 'Alice').
  outE('CREATED').
  inV().
  valueMap(true)
This form is deliberately explicit. It makes the traversal mechanics visible by showing the transition from a vertex to an edge (outE) and then from the edge to the incoming vertex (inV). This style is especially useful when learning Gremlin or when edge-level details such as edge properties, identifiers, or labels are required as part of the traversal.
However, Gremlin also provides higher-level traversal steps that combine these low-level movements into a single operation. In this case, the pair outE('CREATED').inV() can be replaced with the out('CREATED') step.
The equivalent traversal becomes:
g.V().has('Person', 'name', 'Alice').
  out('CREATED').
  valueMap(true)
What out() Really Does
The out() step is a convenience step. Internally, it performs two actions in sequence:
· Traverse outgoing edges with the specified label
· Move to the vertex at the incoming end of those edges
In other words, out('CREATED') is functionally equivalent to:
outE('CREATED').inV()
Gremlin provides this abstraction because many traversals are primarily interested in vertices rather than edges. By collapsing the edge transition into a single step, out() makes the traversal shorter, more readable, and closer to the way graph relationships are typically described conceptually.
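The equivalence can be checked on a toy edge table. In the hypothetical sketch below, out is written as a direct vertex-to-vertex hop rather than as a call to the other two helpers, and the assertion confirms that it gives the same answer as out_e followed by in_v. None of these helpers are TinkerPop APIs. The trade-off is that after out(), the edge itself (its id 6 and its created_year property) is no longer on the traversal path.

```python
from typing import Dict, List, Tuple

# Hypothetical toy edge table: edge id -> (label, out-vertex id, in-vertex id)
EDGES: Dict[int, Tuple[str, int, int]] = {6: ("CREATED", 0, 3)}


def out_e(vids: List[int], label: str) -> List[int]:
    """Toy outE(label): edge ids leaving any of the given vertices."""
    return [eid for eid, (lbl, o, _) in EDGES.items() if lbl == label and o in vids]


def in_v(eids: List[int]) -> List[int]:
    """Toy inV(): head vertex of each edge."""
    return [EDGES[eid][2] for eid in eids]


def out(vids: List[int], label: str) -> List[int]:
    """Toy out(label): hops straight to adjacent vertices, skipping edge ids."""
    return [in_id for lbl, out_id, in_id in EDGES.values() if lbl == label and out_id in vids]


assert out([0], "CREATED") == in_v(out_e([0], "CREATED")) == [3]
print(out([0], "CREATED"))  # [3]
```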
Mastery of Gremlin comes from understanding this flow. Once the mechanics of vertices, edges, direction, and step composition are clear, more complex traversals such as multi-hop paths, conditional logic, and aggregations can be expressed naturally and precisely.