When working with graph databases, writing a traversal that returns the correct result is only half the problem. The other half is understanding how the traversal actually moved through the graph. As traversals grow in complexity, crossing multiple vertices, edges, filters, and transformations, it becomes increasingly difficult to reason about what was visited, in what order, and why a particular result was produced.
Apache TinkerPop’s Gremlin addresses this challenge through the path() step. The path() step allows us to capture the entire traversal history for each result, including vertices, edges, and even intermediate values encountered along the way. Instead of seeing only the final destination, path() shows the journey taken by the traversal.
This capability is especially valuable when:
· Debugging complex traversals
· Explaining query behavior to others
· Building data lineage or audit-style queries
· Producing human-readable or structured traversal outputs
In this post, we’ll explore how path() works in Gremlin, how it records traversal steps, and how modulators like by() can be used to control the shape and readability of path results. We’ll also look at more advanced patterns, including round-robin formatting, anonymous traversals inside by(), and combining computed values into traversal paths.
By the end, you should be able to confidently use path() not just to inspect traversals, but to design explainable, debuggable, and lineage-friendly Gremlin queries.
1. Domain model
Let's use a People → WorksOn → Project domain to demo how path() works.
Vertices:
· person: name, role
· project: name, status
Edges:
· worksOn: since (year), allocation (%)
(Person) -[worksOn]-> (Project)
1.1 Create Graph
graph = TinkerGraph.open()
g = graph.traversal()
// People
g.addV('person').property('name','Alice').property('role','Engineer')
g.addV('person').property('name','Bob').property('role','Architect')
// Projects
g.addV('project').property('name','Apollo').property('status','Active')
g.addV('project').property('name','Hermes').property('status','Completed')
// Alice works on projects
g.V().has('person','name','Alice').as('a').
  V().has('project','name','Apollo').
  addE('worksOn').from('a').
  property('since',2022).property('allocation',80)
g.V().has('person','name','Alice').as('a').
  V().has('project','name','Hermes').
  addE('worksOn').from('a').
  property('since',2021).property('allocation',40)
// Bob works on Apollo
g.V().has('person','name','Bob').as('b').
  V().has('project','name','Apollo').
  addE('worksOn').from('b').
  property('since',2020).property('allocation',60)
gremlin> g.V().valueMap(true)
==>[id:0,label:person,role:[Engineer],name:[Alice]]
==>[id:3,label:person,role:[Architect],name:[Bob]]
==>[id:6,label:project,name:[Apollo],status:[Active]]
==>[id:9,label:project,name:[Hermes],status:[Completed]]
gremlin> g.E().valueMap(true)
==>[id:12,label:worksOn,allocation:80,since:2022]
==>[id:13,label:worksOn,allocation:40,since:2021]
==>[id:14,label:worksOn,allocation:60,since:2020]
1.2 Understanding path() in Gremlin
In Gremlin, traversals are executed step by step as the query moves through vertices, edges, and sometimes intermediate values. While most traversals return only the final element, there are many situations where we need to understand how the traversal reached that result.
This is where the path() step becomes useful.
The path() step captures the complete traversal history for each result. It records every object that the traversal touched in the exact order in which it was visited. These objects can be vertices, edges, or even scalar values produced by steps such as values() or count().
Instead of answering “What did I reach?”, path() answers “What route did I take to get there?”.
Example
g.V().has('person','name','Alice').
  outE('worksOn').
  inV().
  path()
Here, the traversal:
· Starts at the person vertex named Alice
· Traverses the outgoing worksOn edges
· Moves to the connected project vertices
· Records every step using path()
gremlin> g.V().has('person','name','Alice').
......1>   outE('worksOn').
......2>   inV().
......3>   path()
==>[v[0],e[12][0-worksOn->6],v[6]]
==>[v[0],e[13][0-worksOn->9],v[9]]
Each result is a list representing one complete traversal path:
· The first element is the starting vertex (Alice)
· The second element is the edge traversed (worksOn)
· The third element is the destination vertex (Project)
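To make the recording idea concrete, here is a minimal plain-Python sketch of a traversal that carries its history along. It is not TinkerPop code: the VERTICES and EDGES dictionaries and the helper names out_e, in_v, and paths_from are hypothetical stand-ins that reuse the ids from the console output above.

```python
# Toy model of the sample graph, keyed by the ids shown in the console output.
VERTICES = {0: "Alice", 3: "Bob", 6: "Apollo", 9: "Hermes"}
# edge id -> (out-vertex id, label, in-vertex id)
EDGES = {12: (0, "worksOn", 6), 13: (0, "worksOn", 9), 14: (3, "worksOn", 6)}


def out_e(path, label):
    """Extend a path by each matching outgoing edge of its last vertex (models outE)."""
    _, vertex_id = path[-1]
    for edge_id, (out_v, edge_label, _in_v) in sorted(EDGES.items()):
        if out_v == vertex_id and edge_label == label:
            yield path + [("e", edge_id)]


def in_v(path):
    """Extend a path by the in-vertex of its last edge (models inV)."""
    _, edge_id = path[-1]
    yield path + [("v", EDGES[edge_id][2])]


def paths_from(start_vertex_id, label):
    """Rough model of V(start).outE(label).inV().path()."""
    for edge_path in out_e([("v", start_vertex_id)], label):
        yield from in_v(edge_path)


alice_paths = list(paths_from(0, "worksOn"))
for p in alice_paths:
    print(p)
```

Each step appends to a copy of the history instead of replacing it, which is why every result can report the full route and not just the final vertex.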
While path() gives complete visibility into traversal behavior, the raw output is usually hard to read. We can overcome this limitation with the by() modulator.
Understanding the by() Modulator in Gremlin
The path() step gives us visibility into where a traversal has been, but on its own it often produces results that are difficult to read or consume. The by() modulator exists to solve this exact problem.
In Gremlin, a modulator is not a standalone step. Instead, it modifies the behavior of another step. The by() modulator is most commonly used with steps such as path(), group(), order(), and select().
When used with path(), by() controls how each element in the path is projected into the final result.
What does by() actually do?
Conceptually, you can think of by() as a projection function applied to each element in the path.
Instead of returning the full vertex, the full edge, or an internal ID, by() allows you to say: for this element, return this specific representation instead.
path() without by()
g.V().has('person','name','Alice').
  outE('worksOn').
  inV().
  path()
gremlin> g.V().has('person','name','Alice').
......1>   outE('worksOn').
......2>   inV().
......3>   path()
==>[v[0],e[12][0-worksOn->6],v[6]]
==>[v[0],e[13][0-worksOn->9],v[9]]
This output is correct, but:
· It exposes internal graph details
· It is not human-friendly
· It is difficult to serialize cleanly
Adding a simple by() modulator
g.V().has('person','name','Alice').
  outE('worksOn').
  inV().
  path().
    by(label())
The statement above projects each element in the path to its label.
gremlin> g.V().has('person','name','Alice').
......1>   outE('worksOn').
......2>   inV().
......3>   path().
......4>     by(label())
==>[person,worksOn,project]
==>[person,worksOn,project]
Suppose we want the name displayed for the first vertex, the label for the edge, and the name again for the last vertex in the path.
g.V().has('person','name','Alice').
  outE('worksOn').
  inV().
  path().
    by('name').
    by(label()).
    by('name')
gremlin> g.V().has('person','name','Alice').
......1>   outE('worksOn').
......2>   inV().
......3>   path().
......4>     by('name').
......5>     by(label()).
......6>     by('name')
==>[Alice,worksOn,Apollo]
==>[Alice,worksOn,Hermes]
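You can also handle every path element with a single by() by passing an anonymous traversal. Here coalesce() returns the name property when the element has one and falls back to the label otherwise, which is the case for the worksOn edges.
g.V().has('person','name','Alice').
  outE('worksOn').
  inV().
  path().
    by(coalesce(values('name'), label()))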
Round-robin behavior of by()
When there are fewer by() modulators than elements in the path, Gremlin applies them in a round-robin fashion.
For example, suppose the path elements are:
[vertex, edge, vertex]
and the modulators are:
by('name').by(label())
They are applied as:
· vertex -> by('name')
· edge -> by(label())
· vertex -> by('name')
This behavior allows concise queries without repetition, as long as elements share the same projection logic.
g.V().has('person','name','Alice').
  outE('worksOn').
  inV().
  path().
    by('name').
    by(label())
gremlin> g.V().has('person','name','Alice').
......1>   outE('worksOn').
......2>   inV().
......3>   path().
......4>     by('name').
......5>     by(label())
==>[Alice,worksOn,Apollo]
==>[Alice,worksOn,Hermes]
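The round-robin assignment can be modeled in a few lines of plain Python. This is only a sketch of the behavior described above, not how TinkerPop implements it: project_path, by_name, by_label, and the dictionaries standing in for path elements are all hypothetical names chosen for illustration.

```python
from itertools import cycle


def project_path(path, projections):
    """Apply projections to path elements in order, cycling when there are fewer projections."""
    return [proj(element) for element, proj in zip(path, cycle(projections))]


def by_name(element):
    """Stand-in for by('name')."""
    return element["name"]


def by_label(element):
    """Stand-in for by(label())."""
    return element["label"]


# Hypothetical stand-ins for path elements: dicts with a label and, for vertices, a name.
alice = {"label": "person", "name": "Alice"}
works_on = {"label": "worksOn"}
apollo = {"label": "project", "name": "Apollo"}

path = [alice, works_on, apollo]
# Two projections for three elements: the first projection is reused for the third element.
print(project_path(path, [by_name, by_label]))
```

With a single projection in the list, the same function is reused for every element, which matches the earlier by(label()) example.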
by() with no argument
Not all path elements are vertices or edges. Sometimes the path contains a scalar value as well.
g.V().has('person','name','Alice').
  out('worksOn').
  values('status').
  path()
gremlin> g.V().has('person','name','Alice').
......1>   out('worksOn').
......2>   values('status').
......3>   path()
==>[v[0],v[6],Active]
==>[v[0],v[9],Completed]
Here the last values, 'Active' and 'Completed', are scalars.
g.V().has('person','name','Alice').
  out('worksOn').
  values('status').
  path().
    by('name').
    by('name').
    by()
The last by() modulator returns the scalar value unchanged.
gremlin> g.V().has('person','name','Alice').
......1>   out('worksOn').
......2>   values('status').
......3>   path().
......4>     by('name').
......5>     by('name').
......6>     by()
==>[Alice,Apollo,Active]
==>[Alice,Hermes,Completed]
Why is by() empty here?
The last value in the path is a scalar, so there is no property to extract. An empty by() tells Gremlin to return the value as is.
Using computations inside by()
by() is not limited to property access.
Example
path().by(out('worksOn').count())
This means: for each element in the path, run the traversal out('worksOn').count() and return the result. In the query below, hasLabel() only filters and adds nothing to the path, so each path contains just the person vertex.
g.V().
  hasLabel('person').
  path().
    by(out('worksOn').count())
gremlin> g.V().
......1>   hasLabel('person').
......2>   path().
......3>     by(out('worksOn').count())
==>[2]
==>[1]
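As a rough plain-Python model of the computed projection above: the projection is just a function evaluated once per path element. The WORKS_ON adjacency and the helper names out_works_on_count and path_by are assumptions made for this sketch, mirroring the sample data and not any TinkerPop API.

```python
# Hypothetical adjacency for the sample data: person -> projects reached via worksOn.
WORKS_ON = {"Alice": ["Apollo", "Hermes"], "Bob": ["Apollo"]}


def out_works_on_count(person):
    """Stand-in for out('worksOn').count() evaluated from one path element."""
    return len(WORKS_ON.get(person, []))


def path_by(path, projection):
    """Stand-in for path().by(projection) with one projection applied to every element."""
    return [projection(element) for element in path]


# Each path holds only the starting person, so each result is a one-element list.
results = [path_by([person], out_works_on_count) for person in ("Alice", "Bob")]
print(results)
```

The projected value replaces the element in the output only; the traversal itself still sits on the person vertex.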
In summary,
· by() is a projection modulator, not a traversal step
· It formats each element in a path()
· Modulators are applied in round-robin order
· by() can accept a property key, an anonymous traversal that computes a value, or no argument at all to return the element as is