In Apache TinkerPop Gremlin, the as() step is commonly used to label intermediate elements in a traversal so they can be referenced later using the select() step. In most cases, each label is assumed to be unique within a traversal, making it easy to reason about what value select() will return.
However, Gremlin does not enforce label uniqueness. The same label can be reused multiple times within a single traversal, either intentionally or accidentally. When this happens, Gremlin internally maintains multiple bindings for that label, preserving them in the order they are encountered during traversal execution.
By default, when select() is used with a label that has been assigned multiple times, Gremlin returns only the most recently bound value, which can lead to unexpected results if this behavior is not well understood.
To provide explicit control over which bound value is returned, Gremlin offers three special selector keywords:
· first: returns the earliest value bound to the label
· last: returns the most recent value bound to the label (default behavior)
· all: returns all values bound to the label as a list
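The three selectors can be pictured with a small plain-Python model. This is only a sketch of the semantics listed above, not TinkerPop's implementation: the `bindings` list, the placeholder values `v1` to `v3`, and the `select` helper are hypothetical names, and the traversal history is reduced to (label, value) pairs in the order they were bound.

```python
# Hypothetical model: every as() call appends a (label, value) pair,
# in the order the traversal encounters it.
bindings = [
    ("emp", "v1"),
    ("emp", "v2"),
    ("emp", "v3"),
]


def select(pop, label, bindings):
    """Mimic select(pop, label) for pop in {'first', 'last', 'all'}."""
    values = [value for bound_label, value in bindings if bound_label == label]
    if pop == "all":
        return values          # every binding, oldest first
    if not values:
        return None            # missing labels are not modelled here
    if pop == "first":
        return values[0]       # earliest binding
    if pop == "last":
        return values[-1]      # most recent binding (the default)
    raise ValueError(f"unknown selector: {pop}")


print(select("first", "emp", bindings))
print(select("last", "emp", bindings))
print(select("all", "emp", bindings))
```

The model keeps all three bindings around and only decides at selection time which of them to hand back, which is the point of the selectors.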
This post explores how these keywords work in practice, how Gremlin stores repeated label bindings internally, and how they interact with steps such as by(), unfold(), and values(). Through practical examples, you’ll learn how to deliberately reuse labels in traversals, avoid common pitfalls, and confidently extract the exact data you intend when working with complex Gremlin queries.
1. Modelling the Graph
Let's model an organization structure to demonstrate these examples.
Engineer → Manager → Director → SeniorDirector → VicePresident
In this model, each employee will be a vertex with
· label: employee
· name
· role
Each reporting line will be an edge with label "reportsTo"
Step 1: Create VicePresident vertices.
g.addV('employee').
  property('name','Alice').
  property('role','VicePresident')
g.addV('employee').
  property('name','Bob').
  property('role','VicePresident')
Step 2: Create Senior Directors (Reporting to VPs)
We’ll add 3 Senior Directors, split across the two VPs.
g.addV('employee').
  property('name','Carol').
  property('role','SeniorDirector')
g.addV('employee').
  property('name','David').
  property('role','SeniorDirector')
g.addV('employee').
  property('name','Eve').
  property('role','SeniorDirector')
Now connect them.
g.V().has('employee','name','Carol').
  addE('reportsTo').
  to(__.V().has('employee','name','Alice'))
g.V().has('employee','name','David').
  addE('reportsTo').
  to(__.V().has('employee','name','Alice'))
g.V().has('employee','name','Eve').
  addE('reportsTo').
  to(__.V().has('employee','name','Bob'))
Step 3: Create Directors (More Density)
Let’s add 5 Directors.
['Frank','Grace','Heidi','Ivan','Judy'].each { name ->
  g.addV('employee').
    property('name', name).
    property('role','Director').
    iterate()
}
Connect them to Senior Directors.
g.V().has('name','Frank').addE('reportsTo').to(__.V().has('name','Carol'))
g.V().has('name','Grace').addE('reportsTo').to(__.V().has('name','Carol'))
g.V().has('name','Heidi').addE('reportsTo').to(__.V().has('name','David'))
g.V().has('name','Ivan').addE('reportsTo').to(__.V().has('name','David'))
g.V().has('name','Judy').addE('reportsTo').to(__.V().has('name','Eve'))
Step 4: Create Managers (Broader Middle Layer)
Add 6 Managers.
['Ken','Laura','Mallory','Niaj','Olivia','Peggy'].each { name ->
  g.addV('employee').
    property('name', name).
    property('role','Manager').
    iterate()
}
Connect them to Directors.
g.V().has('name','Ken').addE('reportsTo').to(__.V().has('name','Frank'))
g.V().has('name','Laura').addE('reportsTo').to(__.V().has('name','Frank'))
g.V().has('name','Mallory').addE('reportsTo').to(__.V().has('name','Grace'))
g.V().has('name','Niaj').addE('reportsTo').to(__.V().has('name','Heidi'))
g.V().has('name','Olivia').addE('reportsTo').to(__.V().has('name','Ivan'))
g.V().has('name','Peggy').addE('reportsTo').to(__.V().has('name','Judy'))
Step 5: Create Engineers (Largest Layer)
Now add 10 Engineers.
['Quinn','Ruth','Sybil','Trent','Uma',
 'Victor','Wendy','Xavier','Yvonne','Zack'].each { name ->
  g.addV('employee').
    property('name', name).
    property('role','Engineer').
    iterate()
}
Connect them to managers.
g.V().has('name','Quinn').addE('reportsTo').to(__.V().has('name','Ken'))
g.V().has('name','Ruth').addE('reportsTo').to(__.V().has('name','Ken'))
g.V().has('name','Sybil').addE('reportsTo').to(__.V().has('name','Laura'))
g.V().has('name','Trent').addE('reportsTo').to(__.V().has('name','Mallory'))
g.V().has('name','Uma').addE('reportsTo').to(__.V().has('name','Mallory'))
g.V().has('name','Victor').addE('reportsTo').to(__.V().has('name','Niaj'))
g.V().has('name','Wendy').addE('reportsTo').to(__.V().has('name','Niaj'))
g.V().has('name','Xavier').addE('reportsTo').to(__.V().has('name','Olivia'))
g.V().has('name','Yvonne').addE('reportsTo').to(__.V().has('name','Peggy'))
g.V().has('name','Zack').addE('reportsTo').to(__.V().has('name','Peggy'))
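Before querying, it can help to double-check the shape of the data. The sketch below mirrors the reportsTo edges created in Steps 2 to 5 as a plain Python dictionary and walks up from one engineer. It is an illustration outside Gremlin: `reports_to` and `chain` are hypothetical names, and vertices are reduced to their name property.

```python
# Each key reports to its value, mirroring the reportsTo edges added above.
reports_to = {
    "Carol": "Alice", "David": "Alice", "Eve": "Bob",
    "Frank": "Carol", "Grace": "Carol", "Heidi": "David",
    "Ivan": "David", "Judy": "Eve",
    "Ken": "Frank", "Laura": "Frank", "Mallory": "Grace",
    "Niaj": "Heidi", "Olivia": "Ivan", "Peggy": "Judy",
    "Quinn": "Ken", "Ruth": "Ken", "Sybil": "Laura",
    "Trent": "Mallory", "Uma": "Mallory", "Victor": "Niaj",
    "Wendy": "Niaj", "Xavier": "Olivia", "Yvonne": "Peggy",
    "Zack": "Peggy",
}


def chain(name):
    """Follow reportsTo links upward until someone has no manager."""
    result = [name]
    while result[-1] in reports_to:
        result.append(reports_to[result[-1]])
    return result


print(chain("Quinn"))
print(chain("Zack"))
```

Every engineer sits exactly four reportsTo hops below a VicePresident, which is why the examples that follow chain four out('reportsTo') steps.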
Example 1: Reusing the Same Label Across Levels (Baseline)
Let’s start from an Engineer and walk up the org chain, reusing the same label emp at every step.
g.V().has('employee','name','Quinn').
  as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  select('emp').
    by('name')
gremlin> g.V().has('employee','name','Quinn').
......1>   as('emp').
......2>   out('reportsTo').as('emp').
......3>   out('reportsTo').as('emp').
......4>   out('reportsTo').as('emp').
......5>   out('reportsTo').as('emp').
......6>   select('emp').
......7>     by('name')
==>Alice
Here 'emp' is bound five times (Engineer → Manager → Director → SeniorDirector → VicePresident), but select('emp') returns only the last binding. This is Gremlin’s default behavior: return the most recently bound value.
Using last Explicitly (Clarity Over Assumption)
g.V().has('employee','name','Quinn').
  as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  select(last, 'emp').
    by('name')
gremlin> g.V().has('employee','name','Quinn').
......1>   as('emp').
......2>   out('reportsTo').as('emp').
......3>   out('reportsTo').as('emp').
......4>   out('reportsTo').as('emp').
......5>   out('reportsTo').as('emp').
......6>   select(last, 'emp').
......7>     by('name')
==>Alice
Example 2: Using first to Get the Starting Point
g.V().has('employee','name','Quinn').
  as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  select(first,'emp').
    by('name')
gremlin> g.V().has('employee','name','Quinn').
......1>   as('emp').
......2>   out('reportsTo').as('emp').
......3>   out('reportsTo').as('emp').
......4>   out('reportsTo').as('emp').
......5>   out('reportsTo').as('emp').
......6>   select(first,'emp').
......7>     by('name')
==>Quinn
The first keyword always returns the earliest binding of the label.
Example 3: Using all to Capture the Entire Reporting Chain
g.V().has('employee','name','Quinn').
  as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  select(all,'emp')
The query above returns a list of all the values bound to the label 'emp'.
gremlin> g.V().has('employee','name','Quinn').
......1>   as('emp').
......2>   out('reportsTo').as('emp').
......3>   out('reportsTo').as('emp').
......4>   out('reportsTo').as('emp').
......5>   out('reportsTo').as('emp').
......6>   select(all,'emp')
==>[v[62],v[38],v[18],v[6],v[0]]
Let's extract the 'name' property from these vertices.
g.V().has('employee','name','Quinn').
  as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  select(all,'emp').
  unfold().
  values('name').
  fold()
Here:
select(all,'emp'): Because the label emp has been bound multiple times, each traverser now holds a List of vertices. At this step the result looks like this:
[ v[Quinn], v[Ken], v[Frank], v[Carol], v[Alice] ]
unfold(): takes a collection (a List, a Set, or the entries of a Map) and emits one traverser per element.
After unfold():
Traverser 1 → v[Quinn]
Traverser 2 → v[Ken]
Traverser 3 → v[Frank]
Traverser 4 → v[Carol]
Traverser 5 → v[Alice]
values('name'): extracts the property from each vertex. Each traverser now contains a single vertex, so values('name') is valid.
Traverser 1 → "Quinn"
Traverser 2 → "Ken"
Traverser 3 → "Frank"
Traverser 4 → "Carol"
Traverser 5 → "Alice"
fold(): Reassemble Results into a Single List
[ "Quinn", "Ken", "Frank", "Carol", "Alice" ]
gremlin> g.V().has('employee','name','Quinn').
......1>   as('emp').
......2>   out('reportsTo').as('emp').
......3>   out('reportsTo').as('emp').
......4>   out('reportsTo').as('emp').
......5>   out('reportsTo').as('emp').
......6>   select(all,'emp').
......7>   unfold().
......8>   values('name').
......9>   fold()
==>[Quinn,Ken,Frank,Carol,Alice]
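The unfold() → values('name') → fold() sequence can be traced with ordinary Python lists. This is a rough model rather than Gremlin itself: vertices are stood in for by dictionaries, a "traverser stream" is just a Python list, and the variable names are hypothetical.

```python
# Stand-ins for the five vertices bound to 'emp' on Quinn's reporting chain.
emp_bindings = [
    {"name": "Quinn", "role": "Engineer"},
    {"name": "Ken", "role": "Manager"},
    {"name": "Frank", "role": "Director"},
    {"name": "Carol", "role": "SeniorDirector"},
    {"name": "Alice", "role": "VicePresident"},
]

# select(all,'emp'): ONE traverser whose value is the whole list.
after_select = [emp_bindings]

# unfold(): one traverser per element of each incoming collection.
after_unfold = [vertex for collection in after_select for vertex in collection]

# values('name'): each traverser now holds a single vertex, so a property read works.
after_values = [vertex["name"] for vertex in after_unfold]

# fold(): gather the whole stream back into ONE traverser holding a list.
after_fold = [after_values]

print(len(after_select), len(after_unfold), len(after_fold))
print(after_fold[0])
```

The stream goes from one traverser to five and back to one, which matches the single list printed by the console.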
Example 4: Extracting Both Role and Name (Structured Output)
g.V().has('employee','name','Quinn').
  as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  out('reportsTo').as('emp').
  select(all,'emp').
  unfold().
  project('name','role').
    by('name').
    by('role').
  fold()
gremlin> g.V().has('employee','name','Quinn').
......1>   as('emp').
......2>   out('reportsTo').as('emp').
......3>   out('reportsTo').as('emp').
......4>   out('reportsTo').as('emp').
......5>   out('reportsTo').as('emp').
......6>   select(all,'emp').
......7>   unfold().
......8>   project('name','role').
......9>     by('name').
.....10>     by('role').
.....11>   fold()
==>[[name:Quinn,role:Engineer],[name:Ken,role:Manager],[name:Frank,role:Director],[name:Carol,role:SeniorDirector],[name:Alice,role:VicePresident]]
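One way to read project('name','role').by('name').by('role') is that the keys passed to project() name the output fields, while each by() supplies the value for the key in the same position. The Python sketch below models that pairing under simplifying assumptions: `project` is a hypothetical helper, vertices are plain dictionaries, and each by() is reduced to a property name.

```python
def project(vertex, keys, by_props):
    """Pair the i-th output key with the i-th by() property, as a dict."""
    return {key: vertex[prop] for key, prop in zip(keys, by_props)}


# Two of the unfolded vertices from the reporting chain, as stand-ins.
unfolded = [
    {"name": "Quinn", "role": "Engineer"},
    {"name": "Ken", "role": "Manager"},
]

# Same shape as the query above: output keys equal the property names.
rows = [project(v, ["name", "role"], ["name", "role"]) for v in unfolded]
print(rows)

# The output keys are free-form; only the by() order ties them to properties.
renamed = [project(v, ["who", "title"], ["name", "role"]) for v in unfolded]
print(renamed)
```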
In this post, we explored how Gremlin behaves when the same label is assigned multiple times using the as() step within a single traversal. While labels are often treated as unique identifiers, Gremlin allows them to be reused freely, resulting in multiple bindings for the same label as the traversal progresses.
By default, when a label is selected using select('label'), Gremlin returns only the most recent binding, which is equivalent to using the last keyword. This implicit behavior can be surprising unless it is clearly understood and made explicit in the traversal.
To address this, Gremlin provides three selector keywords that give precise control over label resolution:
· first: returns the earliest value bound to the label
· last: returns the most recent value (the default behavior)
· all: returns every value bound to the label, in traversal order
When a label is reused, Gremlin keeps every value bound to it, and select(all, 'label') returns those values as a list. This has important implications for how the result can be processed. In particular, a modulator such as by('name') cannot be applied directly to select(all, 'label'), because it is evaluated against the list as a whole rather than against each element in it. To work with these results, first flatten the list using unfold(); steps such as values() or project() can then be applied to each element.
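The reason a property lookup needs unfold() first can be shown with a short Python analogy. This is not how TinkerPop reports the problem (the exact behavior depends on the version); `by_key` is a hypothetical stand-in for by('name') that only knows how to read a property from a single element.

```python
chain = [{"name": "Quinn"}, {"name": "Ken"}]


def by_key(obj, key):
    """Hypothetical stand-in for by('name'): read one property from one element."""
    if not isinstance(obj, dict):
        raise TypeError(f"cannot read '{key}' from a {type(obj).__name__}")
    return obj[key]


# Applied to the whole list (like by('name') directly after select(all, ...)).
try:
    by_key(chain, "name")
    direct = "ok"
except TypeError as err:
    direct = f"failed: {err}"

# Applied per element (like unfold() followed by values('name')).
per_element = [by_key(vertex, "name") for vertex in chain]

print(direct)
print(per_element)
```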