When working with Gremlin, one of the most subtle yet powerful steps is local().
At first glance, many aggregations like count(), mean(), or fold() seem straightforward. But without understanding how traversals operate on streams of traversers, it’s easy to get completely unexpected results.
The core confusion usually comes down to the following questions:
· Are we calculating something across the entire traversal stream?
· Or are we calculating something per vertex and then aggregating?
Think of it like calculating the average marks of a class of students:
· Without local(): you collect all marks from all students into one big pile and then calculate.
· With local(): you calculate each student’s total individually, then compute the class average correctly.
In Gremlin, many steps operate globally by default. When we wrap a traversal inside local(), we tell Gremlin to execute that sub-traversal independently for each incoming traverser. This small change can completely alter the semantics of a query, especially when working with statistical operations like mean(), sum(), or count().
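The difference is easiest to see on a throwaway graph before building the full example. The sketch below assumes the Gremlin Console with TinkerGraph available. The variable name demo, the students A and B, the subjects X and Y, and the marks are made-up values used only for illustration.

```groovy
// Hypothetical throwaway graph: 2 students, 2 subjects, 4 'scored' edges
demo = TinkerGraph.open().traversal()
demo.addV('student').property('name', 'A').as('a').
     addV('student').property('name', 'B').as('b').
     addV('subject').property('name', 'X').as('x').
     addV('subject').property('name', 'Y').as('y').
     addE('scored').from('a').to('x').property('marks', 10).
     addE('scored').from('a').to('y').property('marks', 20).
     addE('scored').from('b').to('x').property('marks', 30).
     addE('scored').from('b').to('y').property('marks', 40).
     iterate()

// Global scope: count() sees the edges of every student in one stream
demo.V().hasLabel('student').outE('scored').count()          // ==>4

// Local scope: count() runs once per student
demo.V().hasLabel('student').local(outE('scored').count())   // ==>2 and ==>2
```

The only change between the two queries is the local() wrapper, yet the first answers "how many marks exist?" and the second answers "how many marks does each student have?".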
Let’s create a simple student graph to demonstrate the local() step.
Step 1: Create the graph and a traversal source.
graph = TinkerGraph.open()
g = graph.traversal()
Step 2: Create student vertices.
g.addV('student').property('name','Alice')
g.addV('student').property('name','Bob')
g.addV('student').property('name','Carol')
Step 3: Create subject vertices.
g.addV('subject').property('name','Math')
g.addV('subject').property('name','Physics')
g.addV('subject').property('name','Chemistry')
Step 4: Connect Students to Subjects with Marks
// Alice: 80, 90, 70 → Total = 240
g.V().has('student','name','Alice').as('s').
  V().has('subject','name','Math').addE('scored').from('s').property('marks',80)
g.V().has('student','name','Alice').as('s').
  V().has('subject','name','Physics').addE('scored').from('s').property('marks',90)
g.V().has('student','name','Alice').as('s').
  V().has('subject','name','Chemistry').addE('scored').from('s').property('marks',70)

// Bob: 60, 75, 85 → Total = 220
g.V().has('student','name','Bob').as('s').
  V().has('subject','name','Math').addE('scored').from('s').property('marks',60)
g.V().has('student','name','Bob').as('s').
  V().has('subject','name','Physics').addE('scored').from('s').property('marks',75)
g.V().has('student','name','Bob').as('s').
  V().has('subject','name','Chemistry').addE('scored').from('s').property('marks',85)

// Carol: 95, 85, 90 → Total = 270
g.V().has('student','name','Carol').as('s').
  V().has('subject','name','Math').addE('scored').from('s').property('marks',95)
g.V().has('student','name','Carol').as('s').
  V().has('subject','name','Physics').addE('scored').from('s').property('marks',85)
g.V().has('student','name','Carol').as('s').
  V().has('subject','name','Chemistry').addE('scored').from('s').property('marks',90)
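Before running any aggregation, it can help to confirm that the graph has the expected shape. The checks below are a minimal sketch, assuming the statements from Steps 1 to 4 were run in the same console session so that g refers to the student graph.

```groovy
// Assumes the student graph `g` built in Steps 1-4 (same console session).

// Expect 3 students and 3 subjects
g.V().hasLabel('student').count()
g.V().hasLabel('subject').count()

// Expect 9 'scored' edges: one per student/subject pair
g.E().hasLabel('scored').count()

// Total marks per student; expect Alice 240, Bob 220, Carol 270
// (the order of the keys in the resulting map may vary)
g.V().hasLabel('student').
  group().
    by('name').
    by(outE('scored').values('marks').sum())
```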
Example: Calculate Average Total Marks Per Student
Alice = 240
Bob = 220
Carol = 270
Average = (240 + 220 + 270) / 3 = 730 / 3 ≈ 243.33
Without local() (WRONG)
g.V().hasLabel('student').
  outE('scored').
  values('marks').
  sum().
  mean()

gremlin> g.V().hasLabel('student').
......1>   outE('scored').
......2>   values('marks').
......3>   sum().
......4>   mean()
==>730.0
Here:
· All marks are collected together
· sum() produces one number
· mean() then averages that single number, so it divides by 1 and returns 730.0
Another common mistake
g.V().hasLabel('student').
  outE('scored').
  values('marks').
  mean()

gremlin> g.V().hasLabel('student').
......1>   outE('scored').
......2>   values('marks').
......3>   mean()
==>81.11111111111111
This gives the average of all nine individual marks (730 / 9 ≈ 81.11), NOT the average total per student.
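Both wrong answers can be traced back to what the global steps actually see. The two queries below are a small sketch, assuming the student graph g from Steps 1 to 4. They expose the intermediate values behind 730.0 and 81.11.

```groovy
// Assumes the student graph `g` from Steps 1-4.

// Global sum: all nine marks collapse into a single value,
// so a following mean() has only one number to average
g.V().hasLabel('student').outE('scored').values('marks').sum()     // ==>730

// Global count: the stream holds nine marks, not three students,
// so a plain mean() divides 730 by 9
g.V().hasLabel('student').outE('scored').values('marks').count()   // ==>9
```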
Correct Way Using local()
g.V().hasLabel('student').
  local(
    outE('scored').
    values('marks').
    sum()
  ).
  mean()

gremlin> g.V().hasLabel('student').
......1>   local(
......2>     outE('scored').
......3>     values('marks').
......4>     sum()
......5>   ).
......6>   mean()
==>243.33333333333334
What Happens Now?
For each student, the local() step calculates the sum of that student's marks.
· Alice → sum = 240
· Bob → sum = 220
· Carol → sum = 270
The stream now contains three values: 240, 220, 270

gremlin> g.V().hasLabel('student').
......1>   local(
......2>     outE('scored').
......3>     values('marks').
......4>     sum()
......5>   )
==>240
==>220
==>270
Then: mean() → 243.33
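The same scoping rule applies to the other reducing steps mentioned at the start, such as fold(). The sketch below assumes the student graph g from Steps 1 to 4. Exact output is omitted because the order of marks inside each list depends on how the graph iterates its edges.

```groovy
// Assumes the student graph `g` from Steps 1-4.

// Global: a single list holding all nine marks
g.V().hasLabel('student').outE('scored').values('marks').fold()

// Local: three lists, one per student, each holding that student's three marks
g.V().hasLabel('student').local(outE('scored').values('marks').fold())
```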
Here is how you can see the total and average marks of each student.
g.V().hasLabel('student').
  project('name','totalMarks', 'averageMarks').
    by('name').
    by(local(
      outE('scored').
      values('marks').
      sum()
    )).
    by(local(
      outE('scored').
      values('marks').
      mean()
    ))

gremlin> g.V().hasLabel('student').
......1>   project('name','totalMarks', 'averageMarks').
......2>     by('name').
......3>     by(local(
......4>       outE('scored').
......5>       values('marks').
......6>       sum()
......7>     )).
......8>     by(local(
......9>       outE('scored').
.....10>       values('marks').
.....11>       mean()
.....12>     ))
==>[name:Alice,totalMarks:240,averageMarks:80.0]
==>[name:Bob,totalMarks:220,averageMarks:73.33333333333333]
==>[name:Carol,totalMarks:270,averageMarks:90.0]
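One detail worth knowing: a by() modulator that takes a traversal already evaluates it once per incoming element, so inside project() the local() wrapper is optional. The variant below is a sketch, assuming the same student graph g. It should return the same three maps as the query above.

```groovy
// Assumes the student graph `g` from Steps 1-4.
// Each by(<traversal>) is applied to one student at a time,
// so sum() and mean() are already computed per student here.
g.V().hasLabel('student').
  project('name', 'totalMarks', 'averageMarks').
    by('name').
    by(outE('scored').values('marks').sum()).
    by(outE('scored').values('marks').mean())
```

Keeping the explicit local() does no harm and can make the intended scope more obvious to readers.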
The local() step is all about scope. By default, many Gremlin steps operate on the entire stream of traversers. That means aggregations like count(), sum(), or mean() may unintentionally compute results globally.
local() changes that behavior. It tells Gremlin to execute the sub-traversal independently for each incoming traverser.