Friday, 5 June 2026

Gremlin local() Explained: Global vs Per-Element Calculations

When working with Gremlin, one of the most subtle yet powerful steps is local().

At first glance, many aggregations like count(), mean(), or fold() seem straightforward. But without understanding how traversals operate on streams of traversers, it’s easy to get completely unexpected results.

The core confusion usually comes down to two questions:

·      Are we calculating something across the entire traversal stream?

·      Or are we calculating something per vertex and then aggregating?

Think of it like calculating the average of students' total marks:

·      Without local(): you collect all marks from all students into one big pile and then calculate.

·      With local(): you calculate each student’s total individually, then compute the class average correctly.

In Gremlin, many steps operate globally by default: reducing steps such as count(), sum(), mean(), and fold() consume the whole stream of traversers and emit a single result. When we wrap a sub-traversal inside local(), we tell Gremlin to execute it independently for each incoming element. This small change can completely alter the semantics of a query, especially with statistical operations like mean(), sum(), or count().
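
To make the scope difference concrete, here is a minimal plain-Python model. It is not the Gremlin API: the `scored` dict is a hypothetical stand-in for each student's outgoing 'scored' edges (three students, three marks each), and the two functions only mimic how a global count() and a per-element local(... count()) see the data.

```python
# Plain-Python model of Gremlin scoping (NOT the Gremlin API).
# Hypothetical data: each student mapped to the marks on its outgoing 'scored' edges.
scored = {
    "Alice": [80, 90, 70],
    "Bob": [60, 75, 85],
    "Carol": [95, 85, 90],
}


def global_count(per_vertex):
    """Mimics g.V().hasLabel('student').outE('scored').count(): one flattened stream, one count."""
    stream = [m for marks in per_vertex.values() for m in marks]
    return len(stream)


def local_count(per_vertex):
    """Mimics g.V().hasLabel('student').local(outE('scored').count()): one count per vertex."""
    return [len(marks) for marks in per_vertex.values()]


print(global_count(scored))  # 9
print(local_count(scored))   # [3, 3, 3]
```

The global version returns a single number for the whole stream, while the local version returns one number per student; that change in shape is what local() introduces.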

Let's create a simple student graph to demonstrate the local() step.

Step 1: Create a graph and a traversal source.

graph = TinkerGraph.open()
g = graph.traversal()

Step 2: Create Student vertices.

g.addV('student').property('name','Alice')
g.addV('student').property('name','Bob')
g.addV('student').property('name','Carol')

Step 3: Create Subject vertices.

g.addV('subject').property('name','Math')
g.addV('subject').property('name','Physics')
g.addV('subject').property('name','Chemistry')

Step 4: Connect Students to Subjects with 'scored' Edges Carrying Marks

// Alice: 80, 90, 70  → Total = 240
g.V().has('student','name','Alice').as('s').
  V().has('subject','name','Math').addE('scored').from('s').property('marks',80)

g.V().has('student','name','Alice').as('s').
  V().has('subject','name','Physics').addE('scored').from('s').property('marks',90)

g.V().has('student','name','Alice').as('s').
  V().has('subject','name','Chemistry').addE('scored').from('s').property('marks',70)


// Bob: 60, 75, 85 → Total = 220
g.V().has('student','name','Bob').as('s').
  V().has('subject','name','Math').addE('scored').from('s').property('marks',60)

g.V().has('student','name','Bob').as('s').
  V().has('subject','name','Physics').addE('scored').from('s').property('marks',75)

g.V().has('student','name','Bob').as('s').
  V().has('subject','name','Chemistry').addE('scored').from('s').property('marks',85)


// Carol: 95, 85, 90 → Total = 270
g.V().has('student','name','Carol').as('s').
  V().has('subject','name','Math').addE('scored').from('s').property('marks',95)

g.V().has('student','name','Carol').as('s').
  V().has('subject','name','Physics').addE('scored').from('s').property('marks',85)

g.V().has('student','name','Carol').as('s').
  V().has('subject','name','Chemistry').addE('scored').from('s').property('marks',90)
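
As a quick sanity check on the "Total = ..." comments, the sketch below mirrors the nine addE() calls as hypothetical (student, subject, marks) tuples in plain Python and sums them per student. It models the data only; it does not talk to TinkerGraph.

```python
# Hypothetical plain-Python mirror of the nine addE('scored') calls above,
# as (student, subject, marks) tuples; it models the data, not TinkerGraph.
edges = [
    ("Alice", "Math", 80), ("Alice", "Physics", 90), ("Alice", "Chemistry", 70),
    ("Bob", "Math", 60), ("Bob", "Physics", 75), ("Bob", "Chemistry", 85),
    ("Carol", "Math", 95), ("Carol", "Physics", 85), ("Carol", "Chemistry", 90),
]

# Accumulate a running total of marks per student.
totals = {}
for student, _subject, marks in edges:
    totals[student] = totals.get(student, 0) + marks

print(totals)  # {'Alice': 240, 'Bob': 220, 'Carol': 270}
```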

Example: Calculate the Class Average of Per-Student Total Marks

Alice = 240
Bob   = 220
Carol = 270

Average = (240 + 220 + 270) / 3
        = 730 / 3
        ≈ 243.33

Without local() (WRONG)

g.V().hasLabel('student').
  outE('scored').
  values('marks').
  sum().
  mean()

gremlin> g.V().hasLabel('student').
......1>   outE('scored').
......2>   values('marks').
......3>   sum().
......4>   mean()
==>730.0

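Here:

·      All nine marks, from every student, flow through a single stream

·      sum() reduces that stream to one number, 730

·      mean() then averages a one-element stream, so it divides by 1 and returns 730.0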
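
The same two-step reduction can be modelled with a plain Python list standing in for the traverser stream. `marks_stream` is a hypothetical copy of the nine edge values, and `statistics.fmean` plays the role of mean(); this illustrates the semantics and is not Gremlin code.

```python
from statistics import fmean

# Plain-Python model (NOT Gremlin) of:
#   g.V().hasLabel('student').outE('scored').values('marks').sum().mean()
# Hypothetical copy of all nine marks, flattened into one stream.
marks_stream = [80, 90, 70, 60, 75, 85, 95, 85, 90]

# sum() reduces the whole stream to a single value.
summed = [sum(marks_stream)]

# mean() then averages a one-element stream, i.e. divides by 1.
result = fmean(summed)

print(summed, result)  # [730] 730.0
```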

Another common mistake

g.V().hasLabel('student').
  outE('scored').
  values('marks').
  mean()

gremlin> g.V().hasLabel('student').
......1>   outE('scored').
......2>   values('marks').
......3>   mean()
==>81.11111111111111

This gives the average of all nine subject marks (730 / 9); it is NOT the average total per student.
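
A plain-Python model of this query makes the divisor visible: mean() sees nine individual marks, not three student totals. Here `marks_stream` is a hypothetical copy of the edge values and `statistics.fmean` stands in for Gremlin's mean().

```python
from statistics import fmean

# Plain-Python model (NOT Gremlin) of:
#   g.V().hasLabel('student').outE('scored').values('marks').mean()
# Hypothetical copy of all nine marks, flattened into one stream.
marks_stream = [80, 90, 70, 60, 75, 85, 95, 85, 90]

# mean() divides the sum of every mark by the number of marks: 730 / 9.
mean_of_marks = fmean(marks_stream)

print(len(marks_stream), mean_of_marks)  # 9 81.11111111111111
```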

Correct Way Using local()

g.V().hasLabel('student').
  local(
      outE('scored').
      values('marks').
      sum()
  ).
  mean()

gremlin> g.V().hasLabel('student').
......1>   local(
......2>       outE('scored').
......3>       values('marks').
......4>       sum()
......5>   ).
......6>   mean()
==>243.33333333333334

What Happens Now?

For each student, local() runs the sub-traversal on its own and calculates that student's sum:

·      Alice sum = 240

·      Bob   sum = 220

·      Carol sum = 270

The stream passed on to mean() therefore becomes [240, 220, 270]. You can see it by running the same query without the final mean():

gremlin> g.V().hasLabel('student').
......1>   local(
......2>       outE('scored').
......3>       values('marks').
......4>       sum()
......5>   )
==>240
==>220
==>270

Then mean() averages these three values: (240 + 220 + 270) / 3 ≈ 243.33.
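
Here is the same per-student-then-global pipeline as a plain-Python sketch, assuming a hypothetical `marks_by_student` dict that groups each student's edge values. The list comprehension plays the role of local(... sum()), running once per student, and `statistics.fmean` plays the role of the outer mean(); none of this is Gremlin API.

```python
from statistics import fmean

# Plain-Python model (NOT Gremlin) of:
#   g.V().hasLabel('student').local(outE('scored').values('marks').sum()).mean()
# Hypothetical copy of the graph data, grouped per student vertex.
marks_by_student = {
    "Alice": [80, 90, 70],
    "Bob": [60, 75, 85],
    "Carol": [95, 85, 90],
}

# local(...sum()): the sub-traversal runs once per student, emitting one total each.
totals = [sum(marks) for marks in marks_by_student.values()]

# The outer mean() then reduces the stream of per-student totals globally.
average_total = fmean(totals)

print(totals)         # [240, 220, 270]
print(average_total)  # 243.33333333333334
```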

Here is how you can see the total and average marks of each student:

g.V().hasLabel('student').
  project('name','totalMarks', 'averageMarks').
    by('name').
    by(local(
        outE('scored').
        values('marks').
        sum()
    )).
    by(local(
        outE('scored').
        values('marks').
        mean()
    ))

gremlin> g.V().hasLabel('student').
......1>   project('name','totalMarks', 'averageMarks').
......2>     by('name').
......3>     by(local(
......4>         outE('scored').
......5>         values('marks').
......6>         sum()
......7>     )).
......8>     by(local(
......9>         outE('scored').
.....10>         values('marks').
.....11>         mean()
.....12>     ))
==>[name:Alice,totalMarks:240,averageMarks:80.0]
==>[name:Bob,totalMarks:220,averageMarks:73.33333333333333]
==>[name:Carol,totalMarks:270,averageMarks:90.0]
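
For comparison, the projection can be modelled in plain Python as one dict per student. `marks_by_student` is a hypothetical copy of the graph data, and the keys simply reuse the names passed to project(); the rows should match the console output above.

```python
from statistics import fmean

# Plain-Python model (NOT Gremlin) of the project('name','totalMarks','averageMarks') query.
# Hypothetical copy of the graph data, grouped per student vertex.
marks_by_student = {
    "Alice": [80, 90, 70],
    "Bob": [60, 75, 85],
    "Carol": [95, 85, 90],
}

# One row per student; both aggregates are computed over that student's marks only.
rows = [
    {"name": name, "totalMarks": sum(marks), "averageMarks": fmean(marks)}
    for name, marks in marks_by_student.items()
]

for row in rows:
    print(row)
```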

The local() step is all about scope. By default, many Gremlin steps operate on the entire stream of traversers. That means aggregations like count(), sum(), or mean() may unintentionally compute results globally.

local() changes that behavior: it tells Gremlin to execute the sub-traversal independently for each incoming element.
