Wednesday, 1 April 2026

Aggregation with mean() in Gremlin

Filtering vertices is only one part of graph querying. In many real-world scenarios, the next logical step is to derive insights from the filtered data, such as averages, totals, minimums, or maximums. Gremlin supports this style of analytical querying through a set of aggregation steps, one of which is mean().

This post focuses on the mean() step: what it does, why it exists, and how it fits naturally into Gremlin traversals.

What Does mean() Do?

The mean() step computes the arithmetic average of the numeric values flowing through the traversal. Conceptually, it answers the question "What is the average of these numbers?"

It is typically applied after steps like values(), which project numeric properties from graph elements.
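
For reference, the arithmetic average that mean() computes can be written as a short Python function. This is a plain-Python illustration, not Gremlin, and the function name arithmetic_mean is hypothetical; it only spells out the formula the step applies to the numbers reaching it: the sum of the values divided by how many values there are.

```python
def arithmetic_mean(values):
    """Sum of the values divided by how many values there are (hypothetical helper)."""
    if not values:
        # Dividing by a count of zero is undefined, so refuse an empty list here.
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


print(arithmetic_mean([10, 20, 60]))  # 30.0
```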

Why Does mean() Matter?

Graphs often store rich attribute data, such as the ages of people, the salaries of employees, the costs of resources, and the durations of processes. While traversals commonly return individual values, analytics-oriented queries frequently need those values summarized into a single figure.

For example:

·      What is the average age of selected people?

·      What is the average experience level of a team?

·      What is the average cost across related entities?

The mean() step allows these questions to be answered directly within Gremlin, without exporting data to external analytics tools.

A Step-by-Step Example Using mean()

To understand mean() clearly, let us walk through a complete example using a fresh dataset.

Step 1: Create a New Graph

graph = TinkerGraph.open()
g = graph.traversal()

This initializes an empty, in-memory graph suitable for experimentation.

gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]

Step 2: Create Person Vertices

Add a set of vertices representing people, each with a name and an age.

g.addV('person').property('name', 'Hari').property('age', 32)
g.addV('person').property('name', 'Krishna').property('age', 28)
g.addV('person').property('name', 'Ram').property('age', 35)
g.addV('person').property('name', 'Sita').property('age', 30)
g.addV('person').property('name', 'Sailu').property('age', 26)
g.addV('person').property('name', 'Chamu').property('age', 24)

gremlin> g.addV('person').property('name', 'Hari').property('age', 32)
==>v[0]
gremlin> g.addV('person').property('name', 'Krishna').property('age', 28)
==>v[3]
gremlin> g.addV('person').property('name', 'Ram').property('age', 35)
==>v[6]
gremlin> g.addV('person').property('name', 'Sita').property('age', 30)
==>v[9]
gremlin> g.addV('person').property('name', 'Sailu').property('age', 26)
==>v[12]
gremlin> g.addV('person').property('name', 'Chamu').property('age', 24)
==>v[15]

At this point, the graph contains six person vertices, each with an associated age.
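
To follow the later steps outside the Gremlin Console, the six vertices can be mirrored as plain Python dictionaries. This is a hypothetical stand-in, not TinkerGraph: each dict plays the role of one vertex with a label, a name, and an age. Averaging all six ages gives a baseline to compare against the filtered subset used below.

```python
# Hypothetical plain-Python stand-in for the six person vertices (not TinkerGraph).
people = [
    {"label": "person", "name": "Hari", "age": 32},
    {"label": "person", "name": "Krishna", "age": 28},
    {"label": "person", "name": "Ram", "age": 35},
    {"label": "person", "name": "Sita", "age": 30},
    {"label": "person", "name": "Sailu", "age": 26},
    {"label": "person", "name": "Chamu", "age": 24},
]

# Average over every person, as a baseline for the filtered subset.
all_ages = [p["age"] for p in people]
overall_mean = sum(all_ages) / len(all_ages)  # 175 / 6
print(len(people), round(overall_mean, 2))    # 6 29.17
```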

Step 3: Filter a Subset Using within

Suppose the requirement is to compute the average age of a specific group of people, say Hari, Ram, and Sita.

g.V().
  has('person', 'name', within('Hari', 'Ram', 'Sita')).
  values('age')

gremlin> g.V().
......1>   has('person', 'name', within('Hari', 'Ram', 'Sita')).
......2>   values('age')
==>32
==>35
==>30

This traversal produces a stream of three numeric values, 32, 35, and 30, one for each matching vertex.

These values are not yet aggregated; they are simply flowing through the traversal.
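
The filtering and projection in Step 3 can be modeled in plain Python as a generator. This is an illustrative sketch, not how Gremlin is implemented: the helper select_ages is hypothetical, within(...) is approximated by set membership, and yield stands in for values flowing through the traversal one at a time without being aggregated.

```python
from typing import Iterable, Iterator

# Same hypothetical stand-in data as the vertices created in Step 2.
people = [
    {"label": "person", "name": "Hari", "age": 32},
    {"label": "person", "name": "Krishna", "age": 28},
    {"label": "person", "name": "Ram", "age": 35},
    {"label": "person", "name": "Sita", "age": 30},
    {"label": "person", "name": "Sailu", "age": 26},
    {"label": "person", "name": "Chamu", "age": 24},
]


def select_ages(vertices: Iterable[dict], label: str, names: Iterable[str]) -> Iterator[int]:
    """Model of has(label, 'name', within(*names)).values('age')."""
    wanted = set(names)  # within(...): keep a vertex if its name is in this set
    for v in vertices:
        if v["label"] == label and v["name"] in wanted:
            yield v["age"]  # values('age'): emit one number per matching vertex


stream = select_ages(people, "person", ["Hari", "Ram", "Sita"])
print(list(stream))  # [32, 35, 30]
```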

Step 4: Apply mean()

To compute the average of these ages, append the mean() step.

g.V().
  has('person', 'name', within('Hari', 'Ram', 'Sita')).
  values('age').
  mean()

gremlin> g.V().
......1>   has('person', 'name', within('Hari', 'Ram', 'Sita')).
......2>   values('age').
......3>   mean()
==>32.333333333333336
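
The long run of digits in 32.333333333333336 is ordinary floating-point rounding. The three ages sum to 97, and 97 / 3 has no exact binary representation. Python floats are IEEE 754 double-precision numbers, like Java's double on the JVM where the Gremlin Console runs, so the same division in Python prints the same trailing 6. This check assumes only that the result is a double, which is consistent with the console output above.

```python
ages = [32, 35, 30]          # the three ages selected in Step 3
total = sum(ages)            # 97, an exact integer
result = total / len(ages)   # 97 / 3, rounded to the nearest double
print(total, result)         # 97 32.333333333333336
```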

How Does This Traversal Work Internally?

·      g.V(): Starts with all vertices in the graph.

·      has('person', 'name', within('Hari', 'Ram', 'Sita')): Filters the traversal to only the selected people.

·      values('age'): Projects the age property from each matching vertex, producing a stream of numbers.

·      mean(): Consumes the numeric stream and computes the arithmetic average.
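
The four steps above can be mirrored as a chain of plain Python generator functions, one per step. This is a hypothetical model for intuition only: the names V, has_within, values, and mean are illustrative helpers, not the gremlinpython API, and the in-memory list stands in for the graph. As with Gremlin's values(), a vertex that lacks the requested property contributes nothing to the stream.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical in-memory stand-in for the graph built in Steps 1 and 2.
graph = [
    {"label": "person", "name": "Hari", "age": 32},
    {"label": "person", "name": "Krishna", "age": 28},
    {"label": "person", "name": "Ram", "age": 35},
    {"label": "person", "name": "Sita", "age": 30},
    {"label": "person", "name": "Sailu", "age": 26},
    {"label": "person", "name": "Chamu", "age": 24},
]


def V(g: list) -> Iterator[dict]:
    """Like g.V(): start with every vertex."""
    yield from g


def has_within(vs: Iterable[dict], label: str, key: str, allowed: Iterable) -> Iterator[dict]:
    """Like has(label, key, within(...)): keep only matching vertices."""
    allowed_set = set(allowed)
    for v in vs:
        if v.get("label") == label and v.get(key) in allowed_set:
            yield v


def values(vs: Iterable[dict], key: str) -> Iterator:
    """Like values(key): emit the property value; vertices without it emit nothing."""
    for v in vs:
        if key in v:
            yield v[key]


def mean(nums: Iterable[float]) -> Optional[float]:
    """Like mean(): consume the whole stream and return one average (None if empty)."""
    collected = list(nums)
    return sum(collected) / len(collected) if collected else None


result = mean(values(has_within(V(graph), "person", "name", ["Hari", "Ram", "Sita"]), "age"))
print(result)  # 32.333333333333336
```

Every helper except mean() is lazy, so no ages are produced until mean() starts pulling values, which loosely mirrors how the traversal only yields its single result at the end.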

The result is a single value rather than a stream of values, because mean() is a reducing (aggregating) step: it consumes every number in the incoming stream before emitting one average.
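
Because a stream does not announce its length in advance, a reducing step can be thought of as folding every incoming value into a small accumulator. The sketch below is one plausible way to express that idea in Python with functools.reduce, carrying a (running_sum, running_count) pair; it is not TinkerPop's actual implementation, and the helper names are hypothetical. Returning None for an empty stream is a choice made for this sketch only; check the TinkerPop reference documentation for how your version of mean() treats an empty input.

```python
from functools import reduce
from typing import Iterable, Optional, Tuple


def add_to_accumulator(acc: Tuple[float, int], value: float) -> Tuple[float, int]:
    """Fold one incoming number into the (running_sum, running_count) pair."""
    running_sum, running_count = acc
    return running_sum + value, running_count + 1


def mean_by_reduction(stream: Iterable[float]) -> Optional[float]:
    """Single pass over the stream; only the accumulator pair is kept in memory."""
    total, count = reduce(add_to_accumulator, stream, (0, 0))
    # Sketch-only choice: an empty stream produces None instead of a number.
    return total / count if count else None


ages = (age for age in [32, 35, 30])  # a one-shot generator, like a stream of traversers
print(mean_by_reduction(ages))        # 32.333333333333336
print(mean_by_reduction(iter([])))    # None
```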

In summary, the mean() step transforms a stream of numeric property values into a single, meaningful metric.
