Programming for beginners: Gremlin as a Statistical Engine: count(), sum(), mean(), min(), max()

When most people think about Gremlin in Apache TinkerPop, they think about traversals: walking vertices and edges to discover relationships. But Gremlin is more than a navigation language. It can also perform numerical analysis directly inside a traversal, which makes it a capable statistical engine.

Modern graph applications are not only about connections; they are also about measurement:

· How many connections exist?

· What is the average number of relationships per vertex?

· What is the maximum capacity, weight, or score in the graph?

· What is the minimum value across a dataset?

· What is the total of a numeric property?

Instead of exporting graph data into an external analytics system, Gremlin allows you to compute these insights in place, within the graph engine itself.

Why Statistical Steps Matter in Graph Systems

In relational databases, aggregation functions like COUNT, SUM, and AVG are common. Gremlin provides similar capabilities, with one key advantage: aggregations can be applied at any point in a traversal.

This means statistics can be:

· Global (across the entire graph)

· Label-specific (only certain vertices or edges)

· Traversal-scoped (computed per vertex using local())

· Relationship-aware (based on edge counts)

This flexibility turns Gremlin into a lightweight analytics framework embedded directly into your graph queries.

1. University Graph Model

Let's model university data as a Gremlin graph.

Vertex Labels

· student

· course

· professor

· department

Edge Labels

· enrolled_in (student → course)

· teaches (professor → course)

· belongs_to (course → department)

Step 1: Create the graph and its traversal source.

graph = TinkerGraph.open()
g = graph.traversal()

Step 2: Create Departments

g.addV('department').property('name','Computer Science').property('budget',500000)
g.addV('department').property('name','Mathematics').property('budget',300000)
g.addV('department').property('name','Physics').property('budget',200000)

Step 3: Create Courses

g.addV('course').property('name','Algorithms').property('credits',4)
g.addV('course').property('name','Data Structures').property('credits',3)
g.addV('course').property('name','Calculus').property('credits',4)
g.addV('course').property('name','Quantum Mechanics').property('credits',5)

Step 4: Create Students

g.addV('student').property('name','Alice').property('age',20).property('gpa',3.8)
g.addV('student').property('name','Bob').property('age',22).property('gpa',3.2)
g.addV('student').property('name','Carol').property('age',21).property('gpa',3.6)
g.addV('student').property('name','David').property('age',23).property('gpa',2.9)

Step 5: Create Professors

g.addV('professor').property('name','Dr. Smith').property('salary',120000)
g.addV('professor').property('name','Dr. Brown').property('salary',95000)
g.addV('professor').property('name','Dr. Lee').property('salary',110000)

Step 6: Connect Courses → Departments

g.V().has('course','name','Algorithms').
  addE('belongs_to').
  to(__.V().has('department','name','Computer Science'))

g.V().has('course','name','Data Structures').
  addE('belongs_to').
  to(__.V().has('department','name','Computer Science'))

g.V().has('course','name','Calculus').
  addE('belongs_to').
  to(__.V().has('department','name','Mathematics'))

g.V().has('course','name','Quantum Mechanics').
  addE('belongs_to').
  to(__.V().has('department','name','Physics'))

Step 7: Connect Professors → Courses

g.V().has('professor','name','Dr. Smith').
  addE('teaches').
  to(__.V().has('course','name','Algorithms'))

g.V().has('professor','name','Dr. Smith').
  addE('teaches').
  to(__.V().has('course','name','Data Structures'))

g.V().has('professor','name','Dr. Brown').
  addE('teaches').
  to(__.V().has('course','name','Calculus'))

g.V().has('professor','name','Dr. Lee').
  addE('teaches').
  to(__.V().has('course','name','Quantum Mechanics'))

Step 8: Connect Students → Courses

g.V().has('student','name','Alice').
  addE('enrolled_in').
  to(__.V().has('course','name','Algorithms'))

g.V().has('student','name','Alice').
  addE('enrolled_in').
  to(__.V().has('course','name','Data Structures'))

g.V().has('student','name','Bob').
  addE('enrolled_in').
  to(__.V().has('course','name','Calculus'))

g.V().has('student','name','Carol').
  addE('enrolled_in').
  to(__.V().has('course','name','Algorithms'))

g.V().has('student','name','Carol').
  addE('enrolled_in').
  to(__.V().has('course','name','Calculus'))

g.V().has('student','name','David').
  addE('enrolled_in').
  to(__.V().has('course','name','Quantum Mechanics'))

Confirm the graph by listing its vertices and edges.

gremlin> g.V().valueMap(true)
==>[id:0,label:department,name:[Computer Science],budget:[500000]]
==>[id:33,label:student,name:[David],gpa:[2.9],age:[23]]
==>[id:3,label:department,name:[Mathematics],budget:[300000]]
==>[id:37,label:professor,name:[Dr. Smith],salary:[120000]]
==>[id:6,label:department,name:[Physics],budget:[200000]]
==>[id:40,label:professor,name:[Dr. Brown],salary:[95000]]
==>[id:9,label:course,credits:[4],name:[Algorithms]]
==>[id:43,label:professor,name:[Dr. Lee],salary:[110000]]
==>[id:12,label:course,credits:[3],name:[Data Structures]]
==>[id:15,label:course,credits:[4],name:[Calculus]]
==>[id:18,label:course,credits:[5],name:[Quantum Mechanics]]
==>[id:21,label:student,name:[Alice],gpa:[3.8],age:[20]]
==>[id:25,label:student,name:[Bob],gpa:[3.2],age:[22]]
==>[id:29,label:student,name:[Carol],gpa:[3.6],age:[21]]
gremlin> g.E().valueMap(true)
==>[id:46,label:belongs_to]
==>[id:47,label:belongs_to]
==>[id:48,label:belongs_to]
==>[id:49,label:belongs_to]
==>[id:50,label:teaches]
==>[id:51,label:teaches]
==>[id:52,label:teaches]
==>[id:53,label:teaches]
==>[id:54,label:enrolled_in]
==>[id:55,label:enrolled_in]
==>[id:56,label:enrolled_in]
==>[id:57,label:enrolled_in]
==>[id:58,label:enrolled_in]
==>[id:59,label:enrolled_in]

2. Statistical Operations

2.1 count(): Measuring Quantity

count() returns the number of traversers that reach it at that point in the traversal. Every traverser is counted, so a vertex reached along two different paths counts twice.

Example 1: Total Vertices

gremlin> g.V().count()
==>14

Example 2: Total Students

gremlin> g.V().hasLabel('student').count()
==>4

Example 3: How Many Courses Exist?

gremlin> g.V().hasLabel('course').count()
==>4

Example 4: Courses Per Student (Using project())

g.V().
  hasLabel('student').
  project('name','count').
    by(values('name')).
    by(out('enrolled_in').count())

gremlin> g.V().
......1>   hasLabel('student').
......2>   project('name','count').
......3>     by(values('name')).
......4>     by(out('enrolled_in').count())
==>[name:David,count:1]
==>[name:Alice,count:2]
==>[name:Bob,count:1]
==>[name:Carol,count:2]

2.2 sum(): Adding Values Together

sum() adds up the numeric values in the stream and returns their total. It works only on numeric values, so extract the property with values() first.

Example 1: Total University Budget

g.V().
  hasLabel('department').
  values('budget').
  sum()

gremlin> g.V().
......1>   hasLabel('department').
......2>   values('budget').
......3>   sum()
==>1000000

Example 2: Total Professor Salary Expense

g.V().
  hasLabel('professor').
  values('salary').
  sum()

gremlin> g.V().
......1>   hasLabel('professor').
......2>   values('salary').
......3>   sum()
==>325000

Example 3: Total Credits Across All Courses

g.V().
  hasLabel('course').
  values('credits').
  sum()

gremlin> g.V().
......1>   hasLabel('course').
......2>   values('credits').
......3>   sum()
==>16
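
As a sanity check, the three totals above can be reproduced outside the graph. The sketch below is plain Python, not Gremlin; the lists simply restate the property values from Steps 2, 3, and 5, and the variable names are illustrative.

```python
# Plain-Python cross-check of the sum() results; this is not Gremlin code.
# Values are copied from the vertices created in Steps 2, 3, and 5.
budgets = [500000, 300000, 200000]      # department budgets
salaries = [120000, 95000, 110000]      # professor salaries
course_credits = [4, 3, 4, 5]           # course credits

# Each total corresponds to one values(...).sum() traversal.
totals = {
    "budget": sum(budgets),
    "salary": sum(salaries),
    "credits": sum(course_credits),
}

for name, total in totals.items():
    print(name, total)
```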

2.3 mean(): Calculating Average

mean() calculates the arithmetic average.

mean = (Sum Of Values)/(Count Of Values)

Example 1: Average GPA

g.V().
  hasLabel('student').
  values('gpa').
  mean()

gremlin> g.V().
......1>   hasLabel('student').
......2>   values('gpa').
......3>   mean()
==>3.375
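
The Average GPA result can be checked by hand with the formula above. The following is a plain-Python sketch, not Gremlin; the dictionary restates the GPA values from Step 4, and the variable names are illustrative.

```python
# Plain-Python cross-check of the mean() result; this is not Gremlin code.
# GPA values are copied from the student vertices created in Step 4.
gpas = {"Alice": 3.8, "Bob": 3.2, "Carol": 3.6, "David": 2.9}

total = sum(gpas.values())   # sum of values
count = len(gpas)            # count of values
mean_gpa = total / count     # mean = sum / count

print(f"sum={total:.1f} count={count} mean={mean_gpa:.3f}")
# prints sum=13.5 count=4 mean=3.375
```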

Example 2: Average Professor Salary

g.V().
  hasLabel('professor').
  values('salary').
  mean()

gremlin> g.V().
......1>   hasLabel('professor').
......2>   values('salary').
......3>   mean()
==>108333.33333333333

Example 3: Average Courses Per Student

g.V().
  hasLabel('student').
  local(out('enrolled_in').count()).
  mean()

gremlin> g.V().
......1>   hasLabel('student').
......2>   local(out('enrolled_in').count()).
......3>   mean()
==>1.5
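
The traversal above works in two stages: a per-student count, then a mean over those counts. A plain-Python sketch of the same two stages (not Gremlin; the pairs restate the enrolled_in edges from Step 8, and the names are illustrative) might look like this:

```python
# Plain-Python analogue of local(out('enrolled_in').count()).mean(); not Gremlin code.
# Enrollment edges are copied from Step 8 as (student, course) pairs.
enrollments = [
    ("Alice", "Algorithms"), ("Alice", "Data Structures"),
    ("Bob", "Calculus"),
    ("Carol", "Algorithms"), ("Carol", "Calculus"),
    ("David", "Quantum Mechanics"),
]
students = ["Alice", "Bob", "Carol", "David"]

# Local stage: one count per student, like local(...count()) emitting one number each.
per_student = [sum(1 for s, _ in enrollments if s == name) for name in students]

# Global stage: mean() reduces the per-student counts to a single number.
average = sum(per_student) / len(per_student)

print(per_student, average)  # prints [2, 1, 2, 1] 1.5
```

Without the per-student stage, all six enrollments would be counted together first, and the mean of that single number would be 6 rather than 1.5.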

2.4 min(): Finding the Smallest Value

min() returns the smallest value in the traversal stream. Since TinkerPop 3.4, it works with numbers, strings, and any other comparable type.

Example 1: Youngest Student Age

g.V().
  hasLabel('student').
  values('age').
  min()

gremlin> g.V().
......1>   hasLabel('student').
......2>   values('age').
......3>   min()
==>20

Example 2: Lowest Salary

g.V().
  hasLabel('professor').
  values('salary').
  min()

gremlin> g.V().
......1>   hasLabel('professor').
......2>   values('salary').
......3>   min()
==>95000

Example 3: Alphabetically First Department

g.V().
  hasLabel('department').
  values('name').
  min()

gremlin> g.V().
......1>   hasLabel('department').
......2>   values('name').
......3>   min()
==>Computer Science
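
String comparison is character by character, which is why "Computer Science" comes first. The plain-Python sketch below (not Gremlin) shows the same ordering, assuming the graph engine uses natural string ordering as Java's compareTo does. The lowercase value "algebra" is a hypothetical addition used only to show case sensitivity.

```python
# Plain-Python analogue of min()/max() on string values; this is not Gremlin code.
# Department names are copied from Step 2.
names = ["Computer Science", "Mathematics", "Physics"]

first = min(names)   # strings are compared character by character
last = max(names)

# Comparison is case-sensitive: uppercase ASCII letters sort before lowercase ones,
# so a hypothetical lowercase "algebra" would NOT become the minimum.
first_with_lowercase = min(names + ["algebra"])

print(first, "|", last, "|", first_with_lowercase)
# prints Computer Science | Physics | Computer Science
```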

2.5 max(): Finding the Largest Value

max() returns the largest value in the traversal stream. Like min(), it works with any comparable type.

Example 1: Highest GPA

g.V().
  hasLabel('student').
  values('gpa').
  max()

gremlin> g.V().
......1>   hasLabel('student').
......2>   values('gpa').
......3>   max()
==>3.8

Example 2: Largest Department Budget

g.V().
  hasLabel('department').
  values('budget').
  max()

gremlin> g.V().
......1>   hasLabel('department').
......2>   values('budget').
......3>   max()
==>500000

Example 3: Alphabetically Last Department

g.V().
  hasLabel('department').
  values('name').
  max()

gremlin> g.V().
......1>   hasLabel('department').
......2>   values('name').
......3>   max()
==>Physics

3. Global Aggregation

A global aggregation combines all traversers in the current stream and then computes the statistic across the entire stream.

The following snippet calculates the total number of course enrollments across all students:

g.V().
  hasLabel('student').
  out('enrolled_in').
  count()

gremlin> g.V().
......1>   hasLabel('student').
......2>   out('enrolled_in').
......3>   count()
==>6
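
The result is 6, not 4, because the count covers traversers rather than distinct courses: a course reached from two students is counted twice. The plain-Python sketch below (not Gremlin; the pairs restate the enrolled_in edges from Step 8) makes the difference visible. In Gremlin, adding a dedup() step before count() would be the way to count distinct courses.

```python
# Plain-Python analogue of a global count over out('enrolled_in'); not Gremlin code.
# Enrollment edges are copied from Step 8 as (student, course) pairs.
enrollments = [
    ("Alice", "Algorithms"), ("Alice", "Data Structures"),
    ("Bob", "Calculus"),
    ("Carol", "Algorithms"), ("Carol", "Calculus"),
    ("David", "Quantum Mechanics"),
]

# Every traversed edge yields one traverser sitting on a course.
traversers = [course for _, course in enrollments]

total = len(traversers)          # what the global count() sees
distinct = len(set(traversers))  # what a de-duplicated count would see

print(total, distinct)  # prints 6 4
```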

4. Local Aggregation

A local aggregation computes the statistic within the context of each traverser individually. Think of it as “for each student, compute the value separately.”

g.V().hasLabel('student').
  project('name','courses_count').
    by('name').
    by(local(out('enrolled_in').count())).
  order().by(select('courses_count'), desc)

gremlin> g.V().hasLabel('student').
......1>   project('name','courses_count').
......2>     by('name').
......3>     by(local(out('enrolled_in').count())).
......4>   order().by(select('courses_count'), desc)
==>[name:Alice,courses_count:2]
==>[name:Carol,courses_count:2]
==>[name:David,courses_count:1]
==>[name:Bob,courses_count:1]
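
The same per-student grouping and descending sort can be sketched in plain Python (not Gremlin; the data restates Step 8 and the names are illustrative). Students with equal counts may appear in a different relative order than in the console output above, since neither approach promises a particular order for ties.

```python
# Plain-Python analogue of project(...).by(...).order().by(..., desc); not Gremlin code.
# Enrollment edges are copied from Step 8 as (student, course) pairs.
enrollments = [
    ("Alice", "Algorithms"), ("Alice", "Data Structures"),
    ("Bob", "Calculus"),
    ("Carol", "Algorithms"), ("Carol", "Calculus"),
    ("David", "Quantum Mechanics"),
]
students = ["Alice", "Bob", "Carol", "David"]

# Local aggregation: compute the count separately for each student.
rows = [
    {"name": name, "courses_count": sum(1 for s, _ in enrollments if s == name)}
    for name in students
]

# Order the projected rows by their count, largest first.
rows.sort(key=lambda row: row["courses_count"], reverse=True)

for row in rows:
    print(row)
```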

Friday, 5 June 2026