Tuesday, 12 May 2026

Grouped Counting in Gremlin: Understanding Distributions, Not Just Totals

  

Counting individual elements such as the total number of vertices or edges is often only the starting point in graph analysis. While such totals provide a high-level sense of scale, they rarely answer the questions that matter most in real systems.

 

In practice, analysts and engineers are far more interested in distributions than in raw totals. Instead of asking how many elements exist, the more meaningful question is often how many exist per category.

 

Consider the following examples drawn from common business scenarios:

 

·      How many orders has each customer placed?

·      How many products exist in each category?

·      How many deliveries are made to each city?

·      How many items are sold per brand?

 

Each of these questions requires grouped counting: partitioning elements into logical groups and then counting the members of each group.

 

Apache TinkerPop provides first-class support for this pattern through two closely related traversal constructs:

 

·      groupCount(): a concise step dedicated to counting how many times each key occurs

·      group() combined with count(): a more expressive and flexible alternative

 

This chapter explains how grouped counting works in Gremlin, why it is essential for analytical queries, and how to apply it effectively using a familiar e-commerce domain.

 

Example: A Simplified E-Commerce Graph

To ground the discussion, consider a simplified e-commerce platform, similar to a large online marketplace operating across Indian cities.

 

Vertex Labels

·      customer: represents a customer (for example, Amit Sharma or Priya Iyer)

·      order: represents an order placed on the platform

·      product: represents a product for sale

·      category: represents a product category

·      city: represents a delivery city (such as Bengaluru or Hyderabad)

 

Edge Labels

·      customer -[placed]-> order

·      order -[contains]-> product

·      product -[belongsTo]-> category

·      order -[deliveredTo]-> city

 

This graph structure enables questions about customer behavior, product distribution, and geographic demand to be expressed naturally as traversals.

 

Step 1: Create Graph and Traversal Source

graph = TinkerGraph.open()
g = graph.traversal()

Step 2: Create City Vertices

blr = g.addV('city').property('name', 'Bengaluru').next()
hyd = g.addV('city').property('name', 'Hyderabad').next()
chn = g.addV('city').property('name', 'Chennai').next()
pne = g.addV('city').property('name', 'Pune').next()

Step 3: Create Category Vertices

catElectronics = g.addV('category').property('name', 'Electronics').next()
catBooks       = g.addV('category').property('name', 'Books').next()
catClothing    = g.addV('category').property('name', 'Clothing').next()

   

Step 4: Create Product Vertices

 

p1 = g.addV('product').property('sku', 'P1001').property('name', 'Laptop').next()
p2 = g.addV('product').property('sku', 'P1002').property('name', 'Smartphone').next()
p3 = g.addV('product').property('sku', 'P2001').property('name', 'Java Programming Book').next()
p4 = g.addV('product').property('sku', 'P3001').property('name', 'T-Shirt').next()

   

Step 5: Link Products to Categories

 

g.V(p1).addE('belongsTo').to(catElectronics).next()
g.V(p2).addE('belongsTo').to(catElectronics).next()
g.V(p3).addE('belongsTo').to(catBooks).next()
g.V(p4).addE('belongsTo').to(catClothing).next()

   

Step 6: Create Customer Vertices

 

c1 = g.addV('customer').property('customerId', 'C101').property('name', 'Amit Sharma').next()
c2 = g.addV('customer').property('customerId', 'C102').property('name', 'Priya Iyer').next()
c3 = g.addV('customer').property('customerId', 'C103').property('name', 'Rohit Verma').next()

   

Step 7: Create Order Vertices

 

o1 = g.addV('order').property('orderId', 'O9001').next()
o2 = g.addV('order').property('orderId', 'O9002').next()
o3 = g.addV('order').property('orderId', 'O9003').next()
o4 = g.addV('order').property('orderId', 'O9004').next()
o5 = g.addV('order').property('orderId', 'O9005').next()

   

Step 8: Connect Customers to Orders (placed)

 

g.V(c1).addE('placed').to(o1).next()
g.V(c1).addE('placed').to(o2).next()

g.V(c2).addE('placed').to(o3).next()

g.V(c3).addE('placed').to(o4).next()
g.V(c3).addE('placed').to(o5).next()

   

Step 9: Connect Orders to Products (contains)

 

g.V(o1).addE('contains').to(p1).next()
g.V(o1).addE('contains').to(p3).next()

g.V(o2).addE('contains').to(p2).next()

g.V(o3).addE('contains').to(p4).next()

g.V(o4).addE('contains').to(p1).next()
g.V(o4).addE('contains').to(p4).next()

g.V(o5).addE('contains').to(p3).next()

   

Step 10: Connect Orders to Cities (deliveredTo)

 

g.V(o1).addE('deliveredTo').to(blr).next()
g.V(o2).addE('deliveredTo').to(blr).next()

g.V(o3).addE('deliveredTo').to(hyd).next()

g.V(o4).addE('deliveredTo').to(chn).next()

g.V(o5).addE('deliveredTo').to(pne).next()
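
Before running grouped counts, it can help to confirm that the sample data loaded as intended. The following is a minimal sketch, assuming the Gremlin Console and the traversal source g populated in Steps 1 to 10. The variable names vertexTotal, edgeTotal, and edgesByLabel are illustrative and not part of the original example.

```groovy
// Assumes `g` is the traversal source populated in Steps 1-10 (Gremlin Console).
// Variable names are illustrative only.

// 4 cities + 3 categories + 4 products + 3 customers + 5 orders
vertexTotal = g.V().count().next()

// 4 belongsTo + 5 placed + 7 contains + 5 deliveredTo
edgeTotal = g.E().count().next()

// Edge counts partitioned by edge label
edgesByLabel = g.E().groupCount().by(label).next()
```

With the data above, vertexTotal should be 19 and edgeTotal should be 21, and edgesByLabel should map belongsTo to 4, placed to 5, contains to 7, and deliveredTo to 5. The order of keys in the printed map is not significant.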

   

How Many of Each Type of Vertex Exist?

A common diagnostic task when working with a graph is understanding its overall composition. Grouped counting can quickly reveal how many vertices exist for each label.

 

g.V().groupCount().by(label)

gremlin> g.V().groupCount().by(label)
==>[product:4,city:4,category:3,customer:3,order:5]

   

Here:

·      The keys in the result map are vertex labels.

·      The values represent the number of vertices with each label.

 

The same result can be produced using an explicit label extraction:

 

gremlin> g.V().label().groupCount()
==>[product:4,city:4,category:3,customer:3,order:5]

   

Both traversals are valid. The choice between them is often stylistic, though by(label) makes the grouping criterion more explicit and readable.
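
The more flexible alternative mentioned earlier, group() combined with count(), can be sketched against the same data. This is an illustration that assumes the Gremlin Console (where label and count() are available without qualification) and the traversal source g from Steps 1 to 10. The variable names are hypothetical.

```groovy
// Assumes `g` is the traversal source populated in Steps 1-10 (Gremlin Console).

// Dedicated step: count vertices per label
viaGroupCount = g.V().groupCount().by(label).next()

// General form: the first by() picks the key, the second by() reduces each group
viaGroup = g.V().group().by(label).by(count()).next()

// Changing the second by() changes what is collected per group;
// here each category name maps to a list of its product names
namesByCategory = g.V().hasLabel('product').
                    group().
                      by(out('belongsTo').values('name')).
                      by('name').
                    next()
```

For the sample graph, viaGroupCount and viaGroup should hold the same label-to-count map. The third traversal shows why group() is the more expressive form: replacing count() with another reducer or property collects something other than a tally, such as Electronics mapping to Laptop and Smartphone.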

 

Note

As with simple count() traversals, grouped counting over all vertices or edges must visit every element, so it is not recommended for very large graphs in production workloads. However, for reporting, analytics, diagnostics, and queries scoped to subgraphs, grouped counting is both practical and extremely valuable.
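
Scoped grouped counts can be sketched for some of the business questions listed at the start. The traversals below are one possible formulation, assuming the Gremlin Console and the traversal source g from Steps 1 to 10; the variable names are illustrative. Each traversal starts from a single vertex label instead of the whole graph.

```groovy
// Assumes `g` is the traversal source populated in Steps 1-10 (Gremlin Console).

// Orders per customer: group each order by the name of the customer who placed it.
// `in` is a reserved word in Groovy, hence the explicit __.in(...)
ordersPerCustomer = g.V().hasLabel('order').
                      groupCount().by(__.in('placed').values('name')).
                      next()

// Products per category: walk to the category, then count by its name
productsPerCategory = g.V().hasLabel('product').
                        out('belongsTo').
                        groupCount().by('name').
                        next()

// Deliveries per city: walk from each order to its delivery city
deliveriesPerCity = g.V().hasLabel('order').
                      out('deliveredTo').
                      groupCount().by('name').
                      next()
```

With the sample data, Amit Sharma and Rohit Verma should each have 2 orders and Priya Iyer 1; Electronics should have 2 products, Books and Clothing 1 each; and Bengaluru should receive 2 deliveries, with 1 each for Hyderabad, Chennai, and Pune. Note that these formulations only report groups with at least one member, so a customer with no orders would not appear in the result.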

  
