Counting individual elements such as the total number of vertices or edges is often only the starting point in graph analysis. While such totals provide a high-level sense of scale, they rarely answer the questions that matter most in real systems.
In practice, analysts and engineers are far more interested in distributions than in raw totals. Instead of asking how many elements exist overall, the more meaningful question is often how many exist per category.
Consider the following examples drawn from common business scenarios:
· How many orders has each customer placed?
· How many products exist in each category?
· How many deliveries are made to each city?
· How many items are sold per brand?
Each of these questions requires grouped counting: partitioning elements into logical groups and then counting the members of each group.
Apache TinkerPop provides first-class support for this pattern through two closely related traversal constructs:
· groupCount(): a concise and optimized step dedicated to counting
· group() combined with count(): a more expressive and flexible alternative
This chapter explains how grouped counting works in Gremlin, why it is essential for analytical queries, and how to apply it effectively using a familiar e-commerce domain.
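As a quick illustration of how the two constructs relate, here is a minimal sketch on a throwaway graph. It assumes a Gremlin Console session, where TinkerGraph, label, and count() are available without explicit imports. The three vertices are purely illustrative and are not part of the e-commerce example.

```groovy
// Assumption: run inside the Gremlin Console with TinkerGraph available.
graph = TinkerGraph.open()
g = graph.traversal()

// Throwaway data: two 'product' vertices and one 'city' vertex.
g.addV('product').iterate()
g.addV('product').iterate()
g.addV('city').iterate()

// Form 1: the dedicated step, keyed by vertex label.
g.V().groupCount().by(label)

// Form 2: group() with a key modulator (label) and a value modulator (count()).
g.V().group().by(label).by(count())
```

Both traversals should return a single map in which product maps to 2 and city maps to 1. The second form is longer, but its value modulator can be swapped for something other than count() when a different aggregate is needed.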
Example: A Simplified E-Commerce Graph
To ground the discussion, consider a simplified e-commerce platform, similar to a large online marketplace operating across Indian cities.
Vertex Labels
· customer: represents a customer (for example, Ravi or Ananya)
· order: represents an order placed on the platform
· product: represents a product for sale
· category: represents a product category
· city: represents a delivery city (such as Bengaluru or Hyderabad)
Edge Labels
· customer - placed → order
· order - contains → product
· product - belongsTo → category
· order - deliveredTo → city
This graph structure enables questions about customer behavior, product distribution, and geographic demand to be expressed naturally as traversals.
Step 1: Create Graph and Traversal Source
graph = TinkerGraph.open()
g = graph.traversal()
Step 2: Create City Vertices
blr = g.addV('city').property('name', 'Bengaluru').next()
hyd = g.addV('city').property('name', 'Hyderabad').next()
chn = g.addV('city').property('name', 'Chennai').next()
pne = g.addV('city').property('name', 'Pune').next()
Step 3: Create Category Vertices
catElectronics = g.addV('category').property('name', 'Electronics').next()
catBooks = g.addV('category').property('name', 'Books').next()
catClothing = g.addV('category').property('name', 'Clothing').next()
Step 4: Create Product Vertices
p1 = g.addV('product').property('sku', 'P1001').property('name', 'Laptop').next()
p2 = g.addV('product').property('sku', 'P1002').property('name', 'Smartphone').next()
p3 = g.addV('product').property('sku', 'P2001').property('name', 'Java Programming Book').next()
p4 = g.addV('product').property('sku', 'P3001').property('name', 'T-Shirt').next()
Step 5: Link Products to Categories
g.V(p1).addE('belongsTo').to(catElectronics).next()
g.V(p2).addE('belongsTo').to(catElectronics).next()
g.V(p3).addE('belongsTo').to(catBooks).next()
g.V(p4).addE('belongsTo').to(catClothing).next()
Step 6: Create Customer Vertices
c1 = g.addV('customer').property('customerId', 'C101').property('name', 'Amit Sharma').next()
c2 = g.addV('customer').property('customerId', 'C102').property('name', 'Priya Iyer').next()
c3 = g.addV('customer').property('customerId', 'C103').property('name', 'Rohit Verma').next()
Step 7: Create Order Vertices
o1 = g.addV('order').property('orderId', 'O9001').next()
o2 = g.addV('order').property('orderId', 'O9002').next()
o3 = g.addV('order').property('orderId', 'O9003').next()
o4 = g.addV('order').property('orderId', 'O9004').next()
o5 = g.addV('order').property('orderId', 'O9005').next()
Step 8: Connect Customers to Orders (placed)
g.V(c1).addE('placed').to(o1).next()
g.V(c1).addE('placed').to(o2).next()
g.V(c2).addE('placed').to(o3).next()
g.V(c3).addE('placed').to(o4).next()
g.V(c3).addE('placed').to(o5).next()
Step 9: Connect Orders to Products (contains)
g.V(o1).addE('contains').to(p1).next()
g.V(o1).addE('contains').to(p3).next()
g.V(o2).addE('contains').to(p2).next()
g.V(o3).addE('contains').to(p4).next()
g.V(o4).addE('contains').to(p1).next()
g.V(o4).addE('contains').to(p4).next()
g.V(o5).addE('contains').to(p3).next()
Step 10: Connect Orders to Cities (deliveredTo)
g.V(o1).addE('deliveredTo').to(blr).next()
g.V(o2).addE('deliveredTo').to(blr).next()
g.V(o3).addE('deliveredTo').to(hyd).next()
g.V(o4).addE('deliveredTo').to(chn).next()
g.V(o5).addE('deliveredTo').to(pne).next()
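Before querying, it can be worth confirming that the graph loaded as intended. The checks below are a sketch that assumes Steps 1 through 10 were run exactly once, in order, in the same Gremlin Console session, so that g refers to the graph built above.

```groovy
// Assumption: 'g' is the traversal source populated by Steps 1 to 10.

// Total vertices: 4 cities + 3 categories + 4 products + 3 customers + 5 orders = 19.
g.V().count()

// Total edges: 4 belongsTo + 5 placed + 7 contains + 5 deliveredTo = 21.
g.E().count()

// Edge composition by label; the same grouped-counting idea applied to edges.
g.E().groupCount().by(label)
```

If any step was run twice, these totals will be higher than expected, which is a quick signal to recreate the graph with TinkerGraph.open() and reload it.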
How Many of Each Type of Vertex Exist?
A common diagnostic task when working with a graph is understanding its overall composition. Grouped counting can quickly reveal how many vertices exist for each label.
g.V().groupCount().by(label)
gremlin> g.V().groupCount().by(label)
==>[product:4,city:4,category:3,customer:3,order:5]
Here:
· The keys in the result map are vertex labels.
· The values represent the number of vertices with each label.
The same result can be produced using an explicit label extraction:
gremlin> g.V().label().groupCount()
==>[product:4,city:4,category:3,customer:3,order:5]
Both traversals are valid. The choice between them is often stylistic, though by(label) makes the grouping criterion more explicit and readable.
Note
As with simple count() traversals, grouped counting over all vertices or edges is not recommended for very large graphs in production workloads, because the traversal must visit every element. However, for reporting, analytics, diagnostics, and queries scoped to subgraphs, grouped counting is both practical and extremely valuable.
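To make the idea of a scoped query concrete, here is a hedged sketch of two of the business questions from the introduction. It assumes the e-commerce graph from Steps 1 through 10 is loaded in the current Gremlin Console session. Whether the hasLabel() filter actually avoids a full scan depends on the graph provider and its indexing, which is outside what this example shows.

```groovy
// Assumption: 'g' is the traversal source for the graph built in Steps 1 to 10.

// Deliveries per city: start from orders only, hop to the city, count by city name.
// Expected counts for the sample data: Bengaluru 2, Hyderabad 1, Chennai 1, Pune 1.
g.V().hasLabel('order').out('deliveredTo').groupCount().by('name')

// Orders per customer: key each customer by name, count outgoing 'placed' edges.
// Expected counts for the sample data: Amit Sharma 2, Priya Iyer 1, Rohit Verma 2.
g.V().hasLabel('customer').group().by('name').by(out('placed').count())
```

The order of entries in the returned maps is not guaranteed, so only the key-to-count pairs should be compared against the expected values.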