When designing a graph data model, labels play a crucial role in organizing and querying your data effectively. In Apache TinkerPop’s Gremlin, both vertices and edges are assigned labels that act as logical types within your graph.
In this post, we’ll explore:
· What vertex and edge labels are
· Why labels matter in graph design
· How to query using labels in Gremlin
· Different ways to filter using hasLabel(), label(), and has()
· Working with multiple labels
· Performance considerations and indexing strategies
We’ll also discuss:
· When to rely on labels vs. properties
· How label indexing impacts performance
· Best practices for large-scale graphs
· When to use a property as a surrogate for labels
By the end of this article, you’ll understand how to design cleaner graph schemas and write more optimized Gremlin traversals using labels effectively.
1. Demo Graph: Company Management System
Let's build a simple "Company Management System" graph to use in the examples.
Vertex Labels
· employee
· department
· project
· location
Edge Labels
· works_in
· manages
· assigned_to
· located_at
Step 1: Create a new in-memory graph
graph = TinkerGraph.open()
g = graph.traversal()
Step 2: Create Departments
eng = g.addV('department').property('name','Engineering').next()
hr = g.addV('department').property('name','HR').next()
Step 3: Create Locations
blr = g.addV('location').property('city','Bangalore').next()
hyd = g.addV('location').property('city','Hyderabad').next()
Step 4: Create Employees
alice = g.addV('employee').
  property('name','Alice').
  property('role','Manager').
  next()
bob = g.addV('employee').
  property('name','Bob').
  property('role','Developer').
  next()
carol = g.addV('employee').
  property('name','Carol').
  property('role','HR Executive').
  next()
Step 5: Create Projects
p1 = g.addV('project').property('name','AI Platform').next()
p2 = g.addV('project').property('name','HR Automation').next()
Step 6: Create Relationships
g.addE('works_in').from(alice).to(eng).iterate()
g.addE('works_in').from(bob).to(eng).iterate()
g.addE('works_in').from(carol).to(hr).iterate()
g.addE('manages').from(alice).to(bob).iterate()
g.addE('assigned_to').from(bob).to(p1).iterate()
g.addE('assigned_to').from(carol).to(p2).iterate()
g.addE('located_at').from(eng).to(blr).iterate()
g.addE('located_at').from(hr).to(hyd).iterate()
gremlin> g.V().valueMap(true)
==>[id:0,label:department,name:[Engineering]]
==>[id:17,label:project,name:[AI Platform]]
==>[id:2,label:department,name:[HR]]
==>[id:19,label:project,name:[HR Automation]]
==>[id:4,label:location,city:[Bangalore]]
==>[id:6,label:location,city:[Hyderabad]]
==>[id:8,label:employee,role:[Manager],name:[Alice]]
==>[id:11,label:employee,role:[Developer],name:[Bob]]
==>[id:14,label:employee,role:[HR Executive],name:[Carol]]
gremlin> g.E().valueMap(true)
==>[id:21,label:works_in]
==>[id:22,label:works_in]
==>[id:23,label:works_in]
==>[id:24,label:manages]
==>[id:25,label:assigned_to]
==>[id:26,label:assigned_to]
==>[id:27,label:located_at]
==>[id:28,label:located_at]
2. What Are Vertex and Edge Labels?
In Apache TinkerPop Gremlin, every vertex and edge has a label. Think of a label as the type of the graph element.
For example, when we created the vertices, we passed employee, department, project and location as arguments to the addV() step.
g.addV('employee')
g.addV('department')
g.addV('project')
g.addV('location')
So when we run the following statement, it returns the label of the vertex named Alice.
g.V().
  has('name','Alice').
  label()
gremlin> g.V().
......1>   has('name','Alice').
......2>   label()
==>employee
Edge Labels
The statement g.addE('works_in') creates an edge with the label 'works_in'. The following traversal returns the labels of Alice's outgoing edges.
g.V().
  has('name','Alice').
  outE().
  label()
gremlin> g.V().
......1>   has('name','Alice').
......2>   outE().
......3>   label()
==>works_in
==>manages
Edge labels describe how vertices are connected.
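To see which labels a graph actually contains, one option is to count elements per label. The snippet below is a small sketch for the Gremlin Console; it assumes the demo graph from section 1 is loaded as g, and the variable names vertexCounts, edgeCounts and distinctLabels are only illustrative.

```groovy
// Assumes the demo graph from section 1 is loaded as g in the Gremlin Console.

// Count vertices per label.
// On the demo data: employee 3, department 2, project 2, location 2.
vertexCounts = g.V().groupCount().by(label).next()

// Count edges per label.
// On the demo data: works_in 3, assigned_to 2, located_at 2, manages 1.
edgeCounts = g.E().groupCount().by(label).next()

// Just the distinct vertex labels, without counts.
distinctLabels = g.V().label().dedup().toList()
```

On a large graph these traversals touch every element, so they are better suited to exploration than to production queries.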
3. Why Labels Matter in Graph Design
Labels are not just names; they define structure and meaning. We can select vertices and edges by label, that is, by what they represent.
g.V().
  hasLabel('employee').
  valueMap(true)
gremlin> g.V().
......1>   hasLabel('employee').
......2>   valueMap(true)
==>[id:8,label:employee,role:[Manager],name:[Alice]]
==>[id:11,label:employee,role:[Developer],name:[Bob]]
==>[id:14,label:employee,role:[HR Executive],name:[Carol]]
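Edge labels carry meaning in the same way. As a short illustration, assuming the demo graph from section 1 is loaded as g, the same starting vertex answers different questions depending on which edge label the traversal follows. The variable names below are hypothetical.

```groovy
// Assumes the demo graph from section 1 is loaded as g in the Gremlin Console.

// Whom does Alice manage? Follow only 'manages' edges.
managed = g.V().hasLabel('employee').has('name','Alice').
  out('manages').
  values('name').
  toList()      // [Bob] on the demo data

// Where does Alice work? Follow only 'works_in' edges.
workplace = g.V().hasLabel('employee').has('name','Alice').
  out('works_in').
  values('name').
  toList()      // [Engineering] on the demo data
```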
4. How to Query Using Labels in Gremlin
The following statement gets all the employees.
g.V().
  hasLabel('employee').
  valueMap(true)
gremlin> g.V().
......1>   hasLabel('employee').
......2>   valueMap(true)
==>[id:8,label:employee,role:[Manager],name:[Alice]]
==>[id:11,label:employee,role:[Developer],name:[Bob]]
==>[id:14,label:employee,role:[HR Executive],name:[Carol]]
To return only the employee names, use values('name') instead of valueMap(true).
g.V().
  hasLabel('employee').
  values('name')
Similarly, the following statement prints all the departments.
g.V().
  hasLabel('department').
  values('name')
gremlin> g.V().
......1>   hasLabel('department').
......2>   values('name')
==>Engineering
==>HR
Get All Employees Working in Engineering
g.V().
  hasLabel('employee').
  as('emp').
  out('works_in').
  hasLabel('department').
  has('name','Engineering').
  select('emp').
  values('name')
gremlin> g.V().
......1>   hasLabel('employee').
......2>   as('emp').
......3>   out('works_in').
......4>   hasLabel('department').
......5>   has('name','Engineering').
......6>   select('emp').
......7>   values('name')
==>Alice
==>Bob
The same query can be written more simply by starting from the department and following the works_in edges in the incoming direction.
g.V().
  hasLabel('department').
  has('name', 'Engineering').
  in('works_in').
  values('name')
gremlin> g.V().
......1>   hasLabel('department').
......2>   has('name', 'Engineering').
......3>   in('works_in').
......4>   values('name')
==>Alice
==>Bob
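Label-based steps also chain across several hops. The sketch below assumes the demo graph from section 1 is loaded as g; bobCity and byDept are illustrative variable names, not part of the original examples.

```groovy
// Assumes the demo graph from section 1 is loaded as g in the Gremlin Console.

// Which city does Bob work in?
// employee -works_in-> department -located_at-> location
bobCity = g.V().hasLabel('employee').has('name','Bob').
  out('works_in').
  out('located_at').
  values('city').
  next()        // 'Bangalore' on the demo data

// Employee names grouped by the name of their department.
// On the demo data Engineering maps to Alice and Bob, and HR maps to Carol.
byDept = g.V().hasLabel('employee').
  group().
    by(out('works_in').values('name')).
    by('name').
  next()
```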
Filter Elements Using the label() Step
g.V().
  where(label().is(eq('employee'))).
  values('name')
gremlin> g.V().
......1>   where(label().is(eq('employee'))).
......2>   values('name')
==>Alice
==>Bob
==>Carol
Using has(label, value)
g.V().
  has(label, 'employee').
  values('name')
gremlin> g.V().
......1>   has(label, 'employee').
......2>   values('name')
==>Alice
==>Bob
==>Carol
Using the Three-Argument has()
The form has(label, key, value) matches on the label and a property value in a single step.
g.V().
  has('employee','name','Bob').
  values('name')
gremlin> g.V().
......1>   has('employee','name','Bob').
......2>   values('name')
==>Bob
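As a quick check that these filters select the same vertices, the sketch below compares their results as sets. It assumes the demo graph from section 1 is loaded as g; the variable names are illustrative only.

```groovy
// Assumes the demo graph from section 1 is loaded as g in the Gremlin Console.

byHasLabel = g.V().hasLabel('employee').values('name').toSet()
byToken    = g.V().has(label, 'employee').values('name').toSet()
byStep     = g.V().where(label().is(eq('employee'))).values('name').toSet()

// All three forms select the same vertices.
assert byHasLabel == byToken
assert byToken == byStep

// The three-argument has() is shorthand for hasLabel() followed by has().
assert g.V().has('employee','name','Bob').count().next() ==
       g.V().hasLabel('employee').has('name','Bob').count().next()
```

When the forms are equivalent, hasLabel() or has() is usually the safer default: providers can typically fold has-style steps into the initial vertex lookup, while a where(label()...) filter is generally evaluated element by element. Whether that matters depends on the graph engine.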
5. Working with Multiple Labels
hasLabel() accepts several labels in one step and matches any element whose label equals one of them.
Get All Employees and Departments
g.V().
  hasLabel('employee', 'department').
  values('name')
gremlin> g.V().
......1>   hasLabel('employee', 'department').
......2>   values('name')
==>Engineering
==>HR
==>Alice
==>Bob
==>Carol
For Edges
g.E().
  hasLabel('works_in','assigned_to').
  valueMap(true)
gremlin> g.E().
......1>   hasLabel('works_in','assigned_to').
......2>   valueMap(true)
==>[id:21,label:works_in]
==>[id:22,label:works_in]
==>[id:23,label:works_in]
==>[id:25,label:assigned_to]
==>[id:26,label:assigned_to]
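hasLabel() also accepts a predicate, which makes it possible to express "any of" and "none of" explicitly. The following sketch assumes the demo graph from section 1 is loaded as g; the variable names are hypothetical.

```groovy
// Assumes the demo graph from section 1 is loaded as g in the Gremlin Console.

// Same result as hasLabel('employee','department'), written with a predicate.
anyOf = g.V().hasLabel(within('employee','department')).count().next()   // 5 on the demo data

// Everything except employees.
notEmployees = g.V().hasLabel(neq('employee')).count().next()            // 6 on the demo data

// Labels of the vertices that are neither employees nor departments.
others = g.V().not(hasLabel('employee','department')).label().dedup().toSet()
```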
6. Performance Considerations and Indexing Strategies
Not all graph databases index labels: some do, some don’t, and some do so only partially.
If labels are not indexed, the following statement scans the entire vertex set.
g.V().hasLabel('employee')
If your graph engine does not index labels, add a property that carries the same type information, as shown below.
g.addV('entity').property('type','employee')
Then index the 'type' property and query it as shown below. The lookup is now index-backed.
g.V().has('type','employee')
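How an index is created depends on the graph engine. As one hedged example, the sketch below uses TinkerGraph's own createIndex() method on a separate scratch graph, so it does not touch the demo graph. The names scratch and sg are illustrative, and other providers expose different index APIs.

```groovy
// Standalone Gremlin Console sketch on a scratch TinkerGraph.
// createIndex() and getIndexedKeys() are TinkerGraph-specific; other engines differ.
scratch = TinkerGraph.open()
scratch.createIndex('type', Vertex.class)   // index the surrogate 'type' property
sg = scratch.traversal()

sg.addV('entity').property('type','employee').property('name','Alice').iterate()
sg.addV('entity').property('type','department').property('name','Engineering').iterate()

// Property lookup that can be served by the index.
employees = sg.V().has('type','employee').values('name').toList()

// Confirm which vertex property keys are indexed.
indexedKeys = scratch.getIndexedKeys(Vertex.class)

// profile() reports the steps that actually ran and how many traversers each handled,
// which helps confirm whether a query behaves like a lookup or a scan.
metrics = sg.V().has('type','employee').profile().next()
```

The trade-off is that the 'type' property duplicates what the label already says, so the two have to be kept in sync whenever vertices are created.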