When working with graph traversals in Apache TinkerPop, it is often unnecessary, and inefficient, to return an entire result set. Gremlin provides several steps that retrieve only a subset of the results of a traversal, which makes it easier to paginate results, skip unwanted elements, or process data in batches.
This post focuses on how to retrieve ranges of vertices using the range() and skip() steps, explains their behavior in practice, and highlights important caveats related to ordering and backend graph implementations.
1. Sample Graph Dataset: Library Books
This dataset models books with:
· book vertices
· title, genre, and country properties
Step 1: Initialize TinkerGraph
graph = TinkerGraph.open()
g = graph.traversal()
Step 2: Add Book Vertices
books = [
  [title: 'Clean Code',                    genre: 'Programming', country: 'US'],
  [title: 'Effective Java',                genre: 'Programming', country: 'US'],
  [title: 'Design Patterns',               genre: 'Programming', country: 'US'],
  [title: 'Refactoring',                   genre: 'Programming', country: 'US'],
  [title: 'Domain-Driven Design',          genre: 'Programming', country: 'US'],
  [title: 'The Pragmatic Programmer',      genre: 'Programming', country: 'US'],
  [title: 'Harry Potter',                  genre: 'Fantasy',     country: 'UK'],
  [title: 'The Hobbit',                    genre: 'Fantasy',     country: 'UK'],
  [title: 'Lord of the Rings',             genre: 'Fantasy',     country: 'UK'],
  [title: 'Game of Thrones',               genre: 'Fantasy',     country: 'US'],
  [title: 'The Alchemist',                 genre: 'Fiction',     country: 'BR'],
  [title: 'One Hundred Years of Solitude', genre: 'Fiction',     country: 'CO']
]
books.each { b ->
  g.addV('book').
    property('title', b.title).
    property('genre', b.genre).
    property('country', b.country).
    iterate()
}
Print all vertices to confirm that the data was loaded:
gremlin> g.V().valueMap(true)
==>[id:0,label:book,country:[US],genre:[Programming],title:[Clean Code]]
==>[id:32,label:book,country:[UK],genre:[Fantasy],title:[Lord of the Rings]]
==>[id:4,label:book,country:[US],genre:[Programming],title:[Effective Java]]
==>[id:36,label:book,country:[US],genre:[Fantasy],title:[Game of Thrones]]
==>[id:8,label:book,country:[US],genre:[Programming],title:[Design Patterns]]
==>[id:40,label:book,country:[BR],genre:[Fiction],title:[The Alchemist]]
==>[id:12,label:book,country:[US],genre:[Programming],title:[Refactoring]]
==>[id:44,label:book,country:[CO],genre:[Fiction],title:[One Hundred Years of Solitude]]
==>[id:16,label:book,country:[US],genre:[Programming],title:[Domain-Driven Design]]
==>[id:20,label:book,country:[US],genre:[Programming],title:[The Pragmatic Programmer]]
==>[id:24,label:book,country:[UK],genre:[Fantasy],title:[Harry Potter]]
==>[id:28,label:book,country:[UK],genre:[Fantasy],title:[The Hobbit]]
2. Using range() to Select a Subset of Vertices
The range() step in Gremlin allows you to select a contiguous subset of traversal results by specifying a start offset and an end offset. Offsets are zero-based, meaning the first element in the traversal stream starts at index 0.
2.1 Basic Usage
g.V().
  hasLabel('book').
  range(0, 2).
  values('title')
This traversal returns the titles of the first two books encountered in the traversal stream. In range(), the start index is inclusive and the end index is exclusive, so range(0, 2) returns the elements at indexes 0 and 1, but not the element at index 2.
gremlin> g.V().
......1> hasLabel('book').
......2> range(0, 2).
......3> values('title')
==>Clean Code
==>Lord of the Rings
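To make the zero-based, inclusive/exclusive rule concrete, here is a minimal plain-Python model of a global range(low, high), with a Python list standing in for the traversal stream. It illustrates the semantics only and is not TinkerPop's implementation; the function name range_global is hypothetical, and the title list is copied from the TinkerGraph listing in section 1.

```python
from itertools import islice

# Titles in the order TinkerGraph returned them in section 1
# (illustrative data copied from the listing, not fetched from a graph).
TITLES = [
    "Clean Code", "Lord of the Rings", "Effective Java", "Game of Thrones",
    "Design Patterns", "The Alchemist", "Refactoring",
    "One Hundred Years of Solitude", "Domain-Driven Design",
    "The Pragmatic Programmer", "Harry Potter", "The Hobbit",
]


def range_global(stream, low, high):
    """Hypothetical model of Gremlin's global range(low, high).

    Keeps each element whose zero-based position p satisfies
    low <= p < high: the start is inclusive, the end is exclusive.
    """
    return list(islice(stream, low, high))


print(range_global(TITLES, 0, 2))  # ['Clean Code', 'Lord of the Rings']
print(range_global(TITLES, 2, 2))  # [] because the window [2, 2) is empty
```

islice is used instead of list slicing so the same function also accepts a lazy iterator, which is closer to how traversers stream through a traversal.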
2.2 Using Non-Zero Start Offsets
The starting offset does not have to be 0. You can begin slicing the result stream from any position.
// Return the books at indexes 3, 4, and 5 (the 4th, 5th, and 6th books)
g.V().
  hasLabel('book').
  range(3, 6).
  values('title')
gremlin> g.V().
......1> hasLabel('book').
......2> range(3, 6).
......3> values('title')
==>Game of Thrones
==>Design Patterns
==>The Alchemist
This pattern is especially useful for:
· Pagination
· Batch processing
· Incremental traversal execution
2.3 Pagination Example
A common pagination pattern using range() looks like this:
page = 2
pageSize = 3
g.V().
  hasLabel('book').
  range(page * pageSize, (page + 1) * pageSize).
  values('title')
With page = 2 and pageSize = 3, the bounds evaluate to range(6, 9), which retrieves the third page of results (page numbers start at 0).
gremlin> g.V().
......1> hasLabel('book').
......2> range(page * pageSize, (page + 1) * pageSize).
......3> values('title')
==>Refactoring
==>One Hundred Years of Solitude
==>Domain-Driven Design
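The offset arithmetic is easy to get off by one. Here is a minimal Python sketch of it, assuming zero-based page numbers as in the example above. page_bounds and page_of are hypothetical helpers, and the list slice merely stands in for range() over the traversal stream.

```python
# Illustrative data: titles in the section 1 listing order.
TITLES = [
    "Clean Code", "Lord of the Rings", "Effective Java", "Game of Thrones",
    "Design Patterns", "The Alchemist", "Refactoring",
    "One Hundred Years of Solitude", "Domain-Driven Design",
    "The Pragmatic Programmer", "Harry Potter", "The Hobbit",
]


def page_bounds(page, page_size):
    """Hypothetical helper: turn a zero-based page number into the
    (low, high) pair that would be passed to range(low, high)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    low = page * page_size
    return low, low + page_size


def page_of(titles, page, page_size):
    """Apply the bounds with a list slice, which matches range() for
    non-negative bounds (start inclusive, end exclusive)."""
    low, high = page_bounds(page, page_size)
    return titles[low:high]


print(page_bounds(2, 3))      # (6, 9)
print(page_of(TITLES, 2, 3))  # ['Refactoring', 'One Hundred Years of Solitude', 'Domain-Driven Design']
print(page_of(TITLES, 4, 3))  # [] since the page starts past the last element
```

Walking pages 0, 1, 2, ... until one comes back empty visits every element exactly once, provided the underlying order is stable between requests; section 2.6 explains why that proviso matters.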
2.4 Using -1 to Indicate “Until the End”
Gremlin allows the use of -1 as the upper bound to indicate that all remaining elements should be returned.
// Return all books from index 5 (the 6th book) onward
g.V().
  hasLabel('book').
  range(5, -1).
  values('title')

gremlin> g.V().
......1> hasLabel('book').
......2> range(5, -1).
......3> values('title')
==>The Alchemist
==>Refactoring
==>One Hundred Years of Solitude
==>Domain-Driven Design
==>The Pragmatic Programmer
==>Harry Potter
==>The Hobbit
This pattern is commonly used when:
· You want to skip a known number of results
· You do not know (or care about) the total size
· You are processing the remainder of a large traversal
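The -1 upper bound is a Gremlin convention that is easy to confuse with negative indexing in other languages. A plain-Python model of it (hypothetical range_global, illustrative data copied from section 1) shows the difference:

```python
from itertools import islice

# Illustrative data: titles in the section 1 listing order.
TITLES = [
    "Clean Code", "Lord of the Rings", "Effective Java", "Game of Thrones",
    "Design Patterns", "The Alchemist", "Refactoring",
    "One Hundred Years of Solitude", "Domain-Driven Design",
    "The Pragmatic Programmer", "Harry Potter", "The Hobbit",
]


def range_global(stream, low, high):
    """Hypothetical model of global range(low, high) in which
    high == -1 means 'no upper bound'."""
    stop = None if high == -1 else high  # islice treats None as "to the end"
    return list(islice(stream, low, stop))


tail = range_global(TITLES, 5, -1)
print(len(tail))  # 7
print(tail[0])    # The Alchemist

# Python's own slice syntax means something different: TITLES[5:-1]
# stops BEFORE the last element, so it drops "The Hobbit".
print(TITLES[5:-1] == tail)  # False
```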
2.5 Combining Filters with range()
range() is typically applied after filtering, in which case offsets are counted over the filtered results, not over every vertex in the graph.
g.V().
  hasLabel('book').
  has('genre', 'Fantasy').
  range(0, 2).
  values('title')
gremlin> g.V().
......1> hasLabel('book').
......2> has('genre', 'Fantasy').
......3> range(0, 2).
......4> values('title')
==>Lord of the Rings
==>Game of Thrones
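Because range() counts whatever reaches it, moving a filter from before range() to after it changes the result. The plain-Python sketch below models both placements over (title, genre) pairs copied from section 1; has_genre and titles are hypothetical helpers, and list slicing stands in for range().

```python
# Illustrative (title, genre) pairs in the section 1 listing order.
BOOKS = [
    ("Clean Code", "Programming"), ("Lord of the Rings", "Fantasy"),
    ("Effective Java", "Programming"), ("Game of Thrones", "Fantasy"),
    ("Design Patterns", "Programming"), ("The Alchemist", "Fiction"),
    ("Refactoring", "Programming"),
    ("One Hundred Years of Solitude", "Fiction"),
    ("Domain-Driven Design", "Programming"),
    ("The Pragmatic Programmer", "Programming"),
    ("Harry Potter", "Fantasy"), ("The Hobbit", "Fantasy"),
]


def has_genre(books, genre):
    """Hypothetical stand-in for has('genre', genre)."""
    return [b for b in books if b[1] == genre]


def titles(books):
    """Stand-in for values('title')."""
    return [title for title, _ in books]


# Models has('genre', 'Fantasy').range(0, 2): filter, then slice.
filter_then_range = titles(has_genre(BOOKS, "Fantasy")[0:2])

# Models range(0, 2).has('genre', 'Fantasy'): slice, then filter.
range_then_filter = titles(has_genre(BOOKS[0:2], "Fantasy"))

print(filter_then_range)  # ['Lord of the Rings', 'Game of Thrones']
print(range_then_filter)  # ['Lord of the Rings']
```

In the second placement, the slice only ever sees the first two books, so at most two Fantasy titles can survive, and with this data only one does.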
2.6 Backend-Dependent Ordering Caveat
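It is important to understand that Gremlin does not guarantee traversal order unless you explicitly define one. TinkerGraph returns vertices in the iteration order of its internal storage, which is not necessarily insertion order: in the listing in section 1, Lord of the Rings (id 32) appears immediately after Clean Code (id 0). Distributed graph stores (e.g., JanusGraph) may return results in a non-deterministic order.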
Best Practice: Always Use order() for Deterministic Results
If predictable ordering is required, always apply an order() step before using range().
g.V().
  order().by('title').
  range(0, 2).
  values('title')
gremlin> g.V().
......1> order().by('title').
......2> range(0, 2).
......3> values('title')
==>Clean Code
==>Design Patterns
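Sorting before slicing is what makes the result independent of the backend's iteration order. The plain-Python sketch below models order().by('title').range(0, n) with a hypothetical first_n_ordered function and simulates a different backend by shuffling the section 1 titles. It assumes Python's default string ordering matches an ascending title sort, which holds for these ASCII titles.

```python
import random

# Illustrative data: titles in the section 1 listing order.
TITLES = [
    "Clean Code", "Lord of the Rings", "Effective Java", "Game of Thrones",
    "Design Patterns", "The Alchemist", "Refactoring",
    "One Hundred Years of Solitude", "Domain-Driven Design",
    "The Pragmatic Programmer", "Harry Potter", "The Hobbit",
]


def first_n_ordered(stream, n):
    """Model of order().by('title').range(0, n): sort first, then slice.

    sorted() must consume the whole input before returning anything,
    just as order() has to see every element before emitting the first.
    """
    return sorted(stream)[:n]


# Simulate a backend that iterates the same vertices in another order.
shuffled = TITLES[:]
random.Random(7).shuffle(shuffled)

print(first_n_ordered(TITLES, 2))    # ['Clean Code', 'Design Patterns']
print(first_n_ordered(shuffled, 2))  # ['Clean Code', 'Design Patterns']
```

The model has a unique sort key. When a real sort key is not unique, ties can still come back in backend-dependent order, so adding a second by() on a unique property keeps page boundaries stable.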
In summary,
· range(start, end) is zero-based
· The start index is inclusive and the end index is exclusive
· -1 means “until the end”
· Ordering is not guaranteed without order()
· Apply range() after filtering
· Always order explicitly when correctness depends on result sequence
3. The skip() Step as an Alternative to range()
Starting with Apache TinkerPop 3.3, the skip() step provides a clearer, more expressive way to discard a fixed number of elements from the beginning of a traversal. Conceptually, skip(n) means: ignore the first n results and return everything that follows.
skip(n) is equivalent to range(n, -1), but it reads more naturally.
3.1 Basic Usage of skip()
Using the Book dataset, the following traversal skips the first five books and returns the remaining ones.
g.V().
  hasLabel('book').
  skip(5).
  values('title')
This is functionally equivalent to:
g.V().
  hasLabel('book').
  range(5, -1).
  values('title')
gremlin> g.V().
......1> hasLabel('book').
......2> skip(5).
......3> values('title')
==>The Alchemist
==>Refactoring
==>One Hundred Years of Solitude
==>Domain-Driven Design
==>The Pragmatic Programmer
==>Harry Potter
==>The Hobbit

gremlin> g.V().
......1> hasLabel('book').
......2> range(5, -1).
......3> values('title')
==>The Alchemist
==>Refactoring
==>One Hundred Years of Solitude
==>Domain-Driven Design
==>The Pragmatic Programmer
==>Harry Potter
==>The Hobbit
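The equivalence can also be checked against a model that implements skip() independently, by consuming and discarding elements rather than by calling range(). Both functions below are hypothetical plain-Python stand-ins, not TinkerPop code, and the data is copied from section 1.

```python
from itertools import islice

# Illustrative data: titles in the section 1 listing order.
TITLES = [
    "Clean Code", "Lord of the Rings", "Effective Java", "Game of Thrones",
    "Design Patterns", "The Alchemist", "Refactoring",
    "One Hundred Years of Solitude", "Domain-Driven Design",
    "The Pragmatic Programmer", "Harry Potter", "The Hobbit",
]

_END = object()  # sentinel marking an exhausted iterator


def skip_global(stream, n):
    """Model of skip(n): pull and discard n elements, then keep the rest."""
    it = iter(stream)
    for _ in range(n):
        if next(it, _END) is _END:
            return []
    return list(it)


def range_global(stream, low, high):
    """Model of range(low, high), where high == -1 means no upper bound."""
    return list(islice(stream, low, None if high == -1 else high))


# The two models agree for every offset, including offsets past the end.
for n in range(len(TITLES) + 3):
    assert skip_global(TITLES, n) == range_global(TITLES, n, -1)

print(skip_global(TITLES, 5)[0])  # The Alchemist
```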
3.2 Why skip() Improves Readability
Compare range(5, -1) with skip(5). Both are valid, but skip(5) communicates intent more clearly:
· You are not defining a window
· You are discarding a known number of initial elements
· You want all remaining results
For this reason, skip() is generally preferred when no upper bound is required.
3.3 Verifying skip() Behavior
To verify how skip() works, compare the traversal results with and without skipping.
Without skip()
g.V().
  hasLabel('book').
  values('title').
  fold()
gremlin> g.V().
......1> hasLabel('book').
......2> values('title').
......3> fold()
==>[Clean Code,Lord of the Rings,Effective Java,Game of Thrones,Design Patterns,The Alchemist,Refactoring,One Hundred Years of Solitude,Domain-Driven Design,The Pragmatic Programmer,Harry Potter,The Hobbit]
With skip()
g.V().
  hasLabel('book').
  skip(7).
  values('title').
  fold()
gremlin> g.V().
......1> hasLabel('book').
......2> skip(7).
......3> values('title').
......4> fold()
==>[One Hundred Years of Solitude,Domain-Driven Design,The Pragmatic Programmer,Harry Potter,The Hobbit]
3.4 Combining Filters with skip()
As with range(), when skip() follows a filter, the skipped elements are counted over the filtered results.
g.V().
  hasLabel('book').
  has('genre', 'Fantasy').
  skip(2).
  values('title')
gremlin> g.V().
......1> hasLabel('book').
......2> has('genre', 'Fantasy').
......3> skip(2).
......4> values('title')
==>Harry Potter
==>The Hobbit
3.5 Applying skip() Locally
By default, skip() operates on the global traversal stream. However, it can also be applied to an incoming collection by using the local scope.
g.V().
  hasLabel('book').
  has('genre', 'Fantasy').
  values('title').
  fold().
  skip(local, 2)
gremlin> g.V().
......1> hasLabel('book').
......2> has('genre', 'Fantasy').
......3> values('title').
......4> fold().
......5> skip(local, 2)
==>[Harry Potter,The Hobbit]
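The effect of scope is easiest to see by modeling the traverser stream as a Python list. In the sketch below, the function names are hypothetical and the Fantasy titles are copied from the output above; skip_local handles only list-valued traversers and ignores other collection types such as maps.

```python
# Fantasy titles in the order shown in the console output above.
FANTASY = ["Lord of the Rings", "Game of Thrones", "Harry Potter", "The Hobbit"]


def skip_global(traversers, n):
    """Model of global skip(n): drop the first n traversers in the stream."""
    return traversers[n:]


def skip_local(traversers, n):
    """Model of skip(local, n): keep every traverser, but drop the first
    n items inside each list-valued one."""
    return [t[n:] for t in traversers]


# values('title').skip(2): four traversers, the first two are dropped.
print(skip_global(FANTASY, 2))  # ['Harry Potter', 'The Hobbit']

# After fold() the stream holds ONE traverser whose value is a list.
folded = [FANTASY]
print(skip_local(folded, 2))    # [['Harry Potter', 'The Hobbit']]
print(skip_global(folded, 2))   # [] since the only traverser is skipped
```

The last line is the practical pitfall: writing fold().skip(2) without local applies the skip to the stream, which contains only the single folded list, so the traversal returns nothing.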
3.6 Why Local Scope Matters
Understanding the difference between global and local scope is important.
| Scope | Behavior |
|---|---|
| Global (default) | Skips elements as they flow through the traversal |
| Local | Skips elements within a collection (e.g., after `fold()` or `group()`) |
4. When to Use skip() vs range()
| Use Case | Preferred Step |
|---|---|
| Skip N elements and return the rest | skip(n) |
| Retrieve a fixed window | range(start, end) |
| Pagination | range() |
| Readable intent when no upper bound is needed | skip() |
| Local collection trimming | skip(local, n) |
In summary,
· skip(n) is equivalent to range(n, -1)
· Introduced in TinkerPop 3.3
· Improves readability when skipping to the end
· Works with global and local scopes
· Should not be used without explicit ordering when order matters
Gremlin provides flexible mechanisms for slicing traversal results using range() and skip(). While these steps are powerful, they should be used with care:
· Offsets are zero-based.
· range() includes the start index and excludes the end index.
· Ordering is not guaranteed without an explicit order() step.
· skip() offers a clearer alternative when skipping to the end.
· Local vs global scope can significantly change behavior.
Understanding these differences helps you write more predictable, efficient, and maintainable Gremlin traversals.