Saturday, 6 June 2026

Using between() in Gremlin to Simulate startsWith

  

When working with string properties in Gremlin, one common requirement is performing prefix-based searches, similar to a startsWith() operation in traditional programming languages.

 

However, older releases of Apache TinkerPop Gremlin have no built-in startsWith() predicate (the TextP predicates such as startingWith() only arrived in TinkerPop 3.4.0, and not every provider or client exposes them). Unlike SQL, which provides LIKE, or search engines that support regex and text analyzers, Gremlin’s core predicate system is intentionally minimal and built around exact matching and range-based comparisons.

 

So how do we perform prefix matching?

The answer lies in how Gremlin compares strings: lexicographically (dictionary-style). The between() predicate matches values from its lower bound (inclusive) up to its upper bound (exclusive), so a well-chosen pair of bounds defines a range that effectively simulates a prefix search.

 

For example:

·      All strings starting with "Ind" fall lexicographically between "Ind" and "Ine".

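·      Everything beginning with "B" falls between "B" and "C". (An upper bound of "Ba" would be too tight: it would exclude "Bb", "Book", and every other value whose second character sorts at or after "a".)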

 

This works because:

·      String comparisons are character-by-character.

·      Any additional characters after the prefix are naturally included within the defined range.

·      The upper bound acts as a lexical boundary.

·      This technique gives us a lightweight, index-friendly way to perform prefix filtering without regex support.
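The rule for choosing the upper bound can be sketched in plain Java. This is only an illustration: PrefixRange, upperBound and inPrefixRange are hypothetical names rather than TinkerPop API, and it assumes the graph orders strings the way Java's String.compareTo does (by UTF-16 code unit), which may not hold for providers that apply their own collation.

```java
// Sketch: derive the exclusive upper bound for a prefix range.
// PrefixRange and its methods are hypothetical helpers, not TinkerPop API.
class PrefixRange {

    // Increments the last character of the prefix ("Ind" -> "Ine") so the
    // result sorts just past every string that starts with the prefix.
    // Assumes a non-empty prefix whose last char is not Character.MAX_VALUE.
    static String upperBound(String prefix) {
        if (prefix.isEmpty()) {
            throw new IllegalArgumentException("prefix must not be empty");
        }
        int last = prefix.length() - 1;
        char c = prefix.charAt(last);
        if (c == Character.MAX_VALUE) {
            throw new IllegalArgumentException("last character cannot be incremented");
        }
        return prefix.substring(0, last) + (char) (c + 1);
    }

    // Mirrors between(lower, upper): lower inclusive, upper exclusive.
    static boolean inPrefixRange(String value, String prefix) {
        return value.compareTo(prefix) >= 0
            && value.compareTo(upperBound(prefix)) < 0;
    }

    public static void main(String[] args) {
        System.out.println(upperBound("Ind"));             // Ine
        System.out.println(inPrefixRange("India", "Ind")); // true
        System.out.println(inPrefixRange("Iowa", "Ind"));  // false
    }
}
```

The value returned by upperBound is what you would pass as the second argument to between(). Note that the comparison is case-sensitive, so "india" would not match the prefix "Ind".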

 

Understanding this pattern is important because:

 

·      Regular expression predicates are not available in TinkerPop releases before 3.6.0, and some providers still do not support them.

·      There are no built-in fuzzy search or text analysis operators in core TinkerPop.

·      Prefix search is a common requirement in real-world graph applications (user names, product categories, tags, etc.).

·      This approach works efficiently with property indexes in many graph databases.
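To see why a range predicate is index-friendly, here is a rough sketch that uses a java.util.TreeSet as a stand-in for a sorted property index. That stand-in is an assumption made for illustration; real graph databases use their own index structures, and SortedIndexSketch and range are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Sketch: a TreeSet plays the role of a sorted property index.
class SortedIndexSketch {

    // Half-open range lookup, the same shape as between(lower, upper).
    static List<String> range(TreeSet<String> index, String lower, String upper) {
        // subSet(from, to) is from-inclusive and to-exclusive; iterating it
        // seeks to the lower bound and walks forward until the upper bound,
        // instead of testing every entry in the set.
        return new ArrayList<>(index.subSet(lower, upper));
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>(
            Arrays.asList("Iowa", "India", "Ireland", "Indonesia"));
        System.out.println(range(index, "Ind", "Ine")); // [India, Indonesia]
    }
}
```

A regex or "contains" filter has no such contiguous range to seek into, which is why it usually cannot use an ordered index in the same way.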

 

While graph systems like JanusGraph provide additional text search capabilities, mastering this between() pattern helps keep your Gremlin queries portable across TinkerPop-enabled systems.

 

Let’s build a small graph to see the between() predicate in action.

 

Step 1: Create the graph and get a traversal source.

graph = TinkerGraph.open()
g = graph.traversal()

   

Step 2: Add Sample Books.

 

g.addV('book').property('title','Data Structures in Java')
g.addV('book').property('title','Database Design Fundamentals')
g.addV('book').property('title','Data Science Handbook')
g.addV('book').property('title','Distributed Systems Concepts')
g.addV('book').property('title','Design Patterns Explained')
g.addV('book').property('title','Deep Learning Basics')
g.addV('book').property('title','Docker in Action')
g.addV('book').property('title','DevOps Handbook')

Step 3: Verify Data

g.V().hasLabel('book').values('title')

gremlin> g.V().
......1>   hasLabel('book').
......2>   values('title')
==>Data Structures in Java
==>Database Design Fundamentals
==>Data Science Handbook
==>Distributed Systems Concepts
==>Design Patterns Explained
==>Deep Learning Basics
==>Docker in Action
==>DevOps Handbook

   

Use between() to Simulate startsWith("Data")

All titles starting with "Data" fall lexicographically between:

 

·      Lower bound: "Data"

·      Upper bound: "Datb"

 

g.V().
  hasLabel('book').
  has('title', between('Data', 'Datb')).
  values('title')

gremlin> g.V().
......1>   hasLabel('book').
......2>   has('title', between('Data', 'Datb')).
......3>   values('title')
==>Data Structures in Java
==>Database Design Fundamentals
==>Data Science Handbook

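Expand the Range (Data to Dee)

Suppose we want titles from "Data" up to, but not including, "Dee". With this sample data the result does not change: "Deep Learning Basics" sorts after "Dee" (a longer string sorts after its own prefix), so it falls outside the exclusive upper bound.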

g.V().
  hasLabel('book').
  has('title', between('Data', 'Dee')).
  valueMap(true)

gremlin> g.V().
......1>   hasLabel('book').
......2>   has('title', between('Data', 'Dee')).
......3>   valueMap(true)
==>[id:0,label:book,title:[Data Structures in Java]]
==>[id:2,label:book,title:[Database Design Fundamentals]]
==>[id:4,label:book,title:[Data Science Handbook]]
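The result above can look surprising, so here is a small Java sketch of the same half-open comparison. It assumes strings compare as in Java's String.compareTo; UpperBoundCheck and its between method are hypothetical helpers that only imitate the Gremlin predicate.

```java
// Sketch: why between('Data', 'Dee') stops before "Deep Learning Basics".
class UpperBoundCheck {

    // Same half-open test as Gremlin's between(lower, upper).
    static boolean between(String value, String lower, String upper) {
        return value.compareTo(lower) >= 0 && value.compareTo(upper) < 0;
    }

    public static void main(String[] args) {
        // "Dee" is a proper prefix of "Deep Learning Basics", so the longer
        // string sorts after it and misses the exclusive upper bound.
        System.out.println(between("Deep Learning Basics", "Data", "Dee"));         // false
        System.out.println(between("Database Design Fundamentals", "Data", "Dee")); // true
        // Moving the upper bound to "Def" brings the "Dee..." titles in.
        System.out.println(between("Deep Learning Basics", "Data", "Def"));         // true
    }
}
```

In other words, to include every title starting with "Dee", the upper bound has to be "Def", not "Dee".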

   

Single-Character Example (Titles Starting with "D")

 

g.V().hasLabel('book').
  has('title', between('D','E')).
  values('title').
  order()

gremlin> g.V().hasLabel('book').
......1>   has('title', between('D','E')).
......2>   values('title').
......3>   order()
==>Data Science Handbook
==>Data Structures in Java
==>Database Design Fundamentals
==>Deep Learning Basics
==>Design Patterns Explained
==>DevOps Handbook
==>Distributed Systems Concepts
==>Docker in Action

  

 
