Along with this post, I am providing a small but expressive graph dataset that models a simplified online learning platform. While tiny by big-data standards, the graph is rich enough to demonstrate many important graph modeling and traversal concepts. The dataset is stored in GraphML, a widely used XML-based format for describing graphs and exchanging graph data between tools and platforms.
The "learning-platform.graphml" file is intended purely as a learning aid. It represents a fictional snapshot of an online education system containing students, instructors, courses, topics, and skills. Although the entities and relationships are realistic, the data itself does not correspond to any real platform or individuals.
If you wish, you can open the GraphML file in a text editor to inspect its structure. As you work more deeply with graphs, becoming familiar with common graph serialization formats is essential. Two formats you will encounter frequently are GraphML and GraphSON. GraphSON is a JSON-based format defined by Apache TinkerPop and heavily used within that ecosystem, while GraphML is supported by TinkerPop as well as by visualization tools such as Gephi. Many ingestion pipelines also continue to rely on simpler formats such as CSV.
Later posts will briefly cover how to load and save graph data using these formats, as well as how to import and export graph data stored in text files.
learning-platform.graphml
<?xml version="1.0" encoding="UTF-8"?>
<!-- ******************************************************************************* -->
<!-- PLEASE NOTE: -->
<!-- -->
<!-- The data in this graph is fictional and provided purely for educational and -->
<!-- demonstration purposes. While care has been taken to ensure internal -->
<!-- consistency, this data should not be interpreted as representing any real -->
<!-- individuals, organizations, or learning platforms. -->
<!-- -->
<!-- This graph is intended as a learning aid for understanding graph modeling, -->
<!-- graph traversal, and Gremlin queries using Apache TinkerPop. It is not -->
<!-- designed for production or commercial use. -->
<!-- -->
<!-- Significant effort has gone into designing this graph to resemble a realistic -->
<!-- online learning ecosystem. The model captures relationships between learners, -->
<!-- courses, instructors, topics, and skills in a way that is easy to reason about -->
<!-- and query. -->
<!-- -->
<!-- As learning platforms evolve continuously, any static representation such as -->
<!-- this one will always be incomplete or out of date. New courses are added, -->
<!-- content changes, and learners progress over time. This graph intentionally -->
<!-- freezes a small snapshot of such a system for instructional purposes. -->
<!-- -->
<!-- This graph does not attempt to model course pricing, certificates, payments, -->
<!-- or detailed assessment data. Those would significantly increase the size and -->
<!-- complexity of the graph and are left as an exercise for the reader. -->
<!-- -->
<!-- The graph has a deliberately simple and readable schema that makes it suitable -->
<!-- for beginners while still being rich enough for intermediate Gremlin examples. -->
<!-- -->
<!-- There are five basic vertex types: -->
<!-- 1. Student - Represents a learner using the platform -->
<!-- 2. Instructor - Represents a course author or teacher -->
<!-- 3. Course - Represents a learning course -->
<!-- 4. Topic - Represents a subject area covered by a course -->
<!-- 5. Skill - Represents a skill acquired by completing courses -->
<!-- -->
<!-- There are six edge types: -->
<!-- 1. enrolled_in - Connects Student -> Course with progress information -->
<!-- 2. teaches - Connects Instructor -> Course -->
<!-- 3. covers - Connects Course -> Topic -->
<!-- 4. prerequisite - Connects Course -> Course -->
<!-- 5. requires - Connects Course -> Skill -->
<!-- 6. grants - Connects Course -> Skill -->
<!-- -->
<!-- This graph is intentionally small but expressive enough to support queries -->
<!-- involving recommendations, prerequisites, learning paths, and skill analysis. -->
<!-- -->
<!-- ******************************************************************************* -->
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">

  <!-- ======================= -->
  <!-- Vertex Property Keys -->
  <!-- ======================= -->
  <key id="labelV" for="node" attr.name="label" attr.type="string"/>
  <key id="type" for="node" attr.name="type" attr.type="string"/>
  <key id="name" for="node" attr.name="name" attr.type="string"/>
  <key id="level" for="node" attr.name="level" attr.type="string"/>
  <key id="duration" for="node" attr.name="duration" attr.type="int"/>

  <!-- ======================= -->
  <!-- Edge Property Keys -->
  <!-- ======================= -->
  <key id="labelE" for="edge" attr.name="label" attr.type="string"/>
  <key id="progress" for="edge" attr.name="progress" attr.type="int"/>

  <graph id="learning-platform-graph" edgedefault="directed">

    <!-- ======================= -->
    <!-- Student Vertices -->
    <!-- ======================= -->
    <node id="1">
      <data key="labelV">student</data>
      <data key="type">student</data>
      <data key="name">Anita</data>
    </node>
    <node id="2">
      <data key="labelV">student</data>
      <data key="type">student</data>
      <data key="name">Rahul</data>
    </node>
    <node id="3">
      <data key="labelV">student</data>
      <data key="type">student</data>
      <data key="name">Meera</data>
    </node>

    <!-- ======================= -->
    <!-- Instructor Vertices -->
    <!-- ======================= -->
    <node id="10">
      <data key="labelV">instructor</data>
      <data key="type">instructor</data>
      <data key="name">Dr. Kumar</data>
    </node>
    <node id="11">
      <data key="labelV">instructor</data>
      <data key="type">instructor</data>
      <data key="name">Sarah Lee</data>
    </node>

    <!-- ======================= -->
    <!-- Course Vertices -->
    <!-- ======================= -->
    <node id="20">
      <data key="labelV">course</data>
      <data key="type">course</data>
      <data key="name">Gremlin Fundamentals</data>
      <data key="level">Beginner</data>
      <data key="duration">20</data>
    </node>
    <node id="21">
      <data key="labelV">course</data>
      <data key="type">course</data>
      <data key="name">Graph Databases</data>
      <data key="level">Intermediate</data>
      <data key="duration">40</data>
    </node>
    <node id="22">
      <data key="labelV">course</data>
      <data key="type">course</data>
      <data key="name">Advanced Graph Analytics</data>
      <data key="level">Advanced</data>
      <data key="duration">35</data>
    </node>

    <!-- ======================= -->
    <!-- Topic Vertices -->
    <!-- ======================= -->
    <node id="30">
      <data key="labelV">topic</data>
      <data key="type">topic</data>
      <data key="name">Graph Traversals</data>
    </node>
    <node id="31">
      <data key="labelV">topic</data>
      <data key="type">topic</data>
      <data key="name">Graph Modeling</data>
    </node>
    <node id="32">
      <data key="labelV">topic</data>
      <data key="type">topic</data>
      <data key="name">Performance Optimization</data>
    </node>

    <!-- ======================= -->
    <!-- Skill Vertices -->
    <!-- ======================= -->
    <node id="40">
      <data key="labelV">skill</data>
      <data key="type">skill</data>
      <data key="name">Gremlin Querying</data>
    </node>
    <node id="41">
      <data key="labelV">skill</data>
      <data key="type">skill</data>
      <data key="name">Graph Design</data>
    </node>

    <!-- ======================= -->
    <!-- Enrollment Edges -->
    <!-- ======================= -->
    <edge id="e1" source="1" target="20">
      <data key="labelE">enrolled_in</data>
      <data key="progress">100</data>
    </edge>
    <edge id="e2" source="1" target="21">
      <data key="labelE">enrolled_in</data>
      <data key="progress">60</data>
    </edge>
    <edge id="e3" source="2" target="20">
      <data key="labelE">enrolled_in</data>
      <data key="progress">80</data>
    </edge>
    <edge id="e4" source="3" target="21">
      <data key="labelE">enrolled_in</data>
      <data key="progress">40</data>
    </edge>

    <!-- ======================= -->
    <!-- Teaching Relationships -->
    <!-- ======================= -->
    <edge id="e5" source="10" target="20">
      <data key="labelE">teaches</data>
    </edge>
    <edge id="e6" source="11" target="21">
      <data key="labelE">teaches</data>
    </edge>
    <edge id="e7" source="11" target="22">
      <data key="labelE">teaches</data>
    </edge>

    <!-- ======================= -->
    <!-- Course Structure -->
    <!-- ======================= -->
    <edge id="e8" source="20" target="30">
      <data key="labelE">covers</data>
    </edge>
    <edge id="e9" source="21" target="31">
      <data key="labelE">covers</data>
    </edge>
    <edge id="e10" source="22" target="32">
      <data key="labelE">covers</data>
    </edge>

    <!-- ======================= -->
    <!-- Prerequisites -->
    <!-- ======================= -->
    <edge id="e11" source="20" target="21">
      <data key="labelE">prerequisite</data>
    </edge>
    <edge id="e12" source="21" target="22">
      <data key="labelE">prerequisite</data>
    </edge>

    <!-- ======================= -->
    <!-- Skills -->
    <!-- ======================= -->
    <edge id="e13" source="20" target="40">
      <data key="labelE">grants</data>
    </edge>
    <edge id="e14" source="21" target="41">
      <data key="labelE">grants</data>
    </edge>
    <edge id="e15" source="22" target="41">
      <data key="labelE">requires</data>
    </edge>

  </graph>
</graphml>
1. Graph Overview and Schema
The learning platform graph contains several distinct vertex types, each identified using labels. The primary vertex labels used are:
· student: represents a learner on the platform
· instructor: represents a course author or teacher
· course: represents a learning course
· topic: represents a subject area covered by a course
· skill: represents a skill that can be gained through learning
Relationships between these vertices are modeled using directed edges, each with a clear semantic meaning. The graph uses the following edge labels:
· enrolled_in: connects a student to a course, with a progress property
· teaches: connects an instructor to a course
· covers: connects a course to the topics it includes
· prerequisite: connects a course to the course that depends on it (the edge points from the prerequisite to the more advanced course)
· grants: connects a course to the skills it provides
· requires: connects a course to prerequisite skills
This deliberately simple schema makes the graph easy to reason about while still supporting interesting traversal patterns such as prerequisite analysis, learning path discovery, and skill gap identification.
Vertex Properties
Each vertex in the graph has a unique identifier and a label, along with a small set of properties appropriate to its type. For example, course vertices include metadata such as difficulty level and duration.
A typical course vertex contains the following properties:
· type (string): vertex type, always course
· name (string): course title
· level (string): difficulty level (Beginner, Intermediate, Advanced)
· duration (int): estimated duration in hours
Example
<node id="20">
  <data key="labelV">course</data>
  <data key="type">course</data>
  <data key="name">Gremlin Fundamentals</data>
  <data key="level">Beginner</data>
  <data key="duration">20</data>
</node>
Student, instructor, topic, and skill vertices use similarly concise and readable property sets.
Loading learning-platform.graphml into a Graph
Step 1: Open the Gremlin Console.
$ gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
plugin activated: tinkerpop.tinkergraph
gremlin>
Step 2: Get the TinkerGraph object.
graph = TinkerGraph.open()
gremlin> graph = TinkerGraph.open()
==>tinkergraph[vertices:0 edges:0]
Step 3: Load the content from the GraphML file.
graph.io(graphml()).readGraph('/Users/Shared/graph/learning-platform.graphml')
gremlin> graph.io(graphml()).readGraph('/Users/Shared/graph/learning-platform.graphml')
==>null
Step 4: Get the traversal source.
g = graph.traversal()
gremlin> g = graph.traversal()
==>graphtraversalsource[tinkergraph[vertices:13 edges:15], standard]
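As an alternative to Step 3, TinkerPop 3.4.0 and later also offer an io() step on the traversal source, which picks the reader from the file extension. The sketch below assumes such a version and the same file location used above; it is not what the steps above ran.

```groovy
// Assumption: TinkerPop 3.4.0+ and the same file path as in Step 3.
graph = TinkerGraph.open()
g = graph.traversal()

// read() selects the GraphML reader from the .graphml extension;
// iterate() is what actually executes the load.
g.io('/Users/Shared/graph/learning-platform.graphml').read().iterate()

// Sanity check: the file defines 13 vertices and 15 edges.
g.V().count()
g.E().count()
```

Either approach should leave you with the same graph and the same g traversal source used in the rest of this post.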
2. Inspecting the Graph with Gremlin
Example 1: Count the total number of vertices and edges.
gremlin> g.V().count()
==>13
gremlin> g.E().count()
==>15
Example 2: List all vertex labels used in the graph.
gremlin> g.V().label()
==>instructor
==>course
==>student
==>student
==>student
==>skill
==>topic
==>skill
==>course
==>topic
==>instructor
==>course
==>topic
Because the labels are repeated, we can apply the dedup() step to remove the duplicates.
gremlin> g.V().label().dedup()
==>instructor
==>course
==>student
==>skill
==>topic
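If you also want to know how many elements carry each label, groupCount() is a natural follow-up. This is a small sketch that assumes the g traversal source created during loading.

```groovy
// Count vertices per label (label is the T.label token the console imports).
g.V().groupCount().by(label)

// Count edges per label in the same way.
g.E().groupCount().by(label)
```

Going by the GraphML file, the vertex map should report 3 student, 2 instructor, 3 course, 3 topic, and 2 skill vertices, and the edge map 4 enrolled_in, 3 teaches, 3 covers, 2 prerequisite, 2 grants, and 1 requires edge. The order of the map entries may vary.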
Example 3: Inspecting a Single Vertex
To inspect the properties of a specific vertex, we can filter by a known property such as name. For example, the following query retrieves all properties of the Gremlin Fundamentals course:
g.V().has('name', 'Gremlin Fundamentals').valueMap(true).unfold()
gremlin> g.V().has('name', 'Gremlin Fundamentals').valueMap(true).unfold()
==>id=20
==>label=course
==>duration=[20]
==>level=[Beginner]
==>name=[Gremlin Fundamentals]
==>type=[course]
The output includes the vertex identifier, label, and all associated properties. Note that property values are returned as lists, even when a property has only a single value. This reflects TinkerPop’s support for multi-valued properties.
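When the list wrapping gets in the way, newer TinkerPop releases provide two ways around it. The sketch below assumes TinkerPop 3.4.4 or later for elementMap() and 3.4.0 or later for the by() modulator on valueMap(); on older versions these steps are not available.

```groovy
// elementMap() returns id, label and single (unwrapped) property values.
g.V().has('course', 'name', 'Gremlin Fundamentals').elementMap()

// valueMap() with by(unfold()) unwraps each value list,
// but leaves out the id and label.
g.V().has('course', 'name', 'Gremlin Fundamentals').valueMap().by(unfold())
```

Both return a single map for the course, with duration as 20 and level as Beginner rather than one-element lists.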
Example 4: Find all courses a specific student is enrolled in.
g.V().
  hasLabel('student').
  has('name', 'Anita').
  out('enrolled_in').
  values('name')
What it does:
· Start from all vertices
· Filter to vertices with label student
· Further filter to name = 'Anita'
· Traverse outgoing enrolled_in edges
· Return the name of connected vertices (courses)
gremlin> g.V().
......1>   hasLabel('student').
......2>   has('name', 'Anita').
......3>   out('enrolled_in').
......4>   values('name')
==>Gremlin Fundamentals
==>Graph Databases
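The same query can be written more compactly with the three-argument form of has():
g.V().
  has('student', 'name', 'Anita').
  out('enrolled_in').
  values('name')
has(label, propertyKey, propertyValue) is equivalent to hasLabel(label).has(propertyKey, propertyValue), so has('student', 'name', 'Anita') behaves like hasLabel('student').has('name', 'Anita').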
Example 5: See the student’s progress in each enrolled course.
g.V().has('student', 'name', 'Anita').
  outE('enrolled_in').
  project('course', 'progress').
    by(inV().values('name')).
    by(values('progress'))
gremlin> g.V().has('student', 'name', 'Anita').
......1>   outE('enrolled_in').
......2>   project('course','progress').
......3>     by(inV().values('name')).
......4>     by(values('progress'))
==>[course:Gremlin Fundamentals,progress:100]
==>[course:Graph Databases,progress:60]
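Because progress lives on the enrolled_in edge, the same idea works across all students by starting from the edges. The sketch below assumes the g traversal source from the loading steps; the threshold of 80 is an arbitrary value chosen for illustration.

```groovy
// Enrollments with at least 80 percent progress (threshold is illustrative).
g.E().hasLabel('enrolled_in').
  has('progress', gte(80)).
  project('student', 'course', 'progress').
    by(outV().values('name')).   // the student is at the tail of the edge
    by(inV().values('name')).    // the course is at the head of the edge
    by('progress')
```

With the data in the file, two enrollments qualify: Anita in Gremlin Fundamentals (100) and Rahul in Gremlin Fundamentals (80).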
Example 6: List all topics covered by a given course.
g.V().hasLabel('course').has('name', 'Graph Databases').
  out('covers').
  values('name')
gremlin> g.V().hasLabel('course').has('name', 'Graph Databases').
......1>   out('covers').
......2>   values('name')
==>Graph Modeling
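The covers edge can also be followed in the opposite direction to answer "which courses cover this topic?". A minimal sketch, assuming the same g traversal source:

```groovy
// Start at the topic and walk the covers edge backwards to the course.
g.V().has('topic', 'name', 'Graph Modeling').in('covers').values('name')
```

In this dataset the only course covering Graph Modeling is Graph Databases.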
Example 7: Find the immediate prerequisites for Advanced Graph Analytics.
g.V().has('name', 'Advanced Graph Analytics').
  in('prerequisite').
  values('name')
gremlin> g.V().has('name', 'Advanced Graph Analytics').
......1>   in('prerequisite').
......2>   values('name')
==>Graph Databases
To retrieve the full prerequisite chain (up to two levels deep), repeat the step and emit each course along the way. Because in is a reserved word in Groovy, the anonymous traversal inside repeat() is written as __.in():
g.V().has('name', 'Advanced Graph Analytics').
  repeat(__.in('prerequisite')).
  times(2).
  emit().
  values('name')
gremlin> g.V().has('name', 'Advanced Graph Analytics').
......1>   repeat(__.in('prerequisite')).
......2>   times(2).
......3>   emit().
......4>   values('name')
==>Graph Databases
==>Gremlin Fundamentals
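If the depth of the chain is not known in advance, until() can replace the fixed times(2). The following is a sketch under the assumption that the prerequisite edges contain no cycles, which holds for this dataset.

```groovy
// Walk back along prerequisite edges until a course has no prerequisite.
// The __. prefix is used because in and not are reserved words in Groovy.
g.V().has('course', 'name', 'Advanced Graph Analytics').
  repeat(__.in('prerequisite')).
  until(__.not(inE('prerequisite'))).
  path().by('name')
```

The result is a single path that runs from Advanced Graph Analytics through Graph Databases to Gremlin Fundamentals, which is the learning path read in reverse.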
Example 8: Find all skills a student can acquire based on their current enrollments.
gremlin> g.V().hasLabel('student').has('name', 'Anita').
......1>   out('enrolled_in').
......2>   out('grants').
......3>   dedup().
......4>   values('name')
==>Gremlin Querying
==>Graph Design
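The requires edge supports the complementary question: which skills does a course expect, and which courses grant them? A small sketch, assuming the same g traversal source:

```groovy
// For each skill required by the course, find the courses that grant it.
g.V().has('course', 'name', 'Advanced Graph Analytics').
  out('requires').as('skill').
  in('grants').as('course').
  select('skill', 'course').by('name')
```

For this dataset the query yields one pairing: the skill Graph Design, granted by Graph Databases.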
Graph Size and Purpose
By design, the learning platform graph is small. It contains only a handful of vertices and edges, making it easy to load, visualize, and understand. Despite its size, the graph is sufficiently expressive to support a wide range of Gremlin examples, including:
· Finding all courses a student is enrolled in
· Traversing prerequisite chains
· Identifying which skills a learner can acquire next
· Exploring instructor teaching relationships
· Computing simple statistics such as enrollment counts
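The last two items in the list have no worked example earlier in this post, so here is a brief sketch of both, assuming the g traversal source from the loading steps.

```groovy
// Enrollment count per course (courses with no students report 0).
g.V().hasLabel('course').
  project('course', 'enrollments').
    by('name').
    by(inE('enrolled_in').count())

// Courses taught by each instructor, collected into a list.
g.V().hasLabel('instructor').
  project('instructor', 'courses').
    by('name').
    by(out('teaches').values('name').fold())
```

Based on the file contents, Gremlin Fundamentals and Graph Databases each have two enrollments and Advanced Graph Analytics has none; Dr. Kumar teaches Gremlin Fundamentals, while Sarah Lee teaches Graph Databases and Advanced Graph Analytics.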
While the dataset is static, real learning platforms are constantly evolving. New courses are added, content changes, and learners progress over time. A static graph like this one will always represent a snapshot, but that is precisely what makes it ideal for repeatable demonstrations and tutorials.
As a learning tool, this graph provides a compact but realistic foundation for experimenting with Gremlin traversals, understanding graph schemas, and visualizing relationships. It is intentionally simple, but there is plenty of room to extend it by adding assessments, certifications, organizations, or recommendation logic.
I hope you find this graph as enjoyable to work with as it was to design, and that it serves as a useful companion as you continue exploring Apache TinkerPop and Gremlin.