Programming for beginners: Apache Atlas: Define types, relationships and entities

In this post, I explain how to define types, relationships and entities in Apache Atlas. To keep things simple, I define three entity types that represent an RDBMS database, table and column, and two relationship types that represent the relationship between database and table, and between table and column.

a. jdbc_db: Represents an RDBMS database

b. jdbc_table: Represents an RDBMS table

c. jdbc_column: Represents an RDBMS column

d. jdbc_db_to_table: Represents the relationship between a database and a table. Since tables and databases can be created independently, we model this relationship category as aggregation.

e. jdbc_table_to_column: Represents the relationship between a table and a column. A column should be created as part of a table and never on its own, so we model this relationship category as composition.

Define types

Define jdbc_db type

jdbcDb.json

{
  "entityDefs": [
    {
      "category": "ENTITY",
      "name": "jdbc_db",
      "description": "Used to model RDBMS database",
      "superTypes": [
        "DataSet"
      ],
      "subTypes": [],
      "relationshipAttributeDefs": [],
      "businessAttributeDefs": {},
      "attributeDefs": [
        {
          "name": "locationUri",
          "typeName": "string",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        },
        {
          "name": "ownedBy",
          "typeName": "string",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        },
        {
          "name": "createdTime",
          "typeName": "long",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        }
      ]
    }
  ]
}

As you can see in the above definition,

a. jdbc_db is the type name.

b. The jdbc_db type inherits properties such as name and description from DataSet.

c. I added three database-specific attributes: locationUri (the full URL used to connect to the database), ownedBy (who owns the database) and createdTime (the time at which the data was onboarded to Atlas).
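Every attribute in jdbcDb.json repeats the same ten fields, so the file can also be generated. The following is a minimal sketch, not part of Atlas itself: the helper names attribute_def and entity_typedef are made up for this post, and the field values are simply copied from the JSON above.

```python
import json


def attribute_def(name, type_name):
    """Return an optional, single-valued attribute definition.

    Hypothetical helper: the field values mirror jdbcDb.json above.
    """
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": True,
        "cardinality": "SINGLE",
        "valuesMinCount": 0,
        "valuesMaxCount": 1,
        "isUnique": False,
        "isIndexable": False,
        "includeInNotification": False,
        "searchWeight": -1,
    }


def entity_typedef(name, description, attributes, super_types=("DataSet",)):
    """Wrap one entity type in the envelope that jdbcDb.json uses."""
    return {
        "entityDefs": [
            {
                "category": "ENTITY",
                "name": name,
                "description": description,
                "superTypes": list(super_types),
                "subTypes": [],
                "relationshipAttributeDefs": [],
                "businessAttributeDefs": {},
                "attributeDefs": [attribute_def(n, t) for n, t in attributes],
            }
        ]
    }


jdbc_db = entity_typedef(
    "jdbc_db",
    "Used to model RDBMS database",
    [("locationUri", "string"), ("ownedBy", "string"), ("createdTime", "long")],
)

# Print the generated document; redirect it to jdbcDb.json if you want a file.
print(json.dumps(jdbc_db, indent=2))
```

The same two helpers can produce the jdbc_table and jdbc_column definitions by changing only the name, description and attribute list.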

Execute the below command to insert the type jdbc_db.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/types/typedefs -d @jdbcDb.json

$ curl -X POST -u admin:admin -H 'accept: application/json'  -H 'cache-control: no-cache'  -H 'content-type: application/json'  http://localhost:21000/api/atlas/v2/types/typedefs -d @jdbcDb.json
{"enumDefs":[],"structDefs":[],"classificationDefs":[],"entityDefs":[{"category":"ENTITY","guid":"2c8c5d12-a8ab-42eb-a821-482843205686","createdBy":"admin","updatedBy":"admin","createTime":1643361213143,"updateTime":1643361213143,"version":1,"name":"jdbc_db","description":"Used to model RDBMS database","typeVersion":"1.0","attributeDefs":[{"name":"locationUri","typeName":"string","isOptional":true,"cardinality":"SINGLE","valuesMinCount":0,"valuesMaxCount":1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1},{"name":"ownedBy","typeName":"string","isOptional":true,"cardinality":"SINGLE","valuesMinCount":0,"valuesMaxCount":1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1},{"name":"createdTime","typeName":"long","isOptional":true,"cardinality":"SINGLE","valuesMinCount":0,"valuesMaxCount":1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1}],"superTypes":["DataSet"],"subTypes":[],"relationshipAttributeDefs":[{"name":"inputToProcesses","typeName":"array<Process>","isOptional":true,"cardinality":"SET","valuesMinCount":-1,"valuesMaxCount":-1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1,"relationshipTypeName":"dataset_process_inputs","isLegacyAttribute":false},{"name":"pipeline","typeName":"spark_ml_pipeline","isOptional":true,"cardinality":"SINGLE","valuesMinCount":-1,"valuesMaxCount":-1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1,"relationshipTypeName":"spark_ml_pipeline_dataset","isLegacyAttribute":false},{"name":"schema","typeName":"array<avro_schema>","isOptional":true,"cardinality":"SET","valuesMinCount":-1,"valuesMaxCount":-1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1,"relationshipTypeName":"avro_schema_associatedEntities","isLegacyAttribute":false},{"name":"model","typeName":"spark_ml_model","isOptional":true,"cardinality":"SINGLE","valuesMinCount":-1,"valuesMaxCount":-1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1,"relationshipTypeName":"spark_ml_model_dataset","isLegacyAttribute":false},{"name":"meanings","typeName":"array<AtlasGlossaryTerm>","isOptional":true,"cardinality":"SET","valuesMinCount":-1,"valuesMaxCount":-1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1,"relationshipTypeName":"AtlasGlossarySemanticAssignment","isLegacyAttribute":false},{"name":"outputFromProcesses","typeName":"array<Process>","isOptional":true,"cardinality":"SET","valuesMinCount":-1,"valuesMaxCount":-1,"isUnique":false,"isIndexable":false,"includeInNotification":false,"searchWeight":-1,"relationshipTypeName":"process_dataset_outputs","isLegacyAttribute":false}],"businessAttributeDefs":{}}],"relationshipDefs":[],"businessMetadataDefs":[]}
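If you would rather drive the same call from a script than from curl, the request can be assembled with Python's standard library. This is only a sketch: the function names typedefs_request and send are made up here, and the URL and the admin:admin credentials are taken from the curl command above. Building the request needs no server; sending it needs a running Atlas.

```python
import base64
import json
import urllib.request

# Same host, port and path prefix as the curl command above.
ATLAS_URL = "http://localhost:21000/api/atlas/v2"


def typedefs_request(payload, user="admin", password="admin"):
    """Build (but do not send) the POST that registers type definitions."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{ATLAS_URL}/types/typedefs",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/json",
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send(request):
    """Send the request and decode the JSON reply (needs a running Atlas)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# An empty payload is used here only to show the shape of the request.
request = typedefs_request({"entityDefs": []})
print(request.get_method(), request.full_url)
```

To register the type for real, load jdbcDb.json with json.load and pass the result to typedefs_request, then to send.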

Let’s fetch the type definition of jdbc_db and check it.

Execute the below command to get the type definition of jdbc_db.

curl -X GET -u admin:admin http://localhost:21000/api/atlas/v2/types/typedef/name/jdbc_db | jq

Type definition response looks like below.

{
   "category":"ENTITY",
   "guid":"2c8c5d12-a8ab-42eb-a821-482843205686",
   "createdBy":"admin",
   "updatedBy":"admin",
   "createTime":1643361213143,
   "updateTime":1643361213143,
   "version":1,
   "name":"jdbc_db",
   "description":"Used to model RDBMS database",
   "typeVersion":"1.0",
   "attributeDefs":[
      {
         "name":"locationUri",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      },
      {
         "name":"ownedBy",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      },
      {
         "name":"createdTime",
         "typeName":"long",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      }
   ],
   "superTypes":[
      "DataSet"
   ],
   "subTypes":[
      
   ],
   "relationshipAttributeDefs":[
      {
         "name":"inputToProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"dataset_process_inputs",
         "isLegacyAttribute":false
      },
      {
         "name":"pipeline",
         "typeName":"spark_ml_pipeline",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_pipeline_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"schema",
         "typeName":"array<avro_schema>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"avro_schema_associatedEntities",
         "isLegacyAttribute":false
      },
      {
         "name":"model",
         "typeName":"spark_ml_model",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_model_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"meanings",
         "typeName":"array<AtlasGlossaryTerm>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"AtlasGlossarySemanticAssignment",
         "isLegacyAttribute":false
      },
      {
         "name":"outputFromProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"process_dataset_outputs",
         "isLegacyAttribute":false
      }
   ],
   "businessAttributeDefs":{
      
   }
}

From the definition, you can observe the following:

a. A unique id (guid) is assigned to this type.

b. New relationshipAttributeDefs such as inputToProcesses, pipeline, schema, model, meanings and outputFromProcesses appear. We did not add these; they are inherited from the super type DataSet. You can confirm this by querying the DataSet type definition (curl -X GET -u admin:admin http://localhost:21000/api/atlas/v2/types/typedef/name/DataSet).

You can also confirm that the type exists by reloading the Atlas UI.
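The inheritance check can also be done by comparing the two type definitions instead of reading them side by side. Here is a small sketch; the function names are made up, and the two dictionaries are abbreviated stand-ins for the JSON that the typedef/name/jdbc_db and typedef/name/DataSet calls return (only the attribute names are kept).

```python
def relationship_attribute_names(typedef):
    """Collect the names listed under relationshipAttributeDefs."""
    return {attr["name"] for attr in typedef.get("relationshipAttributeDefs", [])}


def split_inherited(child, parent):
    """Return (inherited, own) relationship attribute names of the child type."""
    child_names = relationship_attribute_names(child)
    parent_names = relationship_attribute_names(parent)
    return sorted(child_names & parent_names), sorted(child_names - parent_names)


# Abbreviated stand-ins for the two responses (assumption: names only).
names = ["inputToProcesses", "pipeline", "schema", "model", "meanings", "outputFromProcesses"]
dataset = {"name": "DataSet", "relationshipAttributeDefs": [{"name": n} for n in names]}
jdbc_db = {"name": "jdbc_db", "relationshipAttributeDefs": [{"name": n} for n in names]}

inherited, own = split_inherited(jdbc_db, dataset)
print("inherited:", inherited)
print("own:", own)
```

At this point in the walkthrough the second list is empty, because jdbc_db has no relationship of its own yet.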

Define jdbc_table type

jdbcTable.json

{
  "entityDefs": [
    {
      "category": "ENTITY",
      "name": "jdbc_table",
      "description": "Represent a RDBMS Table",
      "attributeDefs": [
        {
          "name": "ownedBy",
          "typeName": "string",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        },
        {
          "name": "createTime",
          "typeName": "long",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        },
        {
          "name": "lastAccessTime",
          "typeName": "long",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        }
      ],
      "superTypes": [
        "DataSet"
      ],
      "subTypes": [],
      "relationshipAttributeDefs": [],
      "businessAttributeDefs": {}
    }
  ]
}

Execute the below command to create the jdbc_table type.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/types/typedefs -d @jdbcTable.json

Define jdbc_column type

jdbcColumn.json

{
  "entityDefs": [
    {
      "category": "ENTITY",
      "name": "jdbc_column",
      "description": "Represent a RDBMS Column",
      "attributeDefs": [
        {
          "name": "dataType",
          "typeName": "string",
          "isOptional": true,
          "cardinality": "SINGLE",
          "valuesMinCount": 0,
          "valuesMaxCount": 1,
          "isUnique": false,
          "isIndexable": false,
          "includeInNotification": false,
          "searchWeight": -1
        }
      ],
      "superTypes": [
        "DataSet"
      ],
      "subTypes": [],
      "relationshipAttributeDefs": [],
      "businessAttributeDefs": {}
    }
  ]
}

Execute the below command to define the jdbc_column type.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/types/typedefs -d @jdbcColumn.json

Reload the UI and confirm that these types were created successfully.

Define relationships

Different relationship categories

a. Association: A relationship with no containment.

b. Composition: A relationship with containment, in which the children can’t exist without the container. Modifying one object alters the data of the other. For example, a column doesn’t make sense on its own; it has to be part of a table. So we create the table and its columns together.

c. Aggregation: A relationship with containment, in which the lifecycles of the container and its children are independent. Modifying one object does not alter the other. For example, a car has multiple wheels and can’t move without them, but a wheel can be used independently with a scooter or any other vehicle.
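One practical consequence of these descriptions is that the isContainer flags on the two ends should agree with the category. The sketch below is a local sanity check derived only from the three descriptions above; it is not the validation that the Atlas server runs, and the function name and the car/wheel type names are made up.

```python
def containment_problems(relationship_def):
    """List mismatches between relationshipCategory and the isContainer flags."""
    category = relationship_def["relationshipCategory"]
    containers = [
        end
        for end in ("endDef1", "endDef2")
        if relationship_def[end].get("isContainer", False)
    ]
    problems = []
    if category == "ASSOCIATION" and containers:
        # Association means no containment at all.
        problems.append("ASSOCIATION has no containment, so no end should be a container")
    if category in ("COMPOSITION", "AGGREGATION") and len(containers) != 1:
        # Both containment categories need one container and one contained end.
        problems.append(f"{category} expects exactly one container end, found {len(containers)}")
    return problems


# Hypothetical relationship for the car and wheel example.
car_wheels = {
    "name": "car_to_wheel",
    "relationshipCategory": "AGGREGATION",
    "endDef1": {"type": "car", "name": "wheels", "isContainer": True, "cardinality": "SET"},
    "endDef2": {"type": "wheel", "name": "car", "isContainer": False, "cardinality": "SINGLE"},
}

print(containment_problems(car_wheels))
```

Running such a check before posting a relationship definition catches a mislabeled category early, without a round trip to the server.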

Define a relationship jdbc_db_to_table

Let’s define a relationship between the types ‘jdbc_db’ and ‘jdbc_table’. One database contains zero or more tables, and tables can be created independently of it, so we model this with the relationship category aggregation.

jdbcDbToTable.json

{
  "relationshipDefs": [
    {
      "category": "RELATIONSHIP",
      "name": "jdbc_db_to_table",
      "description": "Specifies the relationship between RDBMS Table to RDBMS Database",
      "attributeDefs": [],
      "relationshipCategory": "AGGREGATION",
      "propagateTags": "NONE",
      "endDef1": {
        "type": "jdbc_table",
        "name": "db",
        "isContainer": false,
        "cardinality": "SINGLE",
        "isLegacyAttribute": false
      },
      "endDef2": {
        "type": "jdbc_db",
        "name": "tables",
        "isContainer": true,
        "cardinality": "SET",
        "isLegacyAttribute": false
      }
    }
  ]
}

Points to note here

a. The category property is set to "RELATIONSHIP".

b. relationshipCategory is set to AGGREGATION. Possible values are ASSOCIATION, AGGREGATION and COMPOSITION.

c. endDef1 and endDef2 specify the types that participate in this relationship.

d. endDef1: Specifies that the type jdbc_table maintains a relationship property db, which points to the database instance of type jdbc_db.

e. endDef2: Specifies that the type jdbc_db maintains a relationship property tables, which points to the jdbc_table instances.
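A compact way to read the two ends: each end's name becomes a relationship attribute on that end's own type, and the attribute points to the type of the opposite end. The sketch below works this out for jdbc_db_to_table. The function name is made up, and the array<...> notation for a SET end follows what the type definitions later in this post show.

```python
def end_attributes(relationship_def):
    """Map (owning type, attribute name) to the type that the attribute points to."""
    end1, end2 = relationship_def["endDef1"], relationship_def["endDef2"]
    result = {}
    for own, other in ((end1, end2), (end2, end1)):
        target = other["type"]
        if own["cardinality"] != "SINGLE":
            # A multi-valued end is reported as an array of the other type.
            target = f"array<{target}>"
        result[(own["type"], own["name"])] = target
    return result


# The two ends of jdbc_db_to_table, copied from jdbcDbToTable.json.
jdbc_db_to_table = {
    "name": "jdbc_db_to_table",
    "endDef1": {"type": "jdbc_table", "name": "db", "isContainer": False, "cardinality": "SINGLE"},
    "endDef2": {"type": "jdbc_db", "name": "tables", "isContainer": True, "cardinality": "SET"},
}

for (owner, attribute), target in end_attributes(jdbc_db_to_table).items():
    print(f"{owner}.{attribute} -> {target}")
```

So jdbc_table gets a single-valued db attribute, and jdbc_db gets a multi-valued tables attribute.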

Do not worry if this is not fully clear yet; the need for relationships becomes apparent once we create entities and view them in the UI.

Execute the below command to define the relationship.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/types/typedefs -d @jdbcDbToTable.json

Define a relationship jdbc_table_to_column

Let’s model this relationship category as COMPOSITION, which means we need to define the table and its columns together. A column doesn’t make sense outside a table.

jdbcTableToColumn.json

{
  "relationshipDefs": [
    {
      "category": "RELATIONSHIP",
      "name": "jdbc_table_to_column",
      "description": "Specifies the relationship between a RDBMS Table to RDBMS Columns",
      "typeVersion": "1.0",
      "attributeDefs": [],
      "relationshipCategory": "COMPOSITION",
      "propagateTags": "NONE",
      "endDef1": {
        "type": "jdbc_table",
        "name": "columns",
        "isContainer": true,
        "cardinality": "SET",
        "isLegacyAttribute": false
      },
      "endDef2": {
        "type": "jdbc_column",
        "name": "table",
        "isContainer": false,
        "cardinality": "SINGLE",
        "isLegacyAttribute": false
      }
    }
  ]
}

Execute the below command to onboard this relationship.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/types/typedefs -d @jdbcTableToColumn.json

You can confirm all the types and relationships by executing the below command.

curl -X GET -u admin:admin http://localhost:21000/api/atlas/v2/types/typedefs
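That call returns every type known to Atlas, so the five definitions from this post are easy to miss in the output. The sketch below filters the decoded response on the client side. The function name is made up, and the sample dictionary is an abbreviated stand-in for the real response; the section names entityDefs and relationshipDefs are the ones seen in the responses earlier in this post.

```python
def custom_type_names(typedefs, prefix="jdbc_"):
    """Pick entity and relationship type names that start with the prefix."""
    found = {}
    for section in ("entityDefs", "relationshipDefs"):
        found[section] = sorted(
            item["name"]
            for item in typedefs.get(section, [])
            if item["name"].startswith(prefix)
        )
    return found


# Abbreviated stand-in for the decoded response (assumption: names only).
sample = {
    "entityDefs": [
        {"name": "DataSet"},
        {"name": "jdbc_table"},
        {"name": "jdbc_db"},
        {"name": "jdbc_column"},
    ],
    "relationshipDefs": [
        {"name": "dataset_process_inputs"},
        {"name": "jdbc_table_to_column"},
        {"name": "jdbc_db_to_table"},
    ],
}

print(custom_type_names(sample))
```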

Define entities

Now the types (jdbc_db, jdbc_table, jdbc_column, jdbc_db_to_table, jdbc_table_to_column) are created. Let’s define entities and establish the relationship between them.

Define an instance of jdbc_db

jdbcDbInstance.json

{
  "entity": {
    "typeName": "jdbc_db",
    "attributes": {
      "owner": "Krishna Gurram",
      "ownedBy": "Platforms team",
      "createdTime": 1643363556,
      "qualifiedName": "abc.net:1234/warehouse",
      "displayName": "warehouse database",
      "name": "warehouse",
      "description": "warehouse database in MySQL"
    }
  }
}

Execute the below command to define an instance of jdbc_db.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/entity -d @jdbcDbInstance.json

Go to the Atlas UI and confirm that the warehouse db is created.

Define an instance of jdbc_table

As you may remember, we modelled the relationship between jdbc_table and jdbc_column as COMPOSITION, which means we need to define the table and its columns together.

jdbcTableInstance.json

{
  "referredEntities": {
    "-2": {
      "guid": "-2",
      "typeName": "jdbc_column",
      "attributes": {
        "displayName": "product_sale_information",
        "name": "sales",
        "qualifiedName": "abc.net-warehouse-products-product-sales",
        "dataType": "string",
        "table": {
          "guid": "-1",
          "typeName": "jdbc_table"
        }
      }
    },
    "-3": {
      "guid": "-3",
      "typeName": "jdbc_column",
      "attributes": {
        "displayName": "product_category",
        "name": "product_category",
        "qualifiedName": "abc.net-warehouse-products-product-category",
        "dataType": "string",
        "table": {
          "guid": "-1",
          "typeName": "jdbc_table"
        }
      }
    },
    "-4": {
      "typeName": "jdbc_column",
      "guid": "-4",
      "attributes": {
        "displayName": "product_id",
        "name": "product_id",
        "qualifiedName": "abc.net-warehouse-products-product-id",
        "dataType": "int",
        "table": {
          "guid": "-1",
          "typeName": "jdbc_table"
        }
      }
    }
  },
  "entity": {
    "typeName": "jdbc_table",
    "guid": "-1",
    "attributes": {
      "owner": "Ram Majety",
      "qualifiedName": "abc.net-warehouse-products",
      "description": "Products MySQL Table",
      "name": "products",
      "columns": [
        {
          "typeName": "jdbc_column",
          "guid": "-2"
        },
        {
          "typeName": "jdbc_column",
          "guid": "-3"
        },
        {
          "typeName": "jdbc_column",
          "guid": "-4"
        }
      ]
    }
  }
}

Execute the below command to define an instance of jdbc_table.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/entity -d @jdbcTableInstance.json

$ curl -X POST -u admin:admin -H 'accept: application/json'  -H 'cache-control: no-cache'  -H 'content-type: application/json'  http://localhost:21000/api/atlas/v2/entity -d @jdbcTableInstance.json
{"guidAssignments":{"-1":"ccf28d22-f3e6-46a0-8de6-355f697da2cf","-2":"21c7b89b-31bb-4e7a-9483-3a8839dc5f09","-3":"57f8090f-7d50-4dfa-b0b4-4b51652db343","-4":"578c5c95-0df2-4610-93c1-9790c732f448"}}

Points to note here

a. Since we do not have the guids upfront to establish the relationship between the table and its columns, I used the placeholder guids -1, -2, -3 and -4. When the request succeeds, the guidAssignments property of the response maps each placeholder to the actual guid.

b. The property ‘referredEntities’ carries the column entities that the table refers to; this is how the relationship between the table and its columns is established.
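Writing the placeholder guids by hand gets tedious once a table has more than a few columns. The sketch below generates the same request shape and then maps the placeholders back to real guids using the guidAssignments response shown above. The function names are made up, and only a few of the attributes from jdbcTableInstance.json are kept.

```python
def table_with_columns(table_attrs, columns):
    """Build the table payload: the table is "-1", columns are "-2", "-3", ..."""
    table_ref = {"guid": "-1", "typeName": "jdbc_table"}
    referred, column_refs = {}, []
    for offset, column_attrs in enumerate(columns, start=2):
        guid = f"-{offset}"
        referred[guid] = {
            "guid": guid,
            "typeName": "jdbc_column",
            # Each column points back at the table placeholder.
            "attributes": {**column_attrs, "table": dict(table_ref)},
        }
        column_refs.append({"typeName": "jdbc_column", "guid": guid})
    entity = {
        "typeName": "jdbc_table",
        "guid": "-1",
        "attributes": {**table_attrs, "columns": column_refs},
    }
    return {"referredEntities": referred, "entity": entity}


def real_guids_by_name(payload, guid_assignments):
    """Translate placeholder guids to assigned guids, keyed by entity name."""
    entities = [payload["entity"], *payload["referredEntities"].values()]
    return {e["attributes"]["name"]: guid_assignments[e["guid"]] for e in entities}


payload = table_with_columns(
    {"name": "products", "qualifiedName": "abc.net-warehouse-products"},
    [
        {"name": "sales", "dataType": "string"},
        {"name": "product_category", "dataType": "string"},
        {"name": "product_id", "dataType": "int"},
    ],
)

# guidAssignments copied from the response shown above.
assignments = {
    "-1": "ccf28d22-f3e6-46a0-8de6-355f697da2cf",
    "-2": "21c7b89b-31bb-4e7a-9483-3a8839dc5f09",
    "-3": "57f8090f-7d50-4dfa-b0b4-4b51652db343",
    "-4": "578c5c95-0df2-4610-93c1-9790c732f448",
}
print(real_guids_by_name(payload, assignments))
```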

Let’s confirm how it looks in the UI.

Get all the entities of type ‘jdbc_table’.

Click on the table ‘products’ and go to the Relationships tab.

You can see the relationship between the table and its columns. Click on the column product_id and navigate to its Relationships tab; you will see a relationship to the products table.

Let’s establish the relationship between jdbc_db and jdbc_table

To establish the relationship between a database and a table, we need some property or properties that uniquely identify the database instance and the table instance. Here I am using the guids of the table and database instances.

How to get the products table guid?

Navigate to the products table in the UI; the guid appears in the browser URL. Apply the same logic to get the guid of the warehouse database.

Products table guid: ccf28d22-f3e6-46a0-8de6-355f697da2cf

Warehouse db guid: b11dd53c-1c5c-4bf7-9513-7962e7ffccd2
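Copying guids out of the browser works for a demo, but a script can ask Atlas for them. The sketch below only builds the lookup URL and reads the guid out of a decoded reply. It assumes the v2 REST API's lookup by unique attribute (entity/uniqueAttribute/type/<typeName> with an attr:qualifiedName query parameter) is available on your Atlas version and that the reply nests the guid under "entity". The function names are made up, and the sample reply is a hand-written stand-in.

```python
import urllib.parse

# Same host, port and path prefix as the curl commands in this post.
ATLAS_URL = "http://localhost:21000/api/atlas/v2"


def unique_attribute_url(type_name, qualified_name):
    """Build the lookup URL for one entity identified by its qualifiedName."""
    query = urllib.parse.urlencode({"attr:qualifiedName": qualified_name}, safe=":")
    return f"{ATLAS_URL}/entity/uniqueAttribute/type/{type_name}?{query}"


def guid_from_body(body):
    """Read the guid from a decoded reply (assumed shape: {"entity": {"guid": ...}})."""
    return body["entity"]["guid"]


url = unique_attribute_url("jdbc_db", "abc.net:1234/warehouse")
print(url)

# Hand-written stand-in for the reply, using the warehouse db guid from above.
sample_body = {"entity": {"typeName": "jdbc_db", "guid": "b11dd53c-1c5c-4bf7-9513-7962e7ffccd2"}}
print(guid_from_body(sample_body))
```

The URL can be fetched with curl -u admin:admin in the same way as the other GET calls in this post.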

jdbcDbToTableRelation.json

{
  "typeName": "jdbc_db_to_table",
  "end1": {
    "guid": "ccf28d22-f3e6-46a0-8de6-355f697da2cf",
    "typeName": "jdbc_table"
  },
  "end2": {
    "guid": "b11dd53c-1c5c-4bf7-9513-7962e7ffccd2",
    "typeName": "jdbc_db"
  },
  "label": "rel-between-db-table",
  "propagateTags": "NONE"
}

Execute the below command to establish the relationship.

curl -X POST -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' http://localhost:21000/api/atlas/v2/relationship -d @jdbcDbToTableRelation.json

$ curl -X POST -u admin:admin -H 'accept: application/json'  -H 'cache-control: no-cache'  -H 'content-type: application/json'  http://localhost:21000/api/atlas/v2/relationship -d @jdbcDbToTableRelation.json
{"typeName":"jdbc_db_to_table","guid":"4feed41e-9368-4997-ab6c-a133d85d6a76","provenanceType":0,"end1":{"guid":"ccf28d22-f3e6-46a0-8de6-355f697da2cf","typeName":"jdbc_table","uniqueAttributes":{"qualifiedName":"abc.net-warehouse-products"}},"end2":{"guid":"b11dd53c-1c5c-4bf7-9513-7962e7ffccd2","typeName":"jdbc_db","uniqueAttributes":{"qualifiedName":"abc.net:1234/warehouse"}},"label":"r:jdbc_db_to_table","propagateTags":"NONE","status":"ACTIVE","createdBy":"admin","updatedBy":"admin","createTime":1643368886133,"updateTime":1643368886133,"version":0,"propagatedClassifications":[],"blockedPropagatedClassifications":[]}
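The request body is small enough to build from the two guids directly, and the reply can be checked against it. A minimal sketch with made-up function names follows; the reply dictionary is trimmed from the response shown above.

```python
def relationship_payload(table_guid, db_guid, label="rel-between-db-table"):
    """Build the body for a jdbc_db_to_table relationship instance."""
    return {
        "typeName": "jdbc_db_to_table",
        # end1 and end2 follow the order of endDef1 and endDef2 in the type.
        "end1": {"guid": table_guid, "typeName": "jdbc_table"},
        "end2": {"guid": db_guid, "typeName": "jdbc_db"},
        "label": label,
        "propagateTags": "NONE",
    }


def ends_match(payload, reply):
    """Check that the reply links the same two entities as the request."""
    return all(payload[end]["guid"] == reply[end]["guid"] for end in ("end1", "end2"))


payload = relationship_payload(
    "ccf28d22-f3e6-46a0-8de6-355f697da2cf",  # products table
    "b11dd53c-1c5c-4bf7-9513-7962e7ffccd2",  # warehouse db
)

# Trimmed copy of the response shown above.
reply = {
    "typeName": "jdbc_db_to_table",
    "guid": "4feed41e-9368-4997-ab6c-a133d85d6a76",
    "end1": {"guid": "ccf28d22-f3e6-46a0-8de6-355f697da2cf", "typeName": "jdbc_table"},
    "end2": {"guid": "b11dd53c-1c5c-4bf7-9513-7962e7ffccd2", "typeName": "jdbc_db"},
    "label": "r:jdbc_db_to_table",
    "status": "ACTIVE",
}

print(ends_match(payload, reply))
```

Note that the reply reports the label r:jdbc_db_to_table rather than the label that was sent, so the check compares only the end guids.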

Confirm the same from the UI

Reload the UI and navigate to the Relationships tab of the warehouse database; you can see that the table ‘products’ is related to the warehouse database.

Click on the products hyperlink. You will see that the ‘warehouse’ db is linked to the products table.

Let’s query the type definitions and confirm that the relationships now appear as part of the definitions.

Type definition of jdbc_db

curl -X GET -u admin:admin http://localhost:21000/api/atlas/v2/types/typedef/name/jdbc_db

{
   "category":"ENTITY",
   "guid":"2c8c5d12-a8ab-42eb-a821-482843205686",
   "createdBy":"admin",
   "updatedBy":"admin",
   "createTime":1643361213143,
   "updateTime":1643361213143,
   "version":1,
   "name":"jdbc_db",
   "description":"Used to model RDBMS database",
   "typeVersion":"1.0",
   "attributeDefs":[
      {
         "name":"locationUri",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      },
      {
         "name":"ownedBy",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      },
      {
         "name":"createdTime",
         "typeName":"long",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      }
   ],
   "superTypes":[
      "DataSet"
   ],
   "subTypes":[
      
   ],
   "relationshipAttributeDefs":[
      {
         "name":"inputToProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"dataset_process_inputs",
         "isLegacyAttribute":false
      },
      {
         "name":"pipeline",
         "typeName":"spark_ml_pipeline",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_pipeline_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"schema",
         "typeName":"array<avro_schema>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"avro_schema_associatedEntities",
         "isLegacyAttribute":false
      },
      {
         "name":"tables",
         "typeName":"array<jdbc_table>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"jdbc_db_to_table",
         "isLegacyAttribute":false
      },
      {
         "name":"model",
         "typeName":"spark_ml_model",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_model_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"meanings",
         "typeName":"array<AtlasGlossaryTerm>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"AtlasGlossarySemanticAssignment",
         "isLegacyAttribute":false
      },
      {
         "name":"outputFromProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"process_dataset_outputs",
         "isLegacyAttribute":false
      }
   ],
   "businessAttributeDefs":{
      
   }
}

As you can see, the relationship attribute ‘tables’ now appears in the relationshipAttributeDefs property.

jdbc_table definition

{
   "category":"ENTITY",
   "guid":"b1cee49b-6a10-48a1-ae67-202e1d2ed380",
   "createdBy":"admin",
   "updatedBy":"admin",
   "createTime":1643362051131,
   "updateTime":1643362051131,
   "version":1,
   "name":"jdbc_table",
   "description":"Represent a RDBMS Table",
   "typeVersion":"1.0",
   "attributeDefs":[
      {
         "name":"ownedBy",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      },
      {
         "name":"createTime",
         "typeName":"long",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      },
      {
         "name":"lastAccessTime",
         "typeName":"long",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      }
   ],
   "superTypes":[
      "DataSet"
   ],
   "subTypes":[
      
   ],
   "relationshipAttributeDefs":[
      {
         "name":"inputToProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"dataset_process_inputs",
         "isLegacyAttribute":false
      },
      {
         "name":"pipeline",
         "typeName":"spark_ml_pipeline",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_pipeline_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"schema",
         "typeName":"array<avro_schema>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"avro_schema_associatedEntities",
         "isLegacyAttribute":false
      },
      {
         "name":"columns",
         "typeName":"array<jdbc_column>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "constraints":[
            {
               "type":"ownedRef"
            }
         ],
         "relationshipTypeName":"jdbc_table_to_column",
         "isLegacyAttribute":false
      },
      {
         "name":"model",
         "typeName":"spark_ml_model",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_model_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"meanings",
         "typeName":"array<AtlasGlossaryTerm>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"AtlasGlossarySemanticAssignment",
         "isLegacyAttribute":false
      },
      {
         "name":"db",
         "typeName":"jdbc_db",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"jdbc_db_to_table",
         "isLegacyAttribute":false
      },
      {
         "name":"outputFromProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"process_dataset_outputs",
         "isLegacyAttribute":false
      }
   ],
   "businessAttributeDefs":{
      
   }
}

As you can see, ‘db’ and ‘columns’ appear under ‘relationshipAttributeDefs’, next to the relationship attributes that jdbc_table inherits from its super type DataSet.
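The definition above is long, so it can help to pull out only the relationship attributes that come from our own relationship types. The following is a minimal sketch, not part of the original demo. It assumes you have the type definition as a Python dict (here a trimmed copy of the JSON above), and the helper name relationship_attributes and the "jdbc_" prefix filter are my own choices for illustration.

```python
import json

# Trimmed copy of the jdbc_table definition shown above.
# Only the fields this sketch reads are kept; the real response has many more.
JDBC_TABLE_DEF = json.loads("""
{
  "name": "jdbc_table",
  "relationshipAttributeDefs": [
    {"name": "pipeline", "typeName": "spark_ml_pipeline",
     "cardinality": "SINGLE",
     "relationshipTypeName": "spark_ml_pipeline_dataset"},
    {"name": "columns", "typeName": "array<jdbc_column>",
     "cardinality": "SET",
     "constraints": [{"type": "ownedRef"}],
     "relationshipTypeName": "jdbc_table_to_column"},
    {"name": "db", "typeName": "jdbc_db",
     "cardinality": "SINGLE",
     "relationshipTypeName": "jdbc_db_to_table"}
  ]
}
""")


def relationship_attributes(type_def, prefix="jdbc_"):
    """Map attribute name -> relationship type name.

    Hypothetical helper: keeps only the attributes whose
    relationshipTypeName starts with the given prefix.
    """
    return {
        attr["name"]: attr["relationshipTypeName"]
        for attr in type_def.get("relationshipAttributeDefs", [])
        if attr.get("relationshipTypeName", "").startswith(prefix)
    }


if __name__ == "__main__":
    for name, rel in relationship_attributes(JDBC_TABLE_DEF).items():
        print(name, "->", rel)
```

With the trimmed sample, only ‘columns’ and ‘db’ remain, because ‘pipeline’ belongs to a relationship type that is not one of ours.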

Definition of jdbc_column

{
   "category":"ENTITY",
   "guid":"4f80e340-cd53-4084-af46-e2c0f07e256e",
   "createdBy":"admin",
   "updatedBy":"admin",
   "createTime":1643362200198,
   "updateTime":1643362200198,
   "version":1,
   "name":"jdbc_column",
   "description":"Represent a RDBMS Column",
   "typeVersion":"1.0",
   "attributeDefs":[
      {
         "name":"dataType",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      }
   ],
   "superTypes":[
      "DataSet"
   ],
   "subTypes":[
      
   ],
   "relationshipAttributeDefs":[
      {
         "name":"inputToProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"dataset_process_inputs",
         "isLegacyAttribute":false
      },
      {
         "name":"pipeline",
         "typeName":"spark_ml_pipeline",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_pipeline_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"schema",
         "typeName":"array<avro_schema>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"avro_schema_associatedEntities",
         "isLegacyAttribute":false
      },
      {
         "name":"model",
         "typeName":"spark_ml_model",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_model_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"meanings",
         "typeName":"array<AtlasGlossaryTerm>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"AtlasGlossarySemanticAssignment",
         "isLegacyAttribute":false
      },
      {
         "name":"table",
         "typeName":"jdbc_table",
         "isOptional":false,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"jdbc_table_to_column",
         "isLegacyAttribute":false
      },
      {
         "name":"outputFromProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"process_dataset_outputs",
         "isLegacyAttribute":false
      }
   ],
   "businessAttributeDefs":{
      
   }
}

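As you can see, ‘table’ appears under ‘relationshipAttributeDefs’. It uses the same relationship type, jdbc_table_to_column, as the ‘columns’ attribute of jdbc_table, and it is mandatory ("isOptional": false).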
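To see how the two ends of a relationship line up, you can match the attributes of both types on relationshipTypeName. This is a small sketch under stated assumptions: the two lists are trimmed copies of the jdbc relationship attributes shown in the definitions above (the inherited DataSet attributes are left out on purpose), and the helper names shared_relationships and required_attributes are hypothetical.

```python
# Trimmed relationship attributes copied from the jdbc_table and
# jdbc_column definitions above (inherited DataSet attributes omitted).
TABLE_ATTRS = [
    {"name": "columns", "typeName": "array<jdbc_column>", "isOptional": True,
     "constraints": [{"type": "ownedRef"}],
     "relationshipTypeName": "jdbc_table_to_column"},
    {"name": "db", "typeName": "jdbc_db", "isOptional": True,
     "relationshipTypeName": "jdbc_db_to_table"},
]

COLUMN_ATTRS = [
    {"name": "table", "typeName": "jdbc_table", "isOptional": False,
     "relationshipTypeName": "jdbc_table_to_column"},
]


def shared_relationships(attrs_a, attrs_b):
    """Return {relationship type: (attribute in a, attribute in b)}
    for every relationship type that both attribute lists expose."""
    name_by_rel_b = {a["relationshipTypeName"]: a["name"] for a in attrs_b}
    return {
        a["relationshipTypeName"]: (a["name"], name_by_rel_b[a["relationshipTypeName"]])
        for a in attrs_a
        if a["relationshipTypeName"] in name_by_rel_b
    }


def required_attributes(attrs):
    """Names of the relationship attributes that are not optional."""
    return [a["name"] for a in attrs if not a["isOptional"]]


if __name__ == "__main__":
    print(shared_relationships(TABLE_ATTRS, COLUMN_ATTRS))
    print(required_attributes(COLUMN_ATTRS))
```

In the sample data, ‘columns’ on the table side and ‘table’ on the column side both point to jdbc_table_to_column, and ‘table’ is the only mandatory attribute. That matches the composition category chosen for this relationship at the start of the post, where a column should not exist without its table.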

That’s it, you are done with the demo. You can download all the JSON files from this link.

Monday, 6 June 2022
