Friday, 3 June 2022

Atlas: Type system

The Atlas type system lets you define a model for the metadata objects you want to manage. The model is composed of types, entities (an instance of a type is called an entity, and entities hold the actual metadata), relationships between types, and so on.

 

What is a Type?

A type is a collection of one or more attributes that define the properties of a metadata object. If you come from an object-oriented programming background, a type is similar to a class definition.

 

Every type has a definition associated with it. Below is an example type definition.

 

Example of a type definition

{
   "category":"ENTITY",
   "guid":"4f80e340-cd53-4084-af46-e2c0f07e256e",
   "createdBy":"admin",
   "updatedBy":"admin",
   "createTime":1643362200198,
   "updateTime":1643362200198,
   "version":1,
   "name":"jdbc_column",
   "description":"Represent a RDBMS Column",
   "typeVersion":"1.0",
   "attributeDefs":[
      {
         "name":"dataType",
         "typeName":"string",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":0,
         "valuesMaxCount":1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1
      }
   ],
   "superTypes":[
      "DataSet"
   ],
   "subTypes":[
      
   ],
   "relationshipAttributeDefs":[
      {
         "name":"inputToProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"dataset_process_inputs",
         "isLegacyAttribute":false
      },
      {
         "name":"pipeline",
         "typeName":"spark_ml_pipeline",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_pipeline_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"schema",
         "typeName":"array<avro_schema>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"avro_schema_associatedEntities",
         "isLegacyAttribute":false
      },
      {
         "name":"model",
         "typeName":"spark_ml_model",
         "isOptional":true,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"spark_ml_model_dataset",
         "isLegacyAttribute":false
      },
      {
         "name":"meanings",
         "typeName":"array<AtlasGlossaryTerm>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"AtlasGlossarySemanticAssignment",
         "isLegacyAttribute":false
      },
      {
         "name":"table",
         "typeName":"jdbc_table",
         "isOptional":false,
         "cardinality":"SINGLE",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"jdbc_table_to_column",
         "isLegacyAttribute":false
      },
      {
         "name":"outputFromProcesses",
         "typeName":"array<Process>",
         "isOptional":true,
         "cardinality":"SET",
         "valuesMinCount":-1,
         "valuesMaxCount":-1,
         "isUnique":false,
         "isIndexable":false,
         "includeInNotification":false,
         "searchWeight":-1,
         "relationshipTypeName":"process_dataset_outputs",
         "isLegacyAttribute":false
      }
   ],
   "businessAttributeDefs":{
      
   }
}
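The following Python sketch shows how the parts of this definition fit together. It loads a trimmed copy of the JSON above (most fields are dropped for brevity) using the standard json module, then separates the plain attributes from the relationship attributes. The helper name summarize_type is hypothetical and does not come from any Atlas client.

```python
import json

# A trimmed copy of the jdbc_column definition above (most fields omitted).
TYPE_DEF_JSON = """
{
  "category": "ENTITY",
  "name": "jdbc_column",
  "superTypes": ["DataSet"],
  "attributeDefs": [
    {"name": "dataType", "typeName": "string", "isOptional": true}
  ],
  "relationshipAttributeDefs": [
    {"name": "table", "typeName": "jdbc_table", "isOptional": false,
     "relationshipTypeName": "jdbc_table_to_column"},
    {"name": "meanings", "typeName": "array<AtlasGlossaryTerm>", "isOptional": true,
     "relationshipTypeName": "AtlasGlossarySemanticAssignment"}
  ]
}
"""

def summarize_type(type_def: dict) -> dict:
    """Return a compact view of a type definition (hypothetical helper)."""
    attrs = type_def.get("attributeDefs", [])
    rels = type_def.get("relationshipAttributeDefs", [])
    return {
        "name": type_def["name"],
        "category": type_def["category"],
        "superTypes": list(type_def.get("superTypes", [])),
        # Plain attributes declared directly on the type: name -> typeName.
        "attributes": {a["name"]: a["typeName"] for a in attrs},
        # Attributes backed by a relationship type: name -> relationshipTypeName.
        "relationships": {r["name"]: r["relationshipTypeName"] for r in rels},
        # Mandatory attributes of either kind (isOptional == false).
        "required": [a["name"] for a in attrs + rels if not a.get("isOptional", True)],
    }

summary = summarize_type(json.loads(TYPE_DEF_JSON))
print(summary["attributes"])  # {'dataType': 'string'}
print(summary["required"])    # ['table']
```

In this trimmed copy, table is the only attribute marked isOptional false. That is consistent with a column always belonging to a jdbc_table.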

 

The following are the key points to note when working with the Atlas type system.

 

a. Every type is uniquely identified by a name in Atlas.

 

b. Every type must specify a type category when it is created. The type category can be one of the following:

1.    Primitive: boolean, byte, short, int, long, float, double, biginteger, bigdecimal, string, date

2.    Enum

3.    Collection: array, map

4.    Composite: Entity, Struct, Classification, Relationship
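The two points above can be illustrated with a small in-memory registry. Point a means a type's name acts as its key, so a duplicate name is rejected. Point b means every type carries one of the four categories. This is a sketch under two assumptions. First, TypeRegistry and Category are hypothetical names that only illustrate the rules; Atlas maintains its own type registry on the server. Second, collection types are written array<...> or map<...,...>, as they are in the JSON example.

```python
from enum import Enum

class Category(Enum):
    PRIMITIVE = "primitive"
    ENUM = "enum"
    COLLECTION = "collection"
    COMPOSITE = "composite"

# Built-in primitive type names from the list above.
PRIMITIVES = {
    "boolean", "byte", "short", "int", "long", "float", "double",
    "biginteger", "bigdecimal", "string", "date",
}

class TypeRegistry:
    """Hypothetical in-memory registry: each user-defined type is keyed by its unique name."""

    def __init__(self) -> None:
        self._categories: dict[str, Category] = {}

    def register(self, name: str, category: Category) -> None:
        # Point a: the name is the identity, so a second type with the same name is rejected.
        if name in self._categories or name in PRIMITIVES:
            raise ValueError(f"type '{name}' already exists")
        # Point b: the category is a required argument when a type is created.
        self._categories[name] = category

    def category_of(self, type_name: str) -> Category:
        """Classify a typeName string such as 'string', 'array<Process>' or 'jdbc_table'."""
        if type_name in PRIMITIVES:
            return Category.PRIMITIVE
        # Collections wrap their element type(s): array<T> or map<K,V>.
        if type_name.startswith(("array<", "map<")) and type_name.endswith(">"):
            return Category.COLLECTION
        # Anything else must be a registered enum or composite type.
        return self._categories[type_name]

registry = TypeRegistry()
registry.register("jdbc_table", Category.COMPOSITE)
registry.register("Process", Category.COMPOSITE)
print(registry.category_of("string"))          # Category.PRIMITIVE
print(registry.category_of("array<Process>"))  # Category.COLLECTION
print(registry.category_of("jdbc_table"))      # Category.COMPOSITE
try:
    registry.register("jdbc_table", Category.ENUM)
except ValueError as err:
    print(err)  # type 'jdbc_table' already exists
```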

 

Can these types inherit from other types?

Yes, but only types of category Entity and Classification can extend other types. This is similar to inheritance in object-oriented programming.
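The rule can be expressed as a simple validation check. The sketch below uses a hypothetical helper, check_supertypes, and assumes category strings are uppercase, as in "category":"ENTITY" in the example definition. The STRUCT type address and its supertype location are made-up names used only for illustration.

```python
# Categories allowed to declare superTypes, per the rule above.
INHERITABLE = {"ENTITY", "CLASSIFICATION"}

def check_supertypes(type_def: dict) -> list[str]:
    """Return the problems found; an empty list means the definition passes this check."""
    problems: list[str] = []
    category = type_def["category"]
    if type_def.get("superTypes") and category not in INHERITABLE:
        problems.append(f"{type_def['name']}: category {category} cannot declare superTypes")
    return problems

ok = check_supertypes({"name": "jdbc_column", "category": "ENTITY", "superTypes": ["DataSet"]})
bad = check_supertypes({"name": "address", "category": "STRUCT", "superTypes": ["location"]})
print(ok)   # []
print(bad)  # ['address: category STRUCT cannot declare superTypes']
```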


Can a type inherit from more than one supertype?

Yes. Atlas supports multiple inheritance. The superTypes field of a definition is a list, so a type can declare more than one supertype.
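When a type has several supertypes, its full attribute set includes its own attributes plus everything reachable through each supertype. The sketch below walks the superTypes graph depth-first over a set of hypothetical definitions. All type and attribute names are invented for illustration; in the example above, jdbc_column lists only DataSet. The sketch does not model how Atlas itself handles attribute-name clashes between supertypes. Here, the first definition found wins, and a shared ancestor is visited only once.

```python
# Hypothetical definitions: 'reviewed_column' extends two supertypes,
# and both of those share the ancestor 'named_object' (a diamond).
TYPES = {
    "named_object": {"superTypes": [], "attributes": ["qualifiedName"]},
    "owned_asset": {"superTypes": ["named_object"], "attributes": ["name", "owner"]},
    "auditable": {"superTypes": ["named_object"], "attributes": ["reviewedBy"]},
    "reviewed_column": {"superTypes": ["owned_asset", "auditable"], "attributes": ["dataType"]},
}

def all_attributes(type_name: str, types: dict) -> list[str]:
    """Collect attributes from a type and all of its ancestors, without duplicates."""
    seen_types: set[str] = set()
    result: list[str] = []

    def visit(name: str) -> None:
        if name in seen_types:  # a shared ancestor is visited only once
            return
        seen_types.add(name)
        for attr in types[name]["attributes"]:
            if attr not in result:  # first definition found wins
                result.append(attr)
        for parent in types[name]["superTypes"]:
            visit(parent)

    visit(type_name)
    return result

print(all_attributes("reviewed_column", TYPES))
# ['dataType', 'name', 'owner', 'qualifiedName', 'reviewedBy']
```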

 

Can I refer to one type in the definition of another type?

Yes. For example, the definition above refers to other types such as AtlasGlossaryTerm (in the meanings attribute) and spark_ml_model (in the model attribute).
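The sketch below lists which other types a definition depends on by unwrapping each typeName string. It uses a hypothetical helper, referenced_types. It assumes collections are spelled array<T> or map<K,V> as in the example, and that a map key never contains a comma of its own.

```python
import re

# Matches array<T> and map<K,V> wrappers so the element types can be pulled out.
_WRAPPER = re.compile(r"^(array|map)<(.+)>$")

def referenced_types(type_name: str) -> set[str]:
    """Return the type names a typeName string refers to, unwrapping collections."""
    match = _WRAPPER.match(type_name)
    if not match:
        return {type_name}
    inner = match.group(2)
    if match.group(1) == "map":
        # Assumes a simple 'key,value' pair whose key contains no comma.
        key, value = inner.split(",", 1)
        return referenced_types(key.strip()) | referenced_types(value.strip())
    return referenced_types(inner)

# typeName values taken from the relationshipAttributeDefs in the example.
names = ["array<Process>", "spark_ml_pipeline", "array<avro_schema>",
         "spark_ml_model", "array<AtlasGlossaryTerm>", "jdbc_table"]
refs = set().union(*(referenced_types(n) for n in names))
print(sorted(refs))
# ['AtlasGlossaryTerm', 'Process', 'avro_schema', 'jdbc_table', 'spark_ml_model', 'spark_ml_pipeline']
```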

 

How do I access a property of a type?

A property is referenced with the expression ‘type_name.attribute_name’. In the above example, the property ‘jdbc_column.dataType’ is of type string.
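The following is a minimal sketch of how such an expression could be resolved. It assumes the attributes have already been collected into a dictionary. The names ATTRIBUTES and resolve are hypothetical, this is not an Atlas API, and it does not model Atlas's own search syntax.

```python
# Hypothetical attribute table built from the example definition: type -> {attribute: typeName}.
ATTRIBUTES = {
    "jdbc_column": {
        "dataType": "string",
        "table": "jdbc_table",
        "meanings": "array<AtlasGlossaryTerm>",
    },
}

def resolve(expression: str, attributes: dict) -> str:
    """Resolve 'type_name.attribute_name' to the attribute's declared typeName."""
    type_name, sep, attr_name = expression.partition(".")
    if not sep or not attr_name:
        raise ValueError(f"expected 'type_name.attribute_name', got {expression!r}")
    try:
        # strip() tolerates stray spaces around the dot.
        return attributes[type_name.strip()][attr_name.strip()]
    except KeyError:
        raise KeyError(f"unknown property {expression!r}") from None

print(resolve("jdbc_column.dataType", ATTRIBUTES))  # string
print(resolve("jdbc_column.table", ATTRIBUTES))     # jdbc_table
```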

 
