Sunday, 5 June 2022

Apache Atlas: core built-in types

 

In this post, I am going to explain the core built-in types of Apache Atlas.

 

a.   Referenceable

b.   Asset

c.   Infrastructure

d.   DataSet

e.   Process
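The five types form a small inheritance tree. The sketch below is not part of Atlas; it is a plain Python illustration that copies the "superTypes" field from each type definition shown later in this post and walks it to list the ancestors of a type. The names SUPER_TYPES and ancestors are my own.

```python
# Super types copied from the "superTypes" field of each
# type definition shown in this post.
SUPER_TYPES = {
    "Referenceable": [],
    "Asset": ["Referenceable"],
    "Infrastructure": ["Asset"],
    "DataSet": ["Asset"],
    "Process": ["Asset"],
}


def ancestors(type_name):
    """Return every super type of type_name, nearest first."""
    result = []
    pending = list(SUPER_TYPES[type_name])
    while pending:
        current = pending.pop(0)
        if current not in result:
            result.append(current)
            pending.extend(SUPER_TYPES[current])
    return result


for name in SUPER_TYPES:
    print(name, "->", ancestors(name))
```

Because every type here ends at Referenceable, each of them inherits the qualifiedName attribute.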

 


 

Referenceable

This type represents all entities that can be searched for using a unique attribute called qualifiedName.
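As a rough illustration of what searching by qualifiedName looks like, the Python sketch below builds the URL for the Atlas v2 "entity by unique attribute" lookup. The host and port are assumed to match the curl command at the end of this post, and the type name hive_table and the qualifiedName value are made-up examples. The helper name unique_attribute_url is my own.

```python
from urllib.parse import quote, urlencode

# Assumption: same host and port as the curl command in this post.
ATLAS_BASE_URL = "http://localhost:21000/api/atlas/v2"


def unique_attribute_url(type_name, qualified_name):
    """Build the URL that looks up one entity by its qualifiedName."""
    # Keep ":" readable in the parameter name; everything else is escaped.
    query = urlencode({"attr:qualifiedName": qualified_name}, safe=":")
    return f"{ATLAS_BASE_URL}/entity/uniqueAttribute/type/{quote(type_name)}?{query}"


# "default.orders@cluster1" is a made-up qualifiedName for illustration.
print(unique_attribute_url("hive_table", "default.orders@cluster1"))
```

The sketch only builds the URL; sending it requires a running Atlas server and valid credentials.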

 

Type definition of Referenceable

{
  "category": "ENTITY",
  "guid": "a4bb02ae-6881-4e1f-8735-0d88478515bd",
  "createdBy": "krishna",
  "updatedBy": "krishna",
  "createTime": 1643689899136,
  "updateTime": 1643689913446,
  "version": 4,
  "name": "Referenceable",
  "description": "Referenceable",
  "typeVersion": "1.3",
  "serviceType": "atlas_core",
  "attributeDefs": [
    {
      "name": "qualifiedName",
      "typeName": "string",
      "isOptional": false,
      "cardinality": "SINGLE",
      "valuesMinCount": 1,
      "valuesMaxCount": 1,
      "isUnique": true,
      "isIndexable": true,
      "includeInNotification": false,
      "searchWeight": 10
    },
    {
      "name": "replicatedFrom",
      "typeName": "array<AtlasServer>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": 0,
      "valuesMaxCount": 2147483647,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "options": {
        "isSoftReference": "true"
      }
    },
    {
      "name": "replicatedTo",
      "typeName": "array<AtlasServer>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": 0,
      "valuesMaxCount": 2147483647,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "options": {
        "isSoftReference": "true"
      }
    }
  ],
  "superTypes": [],
  "subTypes": [
    "spark_storagedesc",
    "hive_storagedesc",
    "Asset",
    "ddl"
  ],
  "relationshipAttributeDefs": [
    {
      "name": "meanings",
      "typeName": "array<AtlasGlossaryTerm>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "AtlasGlossarySemanticAssignment",
      "isLegacyAttribute": false
    }
  ],
  "businessAttributeDefs": {}
}

 

Asset

Asset is a subtype of Referenceable and adds the attributes name, description, owner, displayName and userDescription. Of these, only name is mandatory (its isOptional flag is false).
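To see which attributes an Asset must carry, the Python sketch below reads the isOptional flag from trimmed copies of the attributeDefs in this post and checks a candidate set of entity attributes against them. This is only a client-side illustration, not how Atlas itself validates entities; the helper names and the sample values are assumptions.

```python
# Trimmed copies of the attributeDefs shown in this post
# (only "name" and "isOptional" are kept).
REFERENCEABLE_ATTRS = [
    {"name": "qualifiedName", "isOptional": False},
    {"name": "replicatedFrom", "isOptional": True},
    {"name": "replicatedTo", "isOptional": True},
]
ASSET_ATTRS = [
    {"name": "name", "isOptional": False},
    {"name": "description", "isOptional": True},
    {"name": "owner", "isOptional": True},
    {"name": "displayName", "isOptional": True},
    {"name": "userDescription", "isOptional": True},
]


def mandatory_attributes(*attribute_def_lists):
    """Collect the names of attributes whose isOptional flag is false."""
    return [
        attr["name"]
        for attribute_defs in attribute_def_lists
        for attr in attribute_defs
        if not attr["isOptional"]
    ]


def missing_attributes(attributes, required):
    """Return the required attribute names that are absent or empty."""
    return [name for name in required if not attributes.get(name)]


required = mandatory_attributes(REFERENCEABLE_ATTRS, ASSET_ATTRS)
# Hypothetical entity attributes; "name" is left out on purpose.
candidate = {"qualifiedName": "sales_db@cluster1", "owner": "krishna"}
print(required)
print(missing_attributes(candidate, required))
```

An Asset inherits qualifiedName from Referenceable, so both lists are passed to mandatory_attributes.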

 

Type definition of Asset

{
  "category": "ENTITY",
  "guid": "89d41a70-03d7-402b-8c0e-b6c3153156a7",
  "createdBy": "krishna",
  "updatedBy": "krishna",
  "createTime": 1643689899162,
  "updateTime": 1643689915963,
  "version": 6,
  "name": "Asset",
  "description": "Asset",
  "typeVersion": "1.6",
  "serviceType": "atlas_core",
  "attributeDefs": [
    {
      "name": "name",
      "typeName": "string",
      "isOptional": false,
      "cardinality": "SINGLE",
      "valuesMinCount": 1,
      "valuesMaxCount": 1,
      "isUnique": false,
      "isIndexable": true,
      "includeInNotification": false,
      "searchWeight": 10,
      "indexType": "STRING"
    },
    {
      "name": "description",
      "typeName": "string",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": 0,
      "valuesMaxCount": 1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": 9
    },
    {
      "name": "owner",
      "typeName": "string",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": 0,
      "valuesMaxCount": 1,
      "isUnique": false,
      "isIndexable": true,
      "includeInNotification": false,
      "searchWeight": 9,
      "indexType": "STRING"
    },
    {
      "name": "displayName",
      "typeName": "string",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": 0,
      "valuesMaxCount": 1,
      "isUnique": false,
      "isIndexable": true,
      "includeInNotification": false,
      "searchWeight": -1,
      "indexType": "STRING"
    },
    {
      "name": "userDescription",
      "typeName": "string",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": 0,
      "valuesMaxCount": 1,
      "isUnique": false,
      "isIndexable": true,
      "includeInNotification": false,
      "searchWeight": -1,
      "indexType": "STRING"
    }
  ],
  "superTypes": [
    "Referenceable"
  ],
  "subTypes": [
    "DataSet",
    "ProcessExecution",
    "adls_gen2_account",
    "Infrastructure",
    "Process",
    "hive_db",
    "hbase_namespace"
  ],
  "relationshipAttributeDefs": [
    {
      "name": "meanings",
      "typeName": "array<AtlasGlossaryTerm>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "AtlasGlossarySemanticAssignment",
      "isLegacyAttribute": false
    }
  ],
  "businessAttributeDefs": {}
}

 

Infrastructure

Infrastructure is a subtype of Asset. It serves as a common super type for infrastructural metadata objects such as clusters and hosts.

 

Type definition of Infrastructure

 

{
  "category": "ENTITY",
  "guid": "51a0384f-cf51-45d8-a401-23b3da61ceca",
  "createdBy": "krishna",
  "updatedBy": "krishna",
  "createTime": 1643689899193,
  "updateTime": 1643689906850,
  "version": 2,
  "name": "Infrastructure",
  "description": "Infrastructure can be IT infrastructure, which contains hosts and servers. Infrastructure might not be IT orientated, such as 'Car' for IoT applications.",
  "typeVersion": "1.2",
  "serviceType": "atlas_core",
  "attributeDefs": [],
  "superTypes": [
    "Asset"
  ],
  "subTypes": [
    "falcon_cluster"
  ],
  "relationshipAttributeDefs": [
    {
      "name": "meanings",
      "typeName": "array<AtlasGlossaryTerm>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "AtlasGlossarySemanticAssignment",
      "isLegacyAttribute": false
    }
  ],
  "businessAttributeDefs": {}
}

 

DataSet

DataSet extends Asset (see superTypes in the definition below). It represents any type that can store data. DataSet entities can participate in data lineage.

 

Type definition of DataSet

 

{
  "category": "ENTITY",
  "guid": "5d312f85-95b2-40c8-aaa3-b42643d950f0",
  "createdBy": "krishna",
  "updatedBy": "krishna",
  "createTime": 1643689899189,
  "updateTime": 1643689951742,
  "version": 3,
  "name": "DataSet",
  "description": "DataSet",
  "typeVersion": "1.2",
  "serviceType": "atlas_core",
  "attributeDefs": [],
  "superTypes": [
    "Asset"
  ],
  "subTypes": [
    "adls_gen2_container",
    "rdbms_foreign_key",
    "gcp_storage_base",
    "spark_ml_directory",
    "ozone_volume",
    "hive_table",
    "spark_column",
    "aws_s3_pseudo_dir",
    "sqoop_dbdatastore",
    "hbase_column",
    "rdbms_instance",
    "spark_table",
    "falcon_feed",
    "jms_topic",
    "hbase_table",
    "rdbms_table",
    "rdbms_column",
    "hbase_column_family",
    "hive_column",
    "Path",
    "rdbms_db",
    "ml_model_deployment",
    "spark_ml_pipeline",
    "kafka_topic",
    "ozone_bucket",
    "adls_gen2_blob",
    "spark_ml_model",
    "aws_s3_v2_base",
    "adls_gen2_directory",
    "rdbms_index",
    "ml_project",
    "ozone_key",
    "avro_type",
    "aws_s3_object",
    "aws_s3_bucket",
    "ml_model_build",
    "fs_path",
    "spark_db"
  ],
  "relationshipAttributeDefs": [
    {
      "name": "inputToProcesses",
      "typeName": "array<Process>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "dataset_process_inputs",
      "isLegacyAttribute": false
    },
    {
      "name": "pipeline",
      "typeName": "spark_ml_pipeline",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "spark_ml_pipeline_dataset",
      "isLegacyAttribute": false
    },
    {
      "name": "schema",
      "typeName": "array<avro_schema>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "avro_schema_associatedEntities",
      "isLegacyAttribute": false
    },
    {
      "name": "model",
      "typeName": "spark_ml_model",
      "isOptional": true,
      "cardinality": "SINGLE",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "spark_ml_model_dataset",
      "isLegacyAttribute": false
    },
    {
      "name": "meanings",
      "typeName": "array<AtlasGlossaryTerm>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "AtlasGlossarySemanticAssignment",
      "isLegacyAttribute": false
    },
    {
      "name": "outputFromProcesses",
      "typeName": "array<Process>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "process_dataset_outputs",
      "isLegacyAttribute": false
    }
  ],
  "businessAttributeDefs": {}
}

 

Process

The Process type extends Asset. It represents any data transformation operation. It adds two attributes of its own, inputs and outputs, which define data lineage. Both are arrays of DataSet entities.
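To make the inputs and outputs attributes concrete, here is a minimal Python sketch that builds a JSON payload for a Process entity linking one input DataSet to one output DataSet. The payload shape (an "entity" wrapper, with references given as typeName plus uniqueAttributes) follows what I understand the Atlas v2 entity API to accept, so treat it as an assumption and verify it against your Atlas version. All qualifiedName values and the helper names are made up.

```python
import json


def object_id(type_name, qualified_name):
    """Reference an existing entity by its type and qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def process_entity(qualified_name, name, inputs, outputs):
    """Build an entity payload for a Process that links inputs to outputs."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                # qualifiedName and name are the mandatory attributes
                # inherited from Referenceable and Asset.
                "qualifiedName": qualified_name,
                "name": name,
                "inputs": inputs,
                "outputs": outputs,
            },
        }
    }


# Hypothetical tables: a raw table is transformed into a cleaned table.
payload = process_entity(
    qualified_name="clean_orders_job@cluster1",
    name="clean_orders_job",
    inputs=[object_id("hive_table", "default.orders_raw@cluster1")],
    outputs=[object_id("hive_table", "default.orders_clean@cluster1")],
)
print(json.dumps(payload, indent=2))
```

The referenced tables must already exist in Atlas for the lineage to resolve; hive_table is used here only because it appears in the subTypes of DataSet.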

 

Type definition of Process

{
  "category": "ENTITY",
  "guid": "35674ebd-aaba-427c-9fc7-04860c258503",
  "createdBy": "krishna",
  "updatedBy": "krishna",
  "createTime": 1643689899877,
  "updateTime": 1643689907092,
  "version": 2,
  "name": "Process",
  "description": "Process",
  "typeVersion": "1.2",
  "serviceType": "atlas_core",
  "attributeDefs": [
    {
      "name": "inputs",
      "typeName": "array<DataSet>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": 0,
      "valuesMaxCount": 2147483647,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1
    },
    {
      "name": "outputs",
      "typeName": "array<DataSet>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": 0,
      "valuesMaxCount": 2147483647,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1
    }
  ],
  "superTypes": [
    "Asset"
  ],
  "subTypes": [
    "falcon_feed_replication",
    "falcon_process",
    "spark_column_lineage",
    "falcon_feed_creation",
    "flink_application",
    "flink_process",
    "ml_model_train_build_process",
    "spark_process",
    "ml_model_deploy_process",
    "ml_project_create_process",
    "hive_process",
    "impala_process",
    "impala_column_lineage",
    "spark_application",
    "sqoop_process",
    "hive_column_lineage",
    "storm_topology"
  ],
  "relationshipAttributeDefs": [
    {
      "name": "outputs",
      "typeName": "array<DataSet>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": 0,
      "valuesMaxCount": 2147483647,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "process_dataset_outputs",
      "isLegacyAttribute": true
    },
    {
      "name": "inputs",
      "typeName": "array<DataSet>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": 0,
      "valuesMaxCount": 2147483647,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "dataset_process_inputs",
      "isLegacyAttribute": true
    },
    {
      "name": "meanings",
      "typeName": "array<AtlasGlossaryTerm>",
      "isOptional": true,
      "cardinality": "SET",
      "valuesMinCount": -1,
      "valuesMaxCount": -1,
      "isUnique": false,
      "isIndexable": false,
      "includeInNotification": false,
      "searchWeight": -1,
      "relationshipTypeName": "AtlasGlossarySemanticAssignment",
      "isLegacyAttribute": false
    }
  ],
  "businessAttributeDefs": {}
}

 

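Command to get the definition of a type

The command below uses basic authentication with admin:admin. Replace {TYPE_NAME} with the name of a type, for example Process.

curl -X GET -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' 'http://localhost:21000/api/atlas/v2/types/typedef/name/{TYPE_NAME}'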
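The same request can be prepared from Python with only the standard library. The sketch below builds the request object without sending it; the function name typedef_request and its default arguments are my own, and the credentials and host simply mirror the curl command.

```python
import base64
from urllib.request import Request


def typedef_request(type_name, user="admin", password="admin",
                    base_url="http://localhost:21000"):
    """Prepare a GET request for the definition of one Atlas type."""
    # Basic authentication, equivalent to curl's "-u user:password".
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        f"{base_url}/api/atlas/v2/types/typedef/name/{type_name}",
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="GET",
    )


request = typedef_request("Process")
print(request.get_method(), request.full_url)

# To send it against a running Atlas server:
# import json
# from urllib.request import urlopen
# with urlopen(request) as response:
#     typedef = json.load(response)
```

Sending is left commented out because it needs a live Atlas instance on localhost:21000.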

 


 
