In this post, I am going to explain the core built-in types of Apache Atlas:
a. Referenceable
b. Asset
c. Infrastructure
d. DataSet
e. Process
Referenceable
This type represents all entities that can be searched for using a unique attribute called qualifiedName.
Type definition of Referenceable
{
"category": "ENTITY",
"guid": "a4bb02ae-6881-4e1f-8735-0d88478515bd",
"createdBy": "krishna",
"updatedBy": "krishna",
"createTime": 1643689899136,
"updateTime": 1643689913446,
"version": 4,
"name": "Referenceable",
"description": "Referenceable",
"typeVersion": "1.3",
"serviceType": "atlas_core",
"attributeDefs": [
{
"name": "qualifiedName",
"typeName": "string",
"isOptional": false,
"cardinality": "SINGLE",
"valuesMinCount": 1,
"valuesMaxCount": 1,
"isUnique": true,
"isIndexable": true,
"includeInNotification": false,
"searchWeight": 10
},
{
"name": "replicatedFrom",
"typeName": "array<AtlasServer>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": 0,
"valuesMaxCount": 2147483647,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"options": {
"isSoftReference": "true"
}
},
{
"name": "replicatedTo",
"typeName": "array<AtlasServer>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": 0,
"valuesMaxCount": 2147483647,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"options": {
"isSoftReference": "true"
}
}
],
"superTypes": [],
"subTypes": [
"spark_storagedesc",
"hive_storagedesc",
"Asset",
"ddl"
],
"relationshipAttributeDefs": [
{
"name": "meanings",
"typeName": "array<AtlasGlossaryTerm>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "AtlasGlossarySemanticAssignment",
"isLegacyAttribute": false
}
],
"businessAttributeDefs": {}
}
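Since qualifiedName is the unique attribute, an entity can be looked up by it directly. Below is a minimal sketch that only builds the lookup URL. It assumes the Atlas v2 REST endpoint for unique-attribute lookups and the same localhost address used later in this post; the type name and qualifiedName are hypothetical values for illustration.

```python
from urllib.parse import quote, urlencode

# Assumed base URL, matching the localhost address used in this post.
ATLAS_BASE = "http://localhost:21000/api/atlas/v2"


def unique_attribute_url(type_name: str, qualified_name: str) -> str:
    """Build the URL that looks up one entity by its qualifiedName."""
    # The attribute is passed as a query parameter named attr:qualifiedName;
    # urlencode percent-encodes the ':' and any special characters in the value.
    query = urlencode({"attr:qualifiedName": qualified_name})
    return f"{ATLAS_BASE}/entity/uniqueAttribute/type/{quote(type_name)}?{query}"


# Hypothetical type and qualifiedName, for illustration only.
print(unique_attribute_url("hive_table", "sales.orders@cluster1"))
```

The resulting URL can be passed to curl with the same credentials and headers as the type definition command in this post.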
Asset
Asset is a subtype of Referenceable and adds attributes such as name, description and owner. Of these, only name is mandatory; the rest are optional.
Type definition of Asset
{
"category": "ENTITY",
"guid": "89d41a70-03d7-402b-8c0e-b6c3153156a7",
"createdBy": "krishna",
"updatedBy": "krishna",
"createTime": 1643689899162,
"updateTime": 1643689915963,
"version": 6,
"name": "Asset",
"description": "Asset",
"typeVersion": "1.6",
"serviceType": "atlas_core",
"attributeDefs": [
{
"name": "name",
"typeName": "string",
"isOptional": false,
"cardinality": "SINGLE",
"valuesMinCount": 1,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": true,
"includeInNotification": false,
"searchWeight": 10,
"indexType": "STRING"
},
{
"name": "description",
"typeName": "string",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": 0,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": 9
},
{
"name": "owner",
"typeName": "string",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": 0,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": true,
"includeInNotification": false,
"searchWeight": 9,
"indexType": "STRING"
},
{
"name": "displayName",
"typeName": "string",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": 0,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": true,
"includeInNotification": false,
"searchWeight": -1,
"indexType": "STRING"
},
{
"name": "userDescription",
"typeName": "string",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": 0,
"valuesMaxCount": 1,
"isUnique": false,
"isIndexable": true,
"includeInNotification": false,
"searchWeight": -1,
"indexType": "STRING"
}
],
"superTypes": [
"Referenceable"
],
"subTypes": [
"DataSet",
"ProcessExecution",
"adls_gen2_account",
"Infrastructure",
"Process",
"hive_db",
"hbase_namespace"
],
"relationshipAttributeDefs": [
{
"name": "meanings",
"typeName": "array<AtlasGlossaryTerm>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "AtlasGlossarySemanticAssignment",
"isLegacyAttribute": false
}
],
"businessAttributeDefs": {}
}
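Because Asset inherits from Referenceable, an entity of type Asset (or any of its subtypes) needs the mandatory attributes of both types. Here is a small sketch that walks the superTypes chain and collects every attribute with isOptional set to false. The TYPEDEFS dictionary is a hand-trimmed copy of the two definitions above, keeping only the fields the function reads; the function name is my own.

```python
# Trimmed copies of the typedefs shown in this post (only the fields used here).
TYPEDEFS = {
    "Referenceable": {
        "superTypes": [],
        "attributeDefs": [
            {"name": "qualifiedName", "isOptional": False},
            {"name": "replicatedFrom", "isOptional": True},
            {"name": "replicatedTo", "isOptional": True},
        ],
    },
    "Asset": {
        "superTypes": ["Referenceable"],
        "attributeDefs": [
            {"name": "name", "isOptional": False},
            {"name": "description", "isOptional": True},
            {"name": "owner", "isOptional": True},
            {"name": "displayName", "isOptional": True},
            {"name": "userDescription", "isOptional": True},
        ],
    },
}


def mandatory_attributes(type_name, typedefs):
    """Collect non-optional attribute names from a type and all its supertypes."""
    typedef = typedefs[type_name]
    names = []
    # Inherited attributes first, then the ones declared on the type itself.
    for super_type in typedef["superTypes"]:
        names.extend(mandatory_attributes(super_type, typedefs))
    names.extend(a["name"] for a in typedef["attributeDefs"] if not a["isOptional"])
    return names


print(mandatory_attributes("Asset", TYPEDEFS))  # ['qualifiedName', 'name']
```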
Infrastructure
Infrastructure is a subtype of Asset. It can serve as a common supertype for infrastructural metadata objects such as clusters and hosts.
Type definition of Infrastructure
{
"category": "ENTITY",
"guid": "51a0384f-cf51-45d8-a401-23b3da61ceca",
"createdBy": "krishna",
"updatedBy": "krishna",
"createTime": 1643689899193,
"updateTime": 1643689906850,
"version": 2,
"name": "Infrastructure",
"description": "Infrastructure can be IT infrastructure, which contains hosts and servers. Infrastructure might not be IT orientated, such as 'Car' for IoT applications.",
"typeVersion": "1.2",
"serviceType": "atlas_core",
"attributeDefs": [],
"superTypes": [
"Asset"
],
"subTypes": [
"falcon_cluster"
],
"relationshipAttributeDefs": [
{
"name": "meanings",
"typeName": "array<AtlasGlossaryTerm>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "AtlasGlossarySemanticAssignment",
"isLegacyAttribute": false
}
],
"businessAttributeDefs": {}
}
DataSet
DataSet extends Asset. It is used to represent a type that can store data. DataSet entities can participate in data lineage.
Type definition of DataSet
{
"category": "ENTITY",
"guid": "5d312f85-95b2-40c8-aaa3-b42643d950f0",
"createdBy": "krishna",
"updatedBy": "krishna",
"createTime": 1643689899189,
"updateTime": 1643689951742,
"version": 3,
"name": "DataSet",
"description": "DataSet",
"typeVersion": "1.2",
"serviceType": "atlas_core",
"attributeDefs": [],
"superTypes": [
"Asset"
],
"subTypes": [
"adls_gen2_container",
"rdbms_foreign_key",
"gcp_storage_base",
"spark_ml_directory",
"ozone_volume",
"hive_table",
"spark_column",
"aws_s3_pseudo_dir",
"sqoop_dbdatastore",
"hbase_column",
"rdbms_instance",
"spark_table",
"falcon_feed",
"jms_topic",
"hbase_table",
"rdbms_table",
"rdbms_column",
"hbase_column_family",
"hive_column",
"Path",
"rdbms_db",
"ml_model_deployment",
"spark_ml_pipeline",
"kafka_topic",
"ozone_bucket",
"adls_gen2_blob",
"spark_ml_model",
"aws_s3_v2_base",
"adls_gen2_directory",
"rdbms_index",
"ml_project",
"ozone_key",
"avro_type",
"aws_s3_object",
"aws_s3_bucket",
"ml_model_build",
"fs_path",
"spark_db"
],
"relationshipAttributeDefs": [
{
"name": "inputToProcesses",
"typeName": "array<Process>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "dataset_process_inputs",
"isLegacyAttribute": false
},
{
"name": "pipeline",
"typeName": "spark_ml_pipeline",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "spark_ml_pipeline_dataset",
"isLegacyAttribute": false
},
{
"name": "schema",
"typeName": "array<avro_schema>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "avro_schema_associatedEntities",
"isLegacyAttribute": false
},
{
"name": "model",
"typeName": "spark_ml_model",
"isOptional": true,
"cardinality": "SINGLE",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "spark_ml_model_dataset",
"isLegacyAttribute": false
},
{
"name": "meanings",
"typeName": "array<AtlasGlossaryTerm>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "AtlasGlossarySemanticAssignment",
"isLegacyAttribute": false
},
{
"name": "outputFromProcesses",
"typeName": "array<Process>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "process_dataset_outputs",
"isLegacyAttribute": false
}
],
"businessAttributeDefs": {}
}
Process
The Process type extends Asset. It is used to represent any data transformation operation. It has two attributes of its own, inputs and outputs, which are used to define data lineage. Both inputs and outputs are arrays of DataSet entities.
Type definition of Process
{
"category": "ENTITY",
"guid": "35674ebd-aaba-427c-9fc7-04860c258503",
"createdBy": "krishna",
"updatedBy": "krishna",
"createTime": 1643689899877,
"updateTime": 1643689907092,
"version": 2,
"name": "Process",
"description": "Process",
"typeVersion": "1.2",
"serviceType": "atlas_core",
"attributeDefs": [
{
"name": "inputs",
"typeName": "array<DataSet>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": 0,
"valuesMaxCount": 2147483647,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1
},
{
"name": "outputs",
"typeName": "array<DataSet>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": 0,
"valuesMaxCount": 2147483647,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1
}
],
"superTypes": [
"Asset"
],
"subTypes": [
"falcon_feed_replication",
"falcon_process",
"spark_column_lineage",
"falcon_feed_creation",
"flink_application",
"flink_process",
"ml_model_train_build_process",
"spark_process",
"ml_model_deploy_process",
"ml_project_create_process",
"hive_process",
"impala_process",
"impala_column_lineage",
"spark_application",
"sqoop_process",
"hive_column_lineage",
"storm_topology"
],
"relationshipAttributeDefs": [
{
"name": "outputs",
"typeName": "array<DataSet>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": 0,
"valuesMaxCount": 2147483647,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "process_dataset_outputs",
"isLegacyAttribute": true
},
{
"name": "inputs",
"typeName": "array<DataSet>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": 0,
"valuesMaxCount": 2147483647,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "dataset_process_inputs",
"isLegacyAttribute": true
},
{
"name": "meanings",
"typeName": "array<AtlasGlossaryTerm>",
"isOptional": true,
"cardinality": "SET",
"valuesMinCount": -1,
"valuesMaxCount": -1,
"isUnique": false,
"isIndexable": false,
"includeInNotification": false,
"searchWeight": -1,
"relationshipTypeName": "AtlasGlossarySemanticAssignment",
"isLegacyAttribute": false
}
],
"businessAttributeDefs": {}
}
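To make the lineage idea concrete, here is a rough sketch of an entity payload for a Process that reads one DataSet and writes another. It assumes the usual Atlas v2 entity JSON shape (an "entity" object with "typeName" and "attributes", and references to existing entities given by "typeName" plus "uniqueAttributes"). The helper names, table names and job name are hypothetical.

```python
import json


def object_ref(type_name, qualified_name):
    """Reference an existing entity by its type and qualifiedName."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def process_entity(qualified_name, name, inputs, outputs):
    """Build an entity payload for the generic Process type."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "qualifiedName": qualified_name,  # mandatory, from Referenceable
                "name": name,                     # mandatory, from Asset
                "inputs": inputs,                 # array<DataSet>
                "outputs": outputs,               # array<DataSet>
            },
        }
    }


# Hypothetical table and job names, for illustration only.
payload = process_entity(
    "daily_orders_etl@cluster1",
    "daily_orders_etl",
    inputs=[object_ref("hive_table", "raw.orders@cluster1")],
    outputs=[object_ref("hive_table", "curated.orders@cluster1")],
)
print(json.dumps(payload, indent=2))
```

A payload like this would typically be sent with a POST to the entity endpoint (/api/atlas/v2/entity), and the referenced DataSet entities must already exist for the references to resolve.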
Command to get the definition of a type (replace {TYPE_NAME} with the type name, for example Asset)
curl -X GET -u admin:admin -H 'accept: application/json' -H 'cache-control: no-cache' -H 'content-type: application/json' 'http://localhost:21000/api/atlas/v2/types/typedef/name/{TYPE_NAME}'
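The same request can be made from Python with only the standard library. This is a sketch under the same assumptions as the curl command (Atlas on localhost:21000 with the admin:admin credentials); the function names are my own.

```python
import base64
import json
import urllib.request

# Same server address as in the curl command above.
ATLAS_BASE = "http://localhost:21000/api/atlas/v2"


def typedef_request(type_name, user="admin", password="admin"):
    """Build a GET request for a type definition, using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{ATLAS_BASE}/types/typedef/name/{type_name}",
        headers={"Accept": "application/json", "Authorization": f"Basic {token}"},
        method="GET",
    )


def fetch_typedef(type_name):
    """Send the request and decode the JSON body (needs a running Atlas server)."""
    with urllib.request.urlopen(typedef_request(type_name)) as response:
        return json.load(response)
```

With a server running, fetch_typedef("DataSet") should return a dictionary like the DataSet definition shown earlier, so fields such as superTypes and subTypes can be read directly from it.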