Tuesday, 31 May 2022

Introduction to Apache Atlas

 

Apache Atlas is an open governance and metadata framework.

 

Features of Atlas

Well defined Type System

a.   Pre-defined types for various Hadoop and non-Hadoop metadata

b.   Ability to define new types for the metadata to be managed

c.    Types can have primitive attributes, complex attributes, object references and can inherit from other types

d.   Instances of types, called entities, capture metadata object details and their relationships

e.   REST APIs to work with types and instances allow easier integration
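As a rough illustration of defining a new type over REST, the sketch below builds the JSON body for a custom entity type that inherits from the built-in DataSet type. It only constructs the payload and does not contact a server. The type name sample_report, its attributes and the localhost address are assumptions made for this example; the endpoint path is the v2 typedefs endpoint.

```python
import json

# Assumption: Atlas is running locally on its default port.
ATLAS_BASE_URL = "http://localhost:21000"
# A typedefs payload like the one below would be POSTed to this endpoint.
TYPEDEFS_ENDPOINT = ATLAS_BASE_URL + "/api/atlas/v2/types/typedefs"


def build_entity_typedef(name, super_type, attributes):
    """Return a typedefs payload that declares one entity type.

    attributes maps an attribute name to an Atlas type name,
    for example {"row_count": "long"}.
    """
    attribute_defs = [
        {
            "name": attr_name,
            "typeName": type_name,
            "isOptional": True,
            "cardinality": "SINGLE",
        }
        for attr_name, type_name in attributes.items()
    ]
    return {
        "entityDefs": [
            {
                "name": name,
                "superTypes": [super_type],  # inherit from an existing type
                "attributeDefs": attribute_defs,
            }
        ]
    }


# Hypothetical custom type with two primitive attributes.
payload = build_entity_typedef(
    "sample_report", "DataSet", {"owner_team": "string", "row_count": "long"}
)
typedef_json = json.dumps(payload, indent=2)
```

The resulting typedef_json string is what an HTTP client would send as the request body, along with whatever authentication your Atlas installation requires.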

 

Classifying the data

a.   Ability to dynamically create classifications - like PII, EXPIRES_ON, DATA_QUALITY, SENSITIVE

b.   Classifications can include attributes - like expiry_date attribute in EXPIRES_ON classification

c.    Entities can be associated with multiple classifications, enabling easier discovery and security enforcement

d.   Propagation of classifications via lineage - automatically ensures that classifications follow the data as it goes through various processing steps
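The classification features above can be sketched as two request bodies: one that defines the EXPIRES_ON classification with its expiry_date attribute, and one that attaches it to an entity with propagation enabled. This is a minimal sketch that only builds the payloads. The entity GUID and the date value are placeholders, and the endpoint paths in the comments are the v2 REST paths as I understand them, so check them against your Atlas version.

```python
def build_classification_def(name, attributes=None):
    """Payload for POST /api/atlas/v2/types/typedefs declaring a classification."""
    attribute_defs = [
        {"name": attr, "typeName": type_name, "isOptional": True, "cardinality": "SINGLE"}
        for attr, type_name in (attributes or {}).items()
    ]
    return {
        "classificationDefs": [
            {"name": name, "superTypes": [], "attributeDefs": attribute_defs}
        ]
    }


def build_classification_assignment(name, attributes=None, propagate=True):
    """Payload for POST /api/atlas/v2/entity/guid/{guid}/classifications.

    propagate=True asks Atlas to carry the classification along lineage.
    """
    return [
        {"typeName": name, "attributes": attributes or {}, "propagate": propagate}
    ]


expires_on_def = build_classification_def("EXPIRES_ON", {"expiry_date": "date"})

# Placeholder GUID and date, used only for illustration.
entity_guid = "00000000-0000-0000-0000-000000000000"
assignment = build_classification_assignment(
    "EXPIRES_ON", {"expiry_date": "2022-12-31T00:00:00.000Z"}
)
assignment_path = f"/api/atlas/v2/entity/guid/{entity_guid}/classifications"
```

Because the assignment body is a list, the same request can attach several classifications (for example PII and SENSITIVE) to one entity.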

 

Lineage support

a.   Intuitive UI to view lineage of data as it moves through various processes

b.   REST APIs to access and update lineage
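To give a feel for the lineage REST API, the sketch below builds a lineage request URL and walks the relations list of a lineage response to find everything downstream of an entity. The sample response is hand-written for illustration and uses short made-up GUIDs; only the fromEntityId and toEntityId fields of each relation are assumed.

```python
from collections import deque
from urllib.parse import urlencode


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """Build a GET URL for the lineage of one entity.

    direction is one of INPUT, OUTPUT or BOTH.
    """
    query = urlencode({"direction": direction, "depth": depth})
    return f"{base_url}/api/atlas/v2/lineage/{guid}?{query}"


def downstream_guids(lineage, start_guid):
    """Return GUIDs reachable from start_guid, in breadth-first order."""
    edges = {}
    for relation in lineage.get("relations", []):
        edges.setdefault(relation["fromEntityId"], []).append(relation["toEntityId"])

    seen = {start_guid}
    order = []
    queue = deque([start_guid])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hand-written sample: table t1 feeds process p1, which writes t2 and t3.
sample_lineage = {
    "baseEntityGuid": "t1",
    "relations": [
        {"fromEntityId": "t1", "toEntityId": "p1"},
        {"fromEntityId": "p1", "toEntityId": "t2"},
        {"fromEntityId": "p1", "toEntityId": "t3"},
    ],
}

url = lineage_url("http://localhost:21000", "t1")
reachable = downstream_guids(sample_lineage, "t1")
```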

 

Search and Discovery support

a.   Intuitive UI to search entities by type, classification, attribute value or free-text

b.   Rich REST APIs to search by complex criteria

c.    SQL like query language to search entities - Domain Specific Language (DSL)
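The two search styles can be sketched as follows: a DSL query sent as a URL parameter, and a basic search expressed as a JSON body combining type, classification and free text. This is a minimal sketch that only builds the requests. The table name customers and the localhost address are assumptions, and the set of body fields shown is a small subset chosen for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ATLAS_BASE_URL = "http://localhost:21000"  # assumption: local default install


def dsl_search_url(base_url, query, limit=25, offset=0):
    """Build a GET URL for the DSL search endpoint."""
    params = urlencode({"query": query, "limit": limit, "offset": offset})
    return f"{base_url}/api/atlas/v2/search/dsl?{params}"


def basic_search_body(type_name=None, classification=None, text=None, limit=25, offset=0):
    """Build a JSON body for POST /api/atlas/v2/search/basic."""
    body = {"limit": limit, "offset": offset}
    if type_name:
        body["typeName"] = type_name
    if classification:
        body["classification"] = classification
    if text:
        body["query"] = text
    return body


# DSL: find Hive tables named "customers" (hypothetical table name).
url = dsl_search_url(ATLAS_BASE_URL, 'hive_table where name = "customers"')
# Round-trip the URL to confirm the query survives encoding.
decoded_query = parse_qs(urlsplit(url).query)["query"][0]

# Basic search: Hive columns classified as PII that mention "email".
body = basic_search_body(type_name="hive_column", classification="PII", text="email")
```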

 

Security & Data Masking

a.   Fine grained security for metadata access, enabling controls on access to entity instances and operations like add/update/remove classifications

b.   Integration with Apache Ranger enables authorization/data-masking on data access based on classifications associated with entities in Apache Atlas.

For example:

a.   who can access data classified as PII, SENSITIVE

b.   customer-service users can only see last 4 digits of columns classified as NATIONAL_ID
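The second example can be illustrated with a small stand-alone sketch. This is not how Ranger implements masking; it only mimics the visible effect of a "show last 4" policy keyed on a column's classifications. The role name customer-service, the function names and the mask character are assumptions for this example.

```python
def mask_show_last_4(value, mask_char="x"):
    """Hide every character except the last four."""
    if len(value) <= 4:
        # Nothing to hide in this toy version; a real policy may treat
        # short values differently.
        return value
    return mask_char * (len(value) - 4) + value[-4:]


def read_column(value, classifications, user_role):
    """Return the value a user would see under a tag-based masking rule.

    Hypothetical rule: customer-service users see only the last 4
    characters of columns classified as NATIONAL_ID.
    """
    if "NATIONAL_ID" in classifications and user_role == "customer-service":
        return mask_show_last_4(value)
    return value


masked = read_column("123456789", {"NATIONAL_ID", "PII"}, "customer-service")
unmasked = read_column("123456789", {"NATIONAL_ID", "PII"}, "auditor")
```

The point of the tag-based approach is that the rule refers to the classification, not to a specific table or column, so any column later tagged NATIONAL_ID is covered without a new policy.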

Apache Atlas high-level architecture

There are 4 core components in Atlas.

a.   Core

b.   Integration

c.    Metadata Sources

d.   Apps

 

Core

The Core layer includes the following components.

a.   Type System: Atlas has a variety of built-in types to model data and lineage. You can also define custom types by extending the built-in types. Using the Atlas type system, you can define both technical metadata and business metadata.

 

An instance of a type is called an entity. You can also model relationships between two entities.

 

b.   Graph Engine: To support rich relationships between Atlas entities, Atlas persists metadata objects using a graph model. Apart from managing the graph objects, the graph engine is responsible for creating indexes on the metadata objects for efficient retrieval.

 

c.    Ingest / Export: The Ingest component is used to put metadata into Atlas. The Export component is used to get metadata out of Atlas.
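To make the idea of entities and their relationships concrete, the sketch below builds the body for creating one hive_table entity that points at its parent hive_db through an object reference. It only constructs the payload. The database, table and cluster names are made up, and the qualifiedName pattern follows the usual Hive hook convention, which you should confirm for your setup.

```python
def object_reference(type_name, qualified_name):
    """Reference an existing entity by its unique qualifiedName."""
    return {"typeName": type_name, "uniqueAttributes": {"qualifiedName": qualified_name}}


def build_hive_table_entity(db_name, table_name, cluster):
    """Payload for POST /api/atlas/v2/entity creating one hive_table entity."""
    return {
        "entity": {
            "typeName": "hive_table",
            "attributes": {
                "name": table_name,
                "qualifiedName": f"{db_name}.{table_name}@{cluster}",
                # Relationship to the parent database, expressed as a reference.
                "db": object_reference("hive_db", f"{db_name}@{cluster}"),
            },
        }
    }


# Hypothetical names used only for illustration.
entity_payload = build_hive_table_entity("sales", "customers", "cluster1")
```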

 

 

Integration

Atlas provides two ways to manage metadata.

a.   API: Atlas provides a rich set of REST APIs to perform CRUD operations on types, classifications, glossary and lineage.

 

b.   Messaging: Apart from APIs, users can choose to integrate with Atlas using a messaging interface that is based on Kafka. Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. Events are written by the hooks and Atlas to different Kafka topics.
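The messaging flow can be pictured with a toy in-memory model: a hook writes a message to one topic, Atlas consumes it, stores the metadata and then publishes an event on a second topic for downstream consumers. No Kafka client is used here. ATLAS_HOOK and ATLAS_ENTITIES are the default Atlas topic names, while the broker class and the message fields are simplified assumptions and do not match the real notification format.

```python
from collections import defaultdict, deque

HOOK_TOPIC = "ATLAS_HOOK"          # hooks -> Atlas
ENTITIES_TOPIC = "ATLAS_ENTITIES"  # Atlas -> downstream consumers


class InMemoryBroker:
    """Stand-in for Kafka: one FIFO queue per topic."""

    def __init__(self):
        self.topics = defaultdict(deque)

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, topic):
        queue = self.topics[topic]
        return queue.popleft() if queue else None


def hook_report_table(broker, qualified_name):
    """What a hook does, in spirit: report new metadata to Atlas."""
    broker.publish(HOOK_TOPIC, {"operation": "create", "qualifiedName": qualified_name})


def atlas_process_one(broker, store):
    """Consume one hook message, store it, and emit an entity event."""
    message = broker.poll(HOOK_TOPIC)
    if message is None:
        return False
    store[message["qualifiedName"]] = message
    broker.publish(
        ENTITIES_TOPIC,
        {"event": "ENTITY_CREATE", "qualifiedName": message["qualifiedName"]},
    )
    return True


broker = InMemoryBroker()
metadata_store = {}
hook_report_table(broker, "sales.customers@cluster1")  # hypothetical table
processed = atlas_process_one(broker, metadata_store)
downstream_event = broker.poll(ENTITIES_TOPIC)
```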

 

 

Metadata sources

At the time of writing this post, Atlas supports ingesting and managing metadata from the following sources.

 

a.   HBase

b.   Hive

c.    Sqoop

d.   Storm

e.   Kafka

 

 

Applications

Atlas metadata can be consumed by many applications. Atlas provides the Atlas Admin UI and Ranger tag-based policies out of the box.

 

a.   Atlas Admin UI: The Atlas Admin UI is a web application that internally uses the REST APIs to define metadata, classifications and glossary, perform searches, and so on.

b.   Tag-based policies: Atlas is integrated with Apache Ranger to define metadata-driven security policies for effective governance.

 

