Sunday 12 June 2022

What is Apache HIVE?

Apache HIVE is a SQL interface to Hadoop, used to process structured data on Hadoop.

 

In general, you write a HQL (Hive Query Language, similar to SQL) query and submit to Hive. Hive transforms this HQL query into a Map-Reduce job and submit to Hadoop.

 


 

Who developed HIVE?

It is developed by Facebook and now managed by Apache foundation.

 

How Hive handles the data?

When you store some data into HDFS using Hive, it stores the actual data in the HDFS and metadata about the data in a database named metastore.




Why can’t Hive store the metadata in HDFS itself?

There are couple of reasons for this.

a.   Performing read/write operations in HDFS are time consuming as compared to database. If we store metadata in a database, we can perform CRUD operations quickly.

b.   File system like HDFS is not suitable for random accessing the data.

c.    Since HDFS follows write once and read many times model, you can’t modify the existing data in HDFS, but you can append. When there is a change in metadata, it is not possible to update if we store it in HDFS.

 

Is Apache Hive open source?

Yes

 

Previous                                                    Next                                                    Home

No comments:

Post a Comment