Monday 20 June 2022

Location of Hive data in HDFS

Hive data can be stored in HDFS, Amazon S3, or any other compatible file system.

 

If you are using HDFS, Hive stores the data under the location ‘/user/hive/warehouse’ by default.

 

Can I customize this data location to some other directory?

We can customize the Hive data location by setting the property ‘hive.metastore.warehouse.dir’ in /etc/hive/conf/hive-site.xml. ‘/user/hive/warehouse’ is the default data location.

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>This is the location where Hive stores the data</description>
</property>
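
To verify which warehouse location is currently in effect, you can print the property from the Hive CLI. The output below is representative and assumes the default value has not been changed:

hive> SET hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse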

 

Let’s create a database and a table, and confirm both in HDFS.

 

Step 1: Open a terminal and execute the command ‘hive’.

[cloudera@quickstart conf]$ hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive>

Step 2: Create a database by executing the below command.

CREATE DATABASE sample;

hive> CREATE DATABASE sample;
OK
Time taken: 1.524 seconds

Let’s list the directory ‘/user/hive/warehouse’ and confirm whether this database has been created or not.

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxrwxrwx   - cloudera supergroup          0 2022-04-11 23:13 /user/hive/warehouse/sample.db

As you can confirm from the above output, a sample.db folder is created to represent the ‘sample’ database.
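
You can also ask Hive itself for the database location using the DESCRIBE DATABASE command. The output below is representative of the Cloudera quickstart VM; the exact URI and formatting may differ in your environment:

hive> DESCRIBE DATABASE sample;
OK
sample		hdfs://quickstart.cloudera:8020/user/hive/warehouse/sample.db	cloudera	USER
Time taken: 0.05 seconds, Fetched: 1 row(s)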

 

Step 3: Let’s create an employee table in the sample database.

create table employee(id int, name string);

hive> USE sample;
OK
Time taken: 0.062 seconds
hive> create table employee(id int, name string);
OK
Time taken: 0.37 seconds

 

Let’s list the directory ‘/user/hive/warehouse/sample.db’ and confirm whether the employee table has been created or not.

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/sample.db
Found 1 items
drwxrwxrwx   - cloudera supergroup          0 2022-04-11 23:16 /user/hive/warehouse/sample.db/employee
[cloudera@quickstart ~]$ 
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/sample.db/employee
[cloudera@quickstart ~]$
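
Similarly, DESCRIBE FORMATTED prints the table’s storage directory on its Location line. Only the relevant part of the output is shown below; the rest is trimmed:

hive> DESCRIBE FORMATTED employee;
...
Location:           	hdfs://quickstart.cloudera:8020/user/hive/warehouse/sample.db/employee
...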

 

Now you can add any number of files to the employee folder, as long as they are compatible with the employee table schema. I will explain this in detail in my later posts.
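
As a quick sketch of how that works (assuming the table uses Hive’s default text format, whose field delimiter is the non-printable \001 character), you could write a small delimited file, copy it into the employee folder, and query the table. The file name and sample rows below are just illustrations, and Hive’s logging output is omitted:

[cloudera@quickstart ~]$ printf '1\x01Ravi\n2\x01Priya\n' > employee.txt
[cloudera@quickstart ~]$ hadoop fs -put employee.txt /user/hive/warehouse/sample.db/employee/
[cloudera@quickstart ~]$ hive -e 'SELECT * FROM sample.employee;'
OK
1	Ravi
2	Priya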
