Monday 20 June 2022

Location of Hive data in HDFS

Hive data can be stored in HDFS, Amazon S3, or any other compatible file system.

 

If you are using HDFS, Hive stores the data under the location ‘/user/hive/warehouse’ by default.

 

Can I customize this data location to some other directory?

We can customize the Hive data location by setting the property ‘hive.metastore.warehouse.dir’ in /etc/hive/conf/hive-site.xml. ‘/user/hive/warehouse’ is the default data location.

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>This is the location where Hive stores the data</description>
</property>
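
To verify which warehouse location is currently in effect, you can print the property from the Hive CLI. The output below is representative and assumes the default value has not been changed:

hive> SET hive.metastore.warehouse.dir;
hive.metastore.warehouse.dir=/user/hive/warehouse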

 

Let’s create a database and a table, and confirm both in HDFS.

 

Step 1: Open a terminal and execute the command ‘hive’.

[cloudera@quickstart conf]$ hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive>

Step 2: Create a database by executing the below command.

CREATE DATABASE sample;

hive> CREATE DATABASE sample;
OK
Time taken: 1.524 seconds

Let’s list the directory ‘/user/hive/warehouse’ and confirm whether this database has been created or not.

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxrwxrwx   - cloudera supergroup          0 2022-04-11 23:13 /user/hive/warehouse/sample.db

As you can confirm from the above output, a sample.db folder is created to represent the ‘sample’ database.
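
You can also ask Hive itself for the database location using the DESCRIBE DATABASE command. The output below is representative of the Cloudera quickstart VM; the exact URI and formatting may differ in your environment:

hive> DESCRIBE DATABASE sample;
OK
sample		hdfs://quickstart.cloudera:8020/user/hive/warehouse/sample.db	cloudera	USER
Time taken: 0.05 seconds, Fetched: 1 row(s)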

 

Step 3: Let’s create an employee table in the sample database.

create table employee(id int, name string);

hive> USE sample;
OK
Time taken: 0.062 seconds
hive> create table employee(id int, name string);
OK
Time taken: 0.37 seconds

 

Let’s list the directory ‘/user/hive/warehouse/sample.db’ and confirm whether the employee table has been created or not.

[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/sample.db
Found 1 items
drwxrwxrwx   - cloudera supergroup          0 2022-04-11 23:16 /user/hive/warehouse/sample.db/employee
[cloudera@quickstart ~]$ 
[cloudera@quickstart ~]$ hadoop fs -ls /user/hive/warehouse/sample.db/employee
[cloudera@quickstart ~]$
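
Similarly, DESCRIBE FORMATTED prints the table’s storage directory on its Location line. Only the relevant part of the output is shown below; the rest is trimmed:

hive> DESCRIBE FORMATTED employee;
...
Location:           	hdfs://quickstart.cloudera:8020/user/hive/warehouse/sample.db/employee
...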

 

Now you can add any number of files to the employee folder, as long as they are compatible with the employee table schema. I will explain this in detail in my later posts.
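
As a quick sketch of how that works (assuming the table uses Hive’s default text format, whose field delimiter is the non-printable \001 character), you could write a small delimited file, copy it into the employee folder, and query the table. The file name and sample rows below are just illustrations, and Hive’s logging output is omitted:

[cloudera@quickstart ~]$ printf '1\x01Ravi\n2\x01Priya\n' > employee.txt
[cloudera@quickstart ~]$ hadoop fs -put employee.txt /user/hive/warehouse/sample.db/employee/
[cloudera@quickstart ~]$ hive -e 'SELECT * FROM sample.employee;'
OK
1	Ravi
2	Priya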
