Step 1: Create the user table.
hive> CREATE TABLE user (
> id INT,
> name STRING
> );
OK
Time taken: 0.252 seconds
hive> DESC user;
OK
id int
name string
Time taken: 0.12 seconds, Fetched: 2 row(s)
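Since the CREATE TABLE statement above specifies no ROW FORMAT, Hive falls back to its default text format, which separates fields with the non-printable Ctrl-A character ('\001'). If you prefer a human-readable delimiter, you can declare one explicitly. The sketch below is a hypothetical variant (the table name user_csv is illustrative; the rest of this post uses the default-format table above):

```sql
-- Sketch: same schema, but with an explicit comma delimiter
-- instead of Hive's default Ctrl-A ('\001').
CREATE TABLE user_csv (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```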
Step 2: Execute an INSERT query to add a new record to the user table.
hive> INSERT INTO user VALUES(1, 'Krishna');
Query ID = cloudera_20220414020505_7dcebdb2-e470-423a-a09e-583c4733033b
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0014, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0014/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-14 02:05:20,856 Stage-1 map = 0%, reduce = 0%
2022-04-14 02:05:28,724 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.59 sec
MapReduce Total cumulative CPU time: 1 seconds 590 msec
Ended Job = job_1649172504056_0014
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/user/.hive-staging_hive_2022-04-14_02-05-12_954_681757281437170910-1/-ext-10000
Loading data to table default.user
Table default.user stats: [numFiles=1, numRows=1, totalSize=10, rawDataSize=9]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.59 sec HDFS Read: 3793 HDFS Write: 78 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 590 msec
OK
Time taken: 17.151 seconds
Each INSERT query internally triggers a MapReduce job.
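Because every INSERT statement launches its own MapReduce job, inserting rows one at a time is slow. A single multi-row INSERT (supported since Hive 0.14) writes all the rows through one job. The values below are illustrative:

```sql
-- One MapReduce job for all three rows, instead of three jobs:
INSERT INTO user VALUES (3, 'Sita'), (4, 'Lakshman'), (5, 'Hanuman');
```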
Let’s query all the records from the user table.
hive> SELECT * FROM user;
OK
1	Krishna
Time taken: 0.059 seconds, Fetched: 1 row(s)
Let’s list the HDFS folder /user/hive/warehouse/user (this is the location where Hive stores the records of the user table; you can find it by executing the command ‘DESCRIBE EXTENDED user;’ from the Hive shell).
[cloudera@quickstart conf]$ hadoop fs -ls /user/hive/warehouse/user
Found 1 items
-rwxrwxrwx 1 cloudera supergroup 10 2022-04-14 02:05 /user/hive/warehouse/user/000000_0
Every insert creates a new file in the directory /user/hive/warehouse/user.
Let’s print the content of file ‘/user/hive/warehouse/user/000000_0’.
[cloudera@quickstart conf]$ hadoop fs -cat /user/hive/warehouse/user/000000_0
1Krishna
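The id and name appear glued together because Hive’s default text format separates fields with the non-printable Ctrl-A character ('\001'), which ‘hadoop fs -cat’ does not render. A minimal Python sketch of how such a line splits back into fields (the raw line is hard-coded here as an assumption of what 000000_0 contains):

```python
# Hive's default text SerDe delimits fields with Ctrl-A ('\x01'),
# which is invisible in `hadoop fs -cat` output.
raw_line = "1\x01Krishna"   # assumed content of 000000_0

fields = raw_line.split("\x01")
print(fields)  # ['1', 'Krishna']
```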
Let’s add one more record to the user table and confirm that one more file gets created in the folder /user/hive/warehouse/user.
hive> INSERT INTO user VALUES(2, 'Ram');
Query ID = cloudera_20220414021010_f28696f4-52d8-41b4-8a2d-305e545f9a0a
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0015, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0015/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0015
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-14 02:10:26,718 Stage-1 map = 0%, reduce = 0%
2022-04-14 02:10:33,080 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.2 sec
MapReduce Total cumulative CPU time: 1 seconds 200 msec
Ended Job = job_1649172504056_0015
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/user/.hive-staging_hive_2022-04-14_02-10-20_492_2704729077173014892-1/-ext-10000
Loading data to table default.user
Table default.user stats: [numFiles=2, numRows=2, totalSize=16, rawDataSize=14]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.2 sec HDFS Read: 3887 HDFS Write: 74 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 200 msec
OK
Time taken: 13.826 seconds
Let’s list the folder ‘/user/hive/warehouse/user’ again.
[cloudera@quickstart conf]$ hadoop fs -ls /user/hive/warehouse/user
Found 2 items
-rwxrwxrwx 1 cloudera supergroup 10 2022-04-14 02:05 /user/hive/warehouse/user/000000_0
-rwxrwxrwx 1 cloudera supergroup 6 2022-04-14 02:10 /user/hive/warehouse/user/000000_0_copy_1
Let’s print the contents of both files.
[cloudera@quickstart conf]$ hadoop fs -cat /user/hive/warehouse/user/000000_0
1Krishna
[cloudera@quickstart conf]$
[cloudera@quickstart conf]$ hadoop fs -cat /user/hive/warehouse/user/000000_0_copy_1
2Ram
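Since each insert adds another file, a table loaded this way gradually runs into the HDFS “small files” problem. One hedged way to compact a text table is to rewrite it over itself, which funnels all rows through a single job and merges them into fewer output files:

```sql
-- Sketch: rewrite the table's own data to merge many small
-- files into fewer, larger ones.
INSERT OVERWRITE TABLE user SELECT * FROM user;
```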