When you load the same file into a table again, Hive appends its content to the existing data.
Let’s try it with an example.
Step 1: Let’s create the employee table.
CREATE TABLE emp (
id INT,
name STRING,
hobbies ARRAY<STRING>,
technology_experience MAP<STRING,STRING>,
gender_age STRUCT<gender:STRING,age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
hive> CREATE TABLE emp (
> id INT,
> name STRING,
> hobbies ARRAY<STRING>,
> technology_experience MAP<STRING,STRING>,
> gender_age STRUCT<gender:STRING,age:INT>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
OK
Time taken: 0.045 seconds
hive> ;
hive> DESCRIBE emp;
OK
id int
name string
hobbies array<string>
technology_experience map<string,string>
gender_age struct<gender:string,age:int>
Time taken: 0.047 seconds, Fetched: 5 row(s)
Step 2: Create the empInfo.txt file with the content below.
empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
Step 3: Load the content of the empInfo.txt file into the emp table.
LOAD DATA LOCAL INPATH '/home/cloudera/examples/hive/empInfo.txt' INTO TABLE emp;
hive> LOAD DATA LOCAL INPATH '/home/cloudera/examples/hive/empInfo.txt' INTO TABLE emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=1, totalSize=207]
OK
Time taken: 0.21 seconds
hive> ;
hive> ;
hive> SELECT * FROM emp;
OK
1 Hari ["Football","Cricket"] {"Java":"3.4Yrs","C":"4.5Yrs"} {"gender":"Male","age":30}
2 Chamu ["Trekking","Watching movies"] {"Selenium":"5.6Yrs"} {"gender":"Feale","age":38}
3 Sailu ["Chess","Listening to music"] {"EmbeddedC":"9Yrs"} {"gender":"Femle","age":32}
4 Gopi ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32}
Time taken: 0.05 seconds, Fetched: 4 row(s)
Let’s list the HDFS folder ‘/user/hive/warehouse/emp’ and confirm that the file was copied there.
[cloudera@quickstart hive]$ hadoop fs -ls /user/hive/warehouse/emp
Found 1 items
-rwxrwxrwx 1 cloudera supergroup 207 2022-04-14 23:12 /user/hive/warehouse/emp/empInfo.txt
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$ hadoop fs -cat /user/hive/warehouse/emp/empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
[cloudera@quickstart hive]$
Step 4: Let’s update the empInfo.txt file as shown below.
empInfo.txt
5|Ajay|Swimming|Java:3.4Yrs,C:4.5Yrs|Male,30
6|Srinu|Teaching|Bigdata:20.6Yrs|Feale,48
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
Let’s load the content of the empInfo.txt file again.
hive> LOAD DATA LOCAL INPATH '/home/cloudera/examples/hive/empInfo.txt' INTO TABLE emp;
Loading data to table default.emp
Table default.emp stats: [numFiles=2, totalSize=390]
OK
Time taken: 0.283 seconds
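The stats line now reports numFiles=2 and totalSize=390, up from numFiles=1 and totalSize=207 after the first load. The sketch below is a quick local check of that arithmetic, not part of the Hive session. It recreates both versions of the file under hypothetical names (first.txt and second.txt) in a temporary directory and adds up their byte counts, assuming every line ends with a single newline character.

```shell
# Hypothetical local copies of the two versions of empInfo.txt.
dir=$(mktemp -d)

cat > "$dir/first.txt" <<'EOF'
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
EOF

cat > "$dir/second.txt" <<'EOF'
5|Ajay|Swimming|Java:3.4Yrs,C:4.5Yrs|Male,30
6|Srinu|Teaching|Bigdata:20.6Yrs|Feale,48
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
EOF

# wc -c counts bytes; the $(( )) wrapper strips any padding wc may add.
first_size=$(( $(wc -c < "$dir/first.txt") ))
second_size=$(( $(wc -c < "$dir/second.txt") ))
total_size=$(( first_size + second_size ))

# prints: first=207 second=183 total=390
echo "first=$first_size second=$second_size total=$total_size"
```

The total matches the totalSize=390 that Hive prints after the second load, which is consistent with the second file being stored next to the first one rather than replacing it.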
Query the emp table again.
hive> SELECT * FROM emp;
OK
1 Hari ["Football","Cricket"] {"Java":"3.4Yrs","C":"4.5Yrs"} {"gender":"Male","age":30}
2 Chamu ["Trekking","Watching movies"] {"Selenium":"5.6Yrs"} {"gender":"Feale","age":38}
3 Sailu ["Chess","Listening to music"] {"EmbeddedC":"9Yrs"} {"gender":"Femle","age":32}
4 Gopi ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32}
5 Ajay ["Swimming"] {"Java":"3.4Yrs","C":"4.5Yrs"} {"gender":"Male","age":30}
6 Srinu ["Teaching"] {"Bigdata":"20.6Yrs"} {"gender":"Feale","age":48}
3 Sailu ["Chess","Listening to music"] {"EmbeddedC":"9Yrs"} {"gender":"Femle","age":32}
4 Gopi ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32}
Time taken: 0.04 seconds, Fetched: 8 row(s)
The output above confirms that the new data is appended to the existing records: the table now has 8 rows, and the records with ids 3 and 4 appear twice. Let’s list the HDFS folder /user/hive/warehouse/emp and check.
[cloudera@quickstart hive]$ hadoop fs -ls /user/hive/warehouse/emp
Found 2 items
-rwxrwxrwx 1 cloudera supergroup 207 2022-04-14 23:12 /user/hive/warehouse/emp/empInfo.txt
-rwxrwxrwx 1 cloudera supergroup 183 2022-04-14 23:16 /user/hive/warehouse/emp/empInfo_copy_1.txt
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$ hadoop fs -cat /user/hive/warehouse/emp/empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$ hadoop fs -cat /user/hive/warehouse/emp/empInfo_copy_1.txt
5|Ajay|Swimming|Java:3.4Yrs,C:4.5Yrs|Male,30
6|Srinu|Teaching|Bigdata:20.6Yrs|Feale,48
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
[cloudera@quickstart hive]$
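Because the second load only adds a new file, any record present in both versions of empInfo.txt ends up in the table twice. The sketch below is a plain shell check that does not involve Hive. It uses hypothetical local file names (first.txt and second.txt) for the two versions and lists the lines they share, which are the rows that show up as duplicates in the SELECT output.

```shell
# Hypothetical local copies of the two versions of empInfo.txt.
dir=$(mktemp -d)

cat > "$dir/first.txt" <<'EOF'
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
EOF

cat > "$dir/second.txt" <<'EOF'
5|Ajay|Swimming|Java:3.4Yrs,C:4.5Yrs|Male,30
6|Srinu|Teaching|Bigdata:20.6Yrs|Feale,48
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
EOF

# Sort the combined lines and keep one copy of each line that occurs more than once.
duplicates=$(sort "$dir/first.txt" "$dir/second.txt" | uniq -d)
dup_count=$(printf '%s\n' "$duplicates" | grep -c .)

echo "$duplicates"
echo "duplicated records: $dup_count"
```

For the sample data this reports the records with ids 3 and 4. If the intent is to replace the table content instead of appending to it, Hive's LOAD DATA statement also accepts the OVERWRITE keyword before INTO TABLE.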