In this post, I am going to explain the use of Hive's lateral view. A lateral view, combined with a table-generating function such as explode, expands a collection of elements into multiple rows. Suppose you have data like below.
Name  | hobbies
Hari  | Football,Cricket
Chamu | Trekking,Watching movies
Sailu | Chess,Listening to music
Gopi  | Cricket
Using lateral view with the explode function, we can transform the above data as below.
Name  | Hobby
Hari  | Football
Hari  | Cricket
Chamu | Trekking
Chamu | Watching movies
Sailu | Chess
Sailu | Listening to music
Gopi  | Cricket
Below is a complete working example.
Step 1: Create an employee table.
CREATE TABLE employee (
id INT,
name STRING,
hobbies ARRAY<STRING>,
technology_experience MAP<STRING,STRING>,
gender_age STRUCT<gender:STRING,age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
hive> CREATE TABLE employee (
> id INT,
> name STRING,
> hobbies ARRAY<STRING>,
> technology_experience MAP<STRING,STRING>,
> gender_age STRUCT<gender:STRING,age:INT>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
OK
Time taken: 0.155 seconds
hive> DESC employee;
OK
id int
name string
hobbies array<string>
technology_experience map<string,string>
gender_age struct<gender:string,age:int>
Time taken: 0.041 seconds, Fetched: 5 row(s)
Step 2: Create an empInfo.txt file with the content below. In the next step it will be moved to the HDFS location /user/hive/warehouse/employee (the location where Hive stores the table data).
empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
Step 3: Add the empInfo.txt file to the HDFS location /user/hive/warehouse/employee. Any content you place in the directory /user/hive/warehouse/employee becomes the content of the employee table.
[cloudera@quickstart hive]$ vi empInfo.txt
[cloudera@quickstart hive]$ hdfs dfs -put empInfo.txt /user/hive/warehouse/employee
[cloudera@quickstart hive]$
[cloudera@quickstart hive]$ hadoop fs -ls /user/hive/warehouse/employee
Found 1 items
-rw-r--r-- 1 cloudera supergroup 207 2022-04-18 07:46 /user/hive/warehouse/employee/empInfo.txt
Query the employee table to confirm the data.
hive> SELECT * FROM employee;
OK
1 Hari ["Football","Cricket"] {"Java":"3.4Yrs","C":"4.5Yrs"} {"gender":"Male","age":30}
2 Chamu ["Trekking","Watching movies"] {"Selenium":"5.6Yrs"} {"gender":"Feale","age":38}
3 Sailu ["Chess","Listening to music"] {"EmbeddedC":"9Yrs"} {"gender":"Femle","age":32}
4 Gopi ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32}
Time taken: 0.409 seconds, Fetched: 4 row(s)
Step 4: Explode employee hobbies.
hive> SELECT explode(hobbies) FROM employee;
OK
Football
Cricket
Trekking
Watching movies
Chess
Listening to music
Cricket
Time taken: 0.079 seconds, Fetched: 7 row(s)
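Note that explode on its own cannot be combined with other columns in the SELECT list. A query like the sketch below (not part of the original session) is rejected by Hive with a SemanticException along the lines of "only a single expression in the SELECT clause is supported with UDTF's" — this limitation is exactly what lateral view solves.

```sql
-- Sketch: this fails, because a table-generating function (UDTF) such as
-- explode cannot be mixed with ordinary columns in the SELECT clause.
SELECT name, explode(hobbies) FROM employee;
```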
To get each employee's name alongside each exploded hobby, execute the below statement. The identifier after explode(hobbies) is the lateral view's table alias (here it happens to be named employee as well), and hobby is the column alias for the exploded values.
SELECT name, hobby FROM employee lateral view explode(hobbies) employee as hobby;
hive> SELECT name, hobby FROM employee lateral view explode(hobbies) employee as hobby;
OK
Hari Football
Hari Cricket
Chamu Trekking
Chamu Watching movies
Sailu Chess
Sailu Listening to music
Gopi Cricket
Time taken: 0.098 seconds, Fetched: 7 row(s)
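Lateral view also works with MAP columns. Since the employee table defines technology_experience as MAP<STRING,STRING>, exploding it produces two columns per entry, one for the key and one for the value. A sketch, not run in the original session; the aliases t, skill, and yrs are my own:

```sql
-- Explode the technology_experience map into one (skill, yrs) row
-- per map entry, keeping the employee name alongside.
SELECT name, skill, yrs
FROM employee
LATERAL VIEW explode(technology_experience) t AS skill, yrs;
```

Based on the data above, this should return one row per map entry, so Hari (who has both Java and C experience) would appear twice.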