Step 1: Create a table employee with fields id, name, list of hobbies, technology_experience and gender, age details.
CREATE TABLE employee (
id INT,
name STRING,
hobbies ARRAY<STRING>,
technology_experience MAP<STRING,STRING>,
gender_age STRUCT<gender:STRING,age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
hive> CREATE TABLE employee (
> Display all 633 possibilities? (y or n)
> ame STRING,
> Display all 633 possibilities? (y or n)
> ology_experience MAP<STRING,STRING>,
> Display all 633 possibilities? (y or n)
> der_age STRUCT<gender:STRING,age:INT>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
OK
Time taken: 0.213 seconds
Execute the command ‘show create table employee;’ to get where exactly the data is stored for employee table in HDFS.
hive> show create table employee; OK CREATE TABLE `employee`( `id` int, `name` string, `hobbies` array<string>, `technology_experience` map<string,string>, `gender_age` struct<gender:string,age:int>) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ( 'collection.delim'=',', 'field.delim'='|', 'mapkey.delim'=':', 'serialization.format'='|') STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://localhost:9000/user/hive/warehouse/employee' TBLPROPERTIES ( 'bucketing_version'='2', 'transient_lastDdlTime'='1610696478') Time taken: 0.278 seconds, Fetched: 22 row(s)
From the output, I can confirm, information is stored in /user/hive/warehouse/employee folder of HDFS.
Step 2: Create empInfo.txt file and move it to HDFS location /user/hive/warehouse/employee.
empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30 2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38 3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32 4|Gopi|Cricket|Datastage:11Yrs|Male,32
Step 3: Add empInfo.txt file to hdfs location /user/hive/warehouse/employee. Any content you place in the directory /user/hive/warehouse/employee will be the content of the table employee.
$ hdfs dfs -put empInfo.txt /user/hive/warehouse/employee
$
$ hdfs dfs -cat /user/hive/warehouse/employee/empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32bash-3.2
Step 4: Go to hive prompt and execute the query ‘select * from employee;’ to list out all the contents of employee table.
hive> select * from employee; OK 1 Hari ["Football","Cricket"] {"Java":"3.4Yrs","C":"4.5Yrs"} {"gender":"Male","age":30} 2 Chamu ["Trekking","Watching movies"] {"Selenium":"5.6Yrs"} {"gender":"Feale","age":38} 3 Sailu ["Chess","Listening to music"] {"EmbeddedC":"9Yrs"} {"gender":"Femle","age":32} 4 Gopi ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32} Time taken: 0.158 seconds, Fetched: 4 row(s)
Query specific elements of array hobbies
hive> select hobbies[0] as first_hobby, hobbies[1] as second_hobby from employee; OK Football Cricket Trekking Watching movies Chess Listening to music Cricket NULL Time taken: 0.085 seconds, Fetched: 4 row(s)
Query struct attributes
hive> select name, gender_age.gender, gender_age.age from employee; OK Hari Male 30 Chamu Feale 38 Sailu Femle 32 Gopi Male 32 Time taken: 0.1 seconds, Fetched: 4 row(s)
Query map elements
hive> select name, technology_experience['Java'], technology_experience['Datastage'] from employee; OK Hari 3.4Yrs NULL Chamu NULL NULL Sailu NULL NULL Gopi NULL 11Yrs Time taken: 0.102 seconds, Fetched: 4 row(s)
Previous Next Home
No comments:
Post a Comment