Sunday, 10 July 2022

Create, insert and query data from a Hive table

Step 1: Create a table 'employee' with the fields id, name, hobbies (an array), technology_experience (a map), and gender_age (a struct holding gender and age).

CREATE TABLE employee (
	id INT,
	name STRING,
	hobbies ARRAY<STRING>,
	technology_experience MAP<STRING,STRING>,
	gender_age STRUCT<gender:STRING,age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

hive> CREATE TABLE employee (
    > id INT,
    > name STRING,
    > hobbies ARRAY<STRING>,
    > technology_experience MAP<STRING,STRING>,
    > gender_age STRUCT<gender:STRING,age:INT>
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '|'
    > COLLECTION ITEMS TERMINATED BY ','
    > MAP KEYS TERMINATED BY ':'
    > STORED AS TEXTFILE;
OK
Time taken: 0.213 seconds
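
You can quickly verify the column names and types with the DESCRIBE command. This is just a sanity check; the exact output layout depends on your Hive version.

hive> DESCRIBE employee;
-- Lists every column with its type, for example hobbies as
-- array<string> and gender_age as struct<gender:string,age:int>.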

Execute the command ‘show create table employee;’ to find out where exactly the data for the employee table is stored in HDFS.

hive> show create table employee;
OK
CREATE TABLE `employee`(
  `id` int, 
  `name` string, 
  `hobbies` array<string>, 
  `technology_experience` map<string,string>, 
  `gender_age` struct<gender:string,age:int>)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'collection.delim'=',', 
  'field.delim'='|', 
  'mapkey.delim'=':', 
  'serialization.format'='|') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/employee'
TBLPROPERTIES (
  'bucketing_version'='2', 
  'transient_lastDdlTime'='1610696478')
Time taken: 0.278 seconds, Fetched: 22 row(s)

From the output, I can confirm that the data for the employee table is stored in the /user/hive/warehouse/employee folder of HDFS.
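
You can cross-check this location from the command line as well. Below is a minimal sketch; at this point the directory exists but is still empty, since no data file has been added yet.

$ hdfs dfs -ls /user/hive/warehouse/employee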

 

Step 2: Create a file named empInfo.txt with the following content.

 

empInfo.txt

1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Female,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Female,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32

Step 3: Add the empInfo.txt file to the HDFS location /user/hive/warehouse/employee. Any content you place in the directory /user/hive/warehouse/employee becomes the content of the employee table.

$ hdfs dfs -put empInfo.txt /user/hive/warehouse/employee
$ 
$ hdfs dfs -cat /user/hive/warehouse/employee/empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Female,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Female,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32
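
Instead of copying the file into the warehouse directory yourself, you can let Hive load it. A minimal sketch, assuming empInfo.txt sits in the current local directory (the path is an assumption):

hive> LOAD DATA LOCAL INPATH 'empInfo.txt' INTO TABLE employee;
-- LOCAL reads from the local file system and copies the file into
-- the table directory; drop LOCAL to move a file already in HDFS.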

Step 4: Go to the Hive prompt and execute the query ‘select * from employee;’ to list all the contents of the employee table.

hive> select * from employee;
OK
1 Hari  ["Football","Cricket"]  {"Java":"3.4Yrs","C":"4.5Yrs"}  {"gender":"Male","age":30}
2 Chamu ["Trekking","Watching movies"]  {"Selenium":"5.6Yrs"} {"gender":"Female","age":38}
3 Sailu ["Chess","Listening to music"]  {"EmbeddedC":"9Yrs"}  {"gender":"Female","age":32}
4 Gopi  ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32}
Time taken: 0.158 seconds, Fetched: 4 row(s)

Query specific elements of the hobbies array

hive> select hobbies[0] as first_hobby, hobbies[1] as second_hobby from employee;
OK
Football  Cricket
Trekking  Watching movies
Chess Listening to music
Cricket NULL
Time taken: 0.085 seconds, Fetched: 4 row(s)
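
If you want one row per hobby instead of accessing elements by index, you can flatten the array with explode(). A minimal sketch:

-- LATERAL VIEW applies explode() to each row, emitting one
-- (name, hobby) pair per array element.
SELECT name, hobby
FROM employee
LATERAL VIEW explode(hobbies) hobby_view AS hobby;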

Query struct attributes

hive> select name, gender_age.gender, gender_age.age from employee;
OK
Hari  Male  30
Chamu Female  38
Sailu Female  32
Gopi  Male  32
Time taken: 0.1 seconds, Fetched: 4 row(s)
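
Struct attributes can also be used to filter rows. A minimal sketch that keeps employees older than 31 (the threshold is only for illustration):

-- Struct fields work in a WHERE clause just like ordinary columns.
SELECT name, gender_age.age
FROM employee
WHERE gender_age.age > 31;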

Query map elements

hive> select name, technology_experience['Java'], technology_experience['Datastage'] from employee;
OK
Hari  3.4Yrs  NULL
Chamu NULL  NULL
Sailu NULL  NULL
Gopi  NULL  11Yrs
Time taken: 0.102 seconds, Fetched: 4 row(s)
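
When you do not know the map keys up front, Hive's built-in map functions are handy. A minimal sketch using map_keys() and size():

-- map_keys() returns the keys as an array; size() counts the entries.
SELECT name,
       map_keys(technology_experience) AS technologies,
       size(technology_experience) AS technology_count
FROM employee;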

 

 
