For all the examples, I am going to use below table definition.
CREATE TABLE employee (
id INT,
name STRING,
hobbies ARRAY<STRING>,
technology_experience MAP<STRING,STRING>,
gender_age STRUCT<gender:STRING,age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
Step 1: Create table employee.
hive> CREATE TABLE employee (
> id INT,
> name STRING,
> hobbies ARRAY<STRING>,
> technology_experience MAP<STRING,STRING>,
> gender_age STRUCT<gender:STRING,age:INT>
> )
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
OK
Time taken: 0.034 seconds
Step 2: Create empInfo.txt file.
empInfo.txt
1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30 2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38 3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32 4|Gopi|Cricket|Datastage:11Yrs|Male,32
Step 3: Get the actual hdfs content location of table employee table and place empInfo.txt file there.
Execute the command ‘describe formatted employee;’ to see the hdfs location.
hive> describe formatted employee;
OK
# col_name data_type comment
id int
name string
hobbies array<string>
technology_experience map<string,string>
gender_age struct<gender:string,age:int>
# Detailed Table Information
Database: default
OwnerType: USER
Owner: krishna
CreateTime: Sat Jan 16 11:58:21 IST 2021
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://localhost:9000/user/hive/warehouse/employee
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"gender_age\":\"true\",\"hobbies\":\"true\",\"id\":\"true\",\"name\":\"true\",\"technology_experience\":\"true\"}}
bucketing_version 2
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1610778501
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
collection.delim ,
field.delim |
mapkey.delim :
serialization.format |
Time taken: 0.08 seconds, Fetched: 38 row(s)
From the output, I can confirm the content location of employee table as ‘/user/hive/warehouse/employee’.
Let’s put empInfo.txt file into the folder ‘/user/hive/warehouse/employee’.
hdfs dfs -put empInfo.txt /user/hive/warehouse/employee.
hive> select * from employee; OK 1 Hari ["Football","Cricket"] {"Java":"3.4Yrs","C":"4.5Yrs"} {"gender":"Male","age":30} 2 Chamu ["Trekking","Watching movies"] {"Selenium":"5.6Yrs"} {"gender":"Feale","age":38} 3 Sailu ["Chess","Listening to music"] {"EmbeddedC":"9Yrs"} {"gender":"Femle","age":32} 4 Gopi ["Cricket"] {"Datastage":"11Yrs"} {"gender":"Male","age":32} Time taken: 0.072 seconds, Fet ched: 4 row(s)
No comments:
Post a Comment