Programming for beginners: HIVE: Exploring select statement

For all the examples, I am going to use below table definition.

CREATE TABLE employee (
id INT,
name STRING,
hobbies ARRAY<STRING>,
technology_experience MAP<STRING,STRING>,
gender_age STRUCT<gender:STRING,age:INT>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

Step 1: Create table employee.

hive> CREATE TABLE employee (
    > id INT,
    > name STRING,
    > hobbies ARRAY<STRING>,
    > technology_experience MAP<STRING,STRING>,
    > gender_age STRUCT<gender:STRING,age:INT>
    > )
    > ROW FORMAT DELIMITED
    > FIELDS TERMINATED BY '|'
    > COLLECTION ITEMS TERMINATED BY ','
    > MAP KEYS TERMINATED BY ':'
    > STORED AS TEXTFILE;
OK
Time taken: 0.034 seconds

Step 2: Create empInfo.txt file.

empInfo.txt

1|Hari|Football,Cricket|Java:3.4Yrs,C:4.5Yrs|Male,30
2|Chamu|Trekking,Watching movies|Selenium:5.6Yrs|Feale,38
3|Sailu|Chess,Listening to music|EmbeddedC:9Yrs|Femle,32
4|Gopi|Cricket|Datastage:11Yrs|Male,32

Step 3: Get the actual hdfs content location of table employee table and place empInfo.txt file there.

Execute the command ‘describe formatted employee;’ to see the hdfs location.

hive> describe formatted employee;
OK
# col_name              data_type               comment             
id                      int                                         
name                    string                                      
hobbies                 array<string>                               
technology_experience   map<string,string>                          
gender_age              struct<gender:string,age:int>                       
         
# Detailed Table Information         
Database:               default                  
OwnerType:              USER                     
Owner:                  krishna                  
CreateTime:             Sat Jan 16 11:58:21 IST 2021     
LastAccessTime:         UNKNOWN                  
Retention:              0                        
Location:               hdfs://localhost:9000/user/hive/warehouse/employee   
Table Type:             MANAGED_TABLE            
Table Parameters:        
    COLUMN_STATS_ACCURATE   {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"gender_age\":\"true\",\"hobbies\":\"true\",\"id\":\"true\",\"name\":\"true\",\"technology_experience\":\"true\"}}
    bucketing_version       2                   
    numFiles                0                   
    numRows                 0                   
    rawDataSize             0                   
    totalSize               0                   
    transient_lastDdlTime   1610778501          
         
# Storage Information        
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe   
InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat   
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:         
    collection.delim        ,                   
    field.delim             |                   
    mapkey.delim            :                   
    serialization.format    |                   
Time taken: 0.08 seconds, Fetched: 38 row(s)

From the output, I can confirm the content location of employee table as ‘/user/hive/warehouse/employee’.

Let’s put empInfo.txt file into the folder ‘/user/hive/warehouse/employee’.

hdfs dfs -put empInfo.txt /user/hive/warehouse/employee.

Query employee table.

hive> select * from employee;
OK
1   Hari    ["Football","Cricket"]  {"Java":"3.4Yrs","C":"4.5Yrs"}  {"gender":"Male","age":30}
2   Chamu   ["Trekking","Watching movies"]  {"Selenium":"5.6Yrs"}   {"gender":"Feale","age":38}
3   Sailu   ["Chess","Listening to music"]  {"EmbeddedC":"9Yrs"}    {"gender":"Femle","age":32}
4   Gopi    ["Cricket"] {"Datastage":"11Yrs"}   {"gender":"Male","age":32}
Time taken: 0.072 seconds, Fet ched: 4 row(s)

Previous Next Home

Programming for beginners

Tuesday, 2 August 2022

HIVE: Exploring select statement

No comments:

Post a Comment