Subquery is a query nested in another query. You can embed a subquery in FROM, WHERE clauses in Hive.
Example 1: subquery in WHERE clause
SELECT * FROM employee
WHERE employee.id IN (SELECT emp_id FROM employee_address where city='Bangalore');
Example 2: Union of records from tables employee and employee1.
SELECT * FROM
(
SELECT * FROM employee
UNION ALL
SELECT * FROM employee1
) t;
Let’s experiment it with below example.
employee table
Id |
name |
|
|
|
|
|
|
employee_address table
id |
emp_id |
street |
city |
country |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Let’s create employee and employee_address tables by executing below statements.
hive> CREATE TABLE employee (id INT, name STRING);
OK
Time taken: 0.169 seconds
hive> ;
hive> ;
hive> CREATE TABLE employee_address(id INT, emp_id INT, street STRING, city STRING, country STRING);
OK
Time taken: 0.063 seconds
Insert data into employee table.
INSERT INTO employee VALUES
(1, 'Ravi'),
(2, 'Ram'),
(3, 'Siva');
hive> INSERT INTO employee VALUES
> (1, 'Ravi'),
> (2, 'Ram'),
> (3, 'Siva');
Query ID = cloudera_20220417034141_8380194c-1f74-41cb-858f-1f043d733b31
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0022, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0022/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0022
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-17 03:41:56,698 Stage-1 map = 0%, reduce = 0%
2022-04-17 03:42:05,291 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.39 sec
MapReduce Total cumulative CPU time: 1 seconds 390 msec
Ended Job = job_1649172504056_0022
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee/.hive-staging_hive_2022-04-17_03-41-47_758_1924624601576248628-1/-ext-10000
Loading data to table default.employee
Table default.employee stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.39 sec HDFS Read: 3834 HDFS Write: 92 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 390 msec
OK
Time taken: 19.176 seconds
hive>
Insert data into employee_address table.
INSERT INTO employee_address VALUES
(11, 1, 'Chowdeswari street', 'Bangalore', 'India'),
(12, 2, 'Devendra Nagar', 'Hyderabad', 'India'),
(13, 3, 'Rajaji Nagar', 'Bangaore', 'India');
hive> INSERT INTO employee_address VALUES
> (11, 1, 'Chowdeswari street', 'Bangalore', 'India'),
> (12, 2, 'Devendra Nagar', 'Hyderabad', 'India'),
> (13, 3, 'Rajaji Nagar', 'Bangaore', 'India');
Query ID = cloudera_20220417034242_9be76655-c460-4891-a1d0-43ab47d78efd
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0023, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0023/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0023
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-17 03:42:51,156 Stage-1 map = 0%, reduce = 0%
2022-04-17 03:42:57,567 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.31 sec
MapReduce Total cumulative CPU time: 1 seconds 310 msec
Ended Job = job_1649172504056_0023
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee_address/.hive-staging_hive_2022-04-17_03-42-43_996_4248675605426025187-1/-ext-10000
Loading data to table default.employee_address
Table default.employee_address stats: [numFiles=1, numRows=3, totalSize=109, rawDataSize=106]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.31 sec HDFS Read: 4538 HDFS Write: 190 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 310 msec
OK
Time taken: 14.76 seconds
Let’s print and confirm the data in employee, employee_address tables.
hive> SET hive.cli.print.header=true;
hive> ;
hive> ;
hive> SELECT * FROM employee;
OK
employee.id employee.name
1 Ravi
2 Ram
3 Siva
Time taken: 0.086 seconds, Fetched: 3 row(s)
hive> ;
hive> ;
hive> SELECT * FROM employee_address;
OK
employee_address.id employee_address.emp_id employee_address.street employee_address.city employee_address.country
11 1 Chowdeswari street Bangalore India
12 2 Devendra Nagar Hyderabad India
13 3 Rajaji Nagar Bangaore India
Time taken: 0.036 seconds, Fetched: 3 row(s)
Subquery in WHERE clause
Get all the employees whose city is ‘Bangalore’.
SELECT * FROM employee
WHERE employee.id IN (SELECT emp_id FROM employee_address where city='Bangalore');
hive> SELECT * FROM employee
> WHERE employee.id IN (SELECT emp_id FROM employee_address where city='Bangalore');
Query ID = cloudera_20220417034646_5b5f015c-e344-487a-be23-16bd10683a1b
Total jobs = 1
Execution log at: /tmp/cloudera/cloudera_20220417034646_5b5f015c-e344-487a-be23-16bd10683a1b.log
2022-04-17 03:46:55 Starting to launch local task to process map join; maximum memory = 1013645312
2022-04-17 03:46:56 Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/cloudera/16ec1170-56f8-45e7-9873-522e06448b92/hive_2022-04-17_03-46-50_841_181343302282253929-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2022-04-17 03:46:56 Uploaded 1 File to: file:/tmp/cloudera/16ec1170-56f8-45e7-9873-522e06448b92/hive_2022-04-17_03-46-50_841_181343302282253929-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (278 bytes)
2022-04-17 03:46:56 End of local task; Time Taken: 1.244 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0024, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0024/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0024
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2022-04-17 03:47:05,536 Stage-3 map = 0%, reduce = 0%
2022-04-17 03:47:13,095 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.33 sec
MapReduce Total cumulative CPU time: 1 seconds 330 msec
Ended Job = job_1649172504056_0024
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 1.33 sec HDFS Read: 6080 HDFS Write: 7 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 330 msec
OK
employee.id employee.name
1 Ravi
Time taken: 23.323 seconds, Fetched: 1 row(s)
Subquery in FROM clause
Let’s create another table like employee1 and union these students with the help of subquery in Hive.
CREATE TABLE employee1 LIKE employee;
INSERT INTO employee1 VALUES
(1, 'Raheem'),
(12, 'Joel'),
(13, 'Sailu');
hive> create table employee1 like employee;
OK
Time taken: 0.071 seconds
hive> ;
hive> ;
hive> INSERT INTO employee1 VALUES
> (1, 'Raheem'),
> (12, 'Joel'),
> (13, 'Sailu');
Query ID = cloudera_20220417035252_6e18b30f-3c57-43ea-846f-7aad713c5cec
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0025, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0025/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0025
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2022-04-17 03:52:37,790 Stage-1 map = 0%, reduce = 0%
2022-04-17 03:52:45,259 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.48 sec
MapReduce Total cumulative CPU time: 1 seconds 480 msec
Ended Job = job_1649172504056_0025
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://quickstart.cloudera:8020/user/hive/warehouse/employee1/.hive-staging_hive_2022-04-17_03-52-30_578_3077524688190003825-1/-ext-10000
Loading data to table default.employee1
Table default.employee1 stats: [numFiles=1, numRows=3, totalSize=26, rawDataSize=23]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 1.48 sec HDFS Read: 3857 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 480 msec
OK
_col0 _col1
Time taken: 15.885 seconds
hive>
Let’s print the records from employee1 table.
hive> SELECT * FROM employee1;
OK
employee1.id employee1.name
1 Raheem
12 Joel
13 Sailu
Time taken: 0.074 seconds, Fetched: 3 row(s)
Let’s union the data from employee and employee1 tables.
SELECT * FROM
(
SELECT * FROM employee
UNION ALL
SELECT * FROM employee1
) t;
hive> SELECT * FROM
> (
> SELECT * FROM employee
> UNION ALL
> SELECT * FROM employee1
> ) t;
Query ID = cloudera_20220417035555_1595511b-b3a9-4950-8843-6d5f2723f009
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1649172504056_0026, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1649172504056_0026/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1649172504056_0026
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2022-04-17 03:55:24,506 Stage-1 map = 0%, reduce = 0%
2022-04-17 03:55:35,453 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.07 sec
2022-04-17 03:55:36,493 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.99 sec
MapReduce Total cumulative CPU time: 1 seconds 990 msec
Ended Job = job_1649172504056_0026
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Cumulative CPU: 1.99 sec HDFS Read: 9685 HDFS Write: 46 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 990 msec
OK
t.id t.name
1 Raheem
12 Joel
13 Sailu
1 Ravi
2 Ram
3 Siva
Time taken: 20.103 seconds, Fetched: 6 row(s)
hive>
No comments:
Post a Comment