The ‘sqoop-import-all-tables’ command is used to import all the tables from a database into HDFS.
Syntax
sqoop-import-all-tables --connect "{jdbc_connection_url}" --username "{user_name}" --password "{password}" --warehouse-dir {destination_directory}
Example
sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera" --warehouse-dir /import_database_demo
The above command imports all the tables from the import_demo database into the HDFS directory /import_database_demo.
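If you want to import everything except a few tables, sqoop-import-all-tables also accepts an --exclude-tables option, which takes a comma-separated list of table names to skip. As a sketch, reusing the same connection details as above:

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera" --warehouse-dir /import_database_demo --exclude-tables employee_hobbies

This would import every table in import_demo except employee_hobbies.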
Let me explain it with an example.
Step 1: Create database and tables.
Let’s create the ‘import_demo’ database by executing the below command.
create database import_demo;
mysql> create database import_demo;
Query OK, 1 row affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cm                 |
| firehose           |
| hue                |
| import_demo        |
| metastore          |
| mysql              |
| nav                |
| navms              |
| oozie              |
| retail_db          |
| rman               |
| sentry             |
+--------------------+
13 rows in set (0.00 sec)
Create the employee table in the import_demo database. To create a table in a database, we need to be in that database first. Execute the below command to switch to it.
use import_demo;
mysql> use import_demo;
Database changed
mysql>
Create the employee table by executing the below command.
CREATE TABLE employee (id INT, name VARCHAR(20));
mysql> CREATE TABLE employee (id INT, name VARCHAR(20));
Query OK, 0 rows affected (0.02 sec)
Insert some data into the employee table by executing the below commands.
INSERT INTO employee VALUES (1, "Hari Krishna");
INSERT INTO employee VALUES (2, "Gopi");
INSERT INTO employee VALUES (3, "Ram");
INSERT INTO employee VALUES (4, "Joel");
mysql> INSERT INTO employee VALUES (1, "Hari Krishna");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee VALUES (2, "Gopi");
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO employee VALUES (3, "Ram");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee VALUES (4, "Joel");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM employee;
+------+--------------+
| id   | name         |
+------+--------------+
|    1 | Hari Krishna |
|    2 | Gopi         |
|    3 | Ram          |
|    4 | Joel         |
+------+--------------+
4 rows in set (0.00 sec)

mysql>
Create the employee_hobbies table by executing the below command.
CREATE TABLE employee_hobbies(id INT, hobby VARCHAR(20));
mysql> CREATE TABLE employee_hobbies(id INT, hobby VARCHAR(20));
Query OK, 0 rows affected (0.01 sec)

mysql> DESCRIBE employee_hobbies;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | YES  |     | NULL    |       |
| hobby | varchar(20) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Let’s insert some data into the employee_hobbies table by executing the below commands.
INSERT INTO employee_hobbies VALUES (1, "trekking");
INSERT INTO employee_hobbies VALUES (1, "playing cricket");
INSERT INTO employee_hobbies VALUES (2, "stamp collection");
INSERT INTO employee_hobbies VALUES (3, "singing");
INSERT INTO employee_hobbies VALUES (4, "bike riding");
INSERT INTO employee_hobbies VALUES (4, "cooking");
mysql> INSERT INTO employee_hobbies VALUES (1, "playing cricket");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (2, "stamp collection");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (3, "singing");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (4, "bike riding");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (4, "cooking");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM employee_hobbies;
+------+------------------+
| id   | hobby            |
+------+------------------+
|    1 | trekking         |
|    1 | playing cricket  |
|    2 | stamp collection |
|    3 | singing          |
|    4 | bike riding      |
|    4 | cooking          |
+------+------------------+
6 rows in set (0.00 sec)
Step 2: Import the tables from the import_demo database into HDFS.
Open a terminal and execute the below command.
sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera" --warehouse-dir /import_database_demo -m 1
Since the employee and employee_hobbies tables have no primary key, we must set the number of mappers to 1; otherwise the command fails with the below error. (An alternative option, --autoreset-to-one-mapper, is sketched after the log output.)
Error during import: No primary key could be found for table employee. Please specify one with --split-by or perform a sequential import with '-m 1'

[cloudera@quickstart Desktop]$ sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera" --warehouse-dir /import_database_demo -m 1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/03/31 23:23:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/03/31 23:23:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/31 23:23:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/03/31 23:23:16 INFO tool.CodeGenTool: Beginning code generation
22/03/31 23:23:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/03/31 23:23:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/03/31 23:23:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/31 23:23:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee.jar
22/03/31 23:23:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/03/31 23:23:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/03/31 23:23:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/03/31 23:23:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/03/31 23:23:18 INFO mapreduce.ImportJobBase: Beginning import of employee
22/03/31 23:23:18 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/03/31 23:23:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/03/31 23:23:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/03/31 23:23:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/31 23:23:20 WARN hdfs.DFSClient: Caught exception
22/03/31 23:23:21 INFO mapreduce.JobSubmitter: number of splits:1
22/03/31 23:23:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647946797614_0007
22/03/31 23:23:22 INFO impl.YarnClientImpl: Submitted application application_1647946797614_0007
22/03/31 23:23:22 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1647946797614_0007/
22/03/31 23:23:22 INFO mapreduce.Job: Running job: job_1647946797614_0007
22/03/31 23:23:29 INFO mapreduce.Job: Job job_1647946797614_0007 running in uber mode : false
22/03/31 23:23:29 INFO mapreduce.Job: map 0% reduce 0%
22/03/31 23:23:35 INFO mapreduce.Job: map 100% reduce 0%
22/03/31 23:23:36 INFO mapreduce.Job: Job job_1647946797614_0007 completed successfully
22/03/31 23:23:36 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=171018
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=35
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=4030
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4030
Total vcore-milliseconds taken by all map tasks=4030
Total megabyte-milliseconds taken by all map tasks=4126720
Map-Reduce Framework
Map input records=4
Map output records=4
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=53
CPU time spent (ms)=680
Physical memory (bytes) snapshot=136056832
Virtual memory (bytes) snapshot=1510182912
Total committed heap usage (bytes)=60751872
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=35
22/03/31 23:23:36 INFO mapreduce.ImportJobBase: Transferred 35 bytes in 17.3828 seconds (2.0135 bytes/sec)
22/03/31 23:23:36 INFO mapreduce.ImportJobBase: Retrieved 4 records.
22/03/31 23:23:36 INFO tool.CodeGenTool: Beginning code generation
22/03/31 23:23:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee_hobbies` AS t LIMIT 1
22/03/31 23:23:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee_hobbies.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/31 23:23:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee_hobbies.jar
22/03/31 23:23:37 INFO mapreduce.ImportJobBase: Beginning import of employee_hobbies
22/03/31 23:23:37 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/03/31 23:23:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/31 23:23:38 INFO db.DBInputFormat: Using read commited transaction isolation
22/03/31 23:23:38 INFO mapreduce.JobSubmitter: number of splits:1
22/03/31 23:23:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647946797614_0008
22/03/31 23:23:38 INFO impl.YarnClientImpl: Submitted application application_1647946797614_0008
22/03/31 23:23:38 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1647946797614_0008/
22/03/31 23:23:38 INFO mapreduce.Job: Running job: job_1647946797614_0008
22/03/31 23:23:45 INFO mapreduce.Job: Job job_1647946797614_0008 running in uber mode : false
22/03/31 23:23:45 INFO mapreduce.Job: map 0% reduce 0%
22/03/31 23:23:51 INFO mapreduce.Job: map 100% reduce 0%
22/03/31 23:23:51 INFO mapreduce.Job: Job job_1647946797614_0008 completed successfully
22/03/31 23:23:51 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=171051
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=82
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=3934
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3934
Total vcore-milliseconds taken by all map tasks=3934
Total megabyte-milliseconds taken by all map tasks=4028416
Map-Reduce Framework
Map input records=6
Map output records=6
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=50
CPU time spent (ms)=660
Physical memory (bytes) snapshot=136859648
Virtual memory (bytes) snapshot=1510182912
Total committed heap usage (bytes)=60751872
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=82
22/03/31 23:23:51 INFO mapreduce.ImportJobBase: Transferred 82 bytes in 14.2966 seconds (5.7356 bytes/sec)
22/03/31 23:23:51 INFO mapreduce.ImportJobBase: Retrieved 6 records.
[cloudera@quickstart Desktop]$
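As a side note, instead of forcing a single mapper for every table with -m 1, recent Sqoop versions (the log above shows 1.4.6) also offer an --autoreset-to-one-mapper option, which falls back to one mapper only for tables that lack a primary key while still parallelizing tables that have one. A sketch, assuming the option is available in your build:

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera" --warehouse-dir /import_database_demo --autoreset-to-one-mapper

Here both tables lack a primary key, so the result is the same as -m 1, but in a database with a mix of keyed and keyless tables the keyed ones would still be split across multiple mappers.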
Let’s confirm the import by listing the ‘/import_database_demo’ directory and inspecting its contents.
[cloudera@quickstart Desktop]$ hadoop fs -ls /import_database_demo
Found 2 items
drwxr-xr-x - cloudera supergroup 0 2022-03-31 23:23 /import_database_demo/employee
drwxr-xr-x - cloudera supergroup 0 2022-03-31 23:23 /import_database_demo/employee_hobbies
[cloudera@quickstart Desktop]$
[cloudera@quickstart Desktop]$ hadoop fs -ls /import_database_demo/employee
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2022-03-31 23:23 /import_database_demo/employee/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 35 2022-03-31 23:23 /import_database_demo/employee/part-m-00000
[cloudera@quickstart Desktop]$
[cloudera@quickstart Desktop]$ hadoop fs -ls /import_database_demo/employee_hobbies
Found 2 items
-rw-r--r-- 1 cloudera supergroup 0 2022-03-31 23:23 /import_database_demo/employee_hobbies/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 82 2022-03-31 23:23 /import_database_demo/employee_hobbies/part-m-00000
[cloudera@quickstart Desktop]$
[cloudera@quickstart Desktop]$ hadoop fs -cat /import_database_demo/employee/*
1,Hari Krishna
2,Gopi
3,Ram
4,Joel
[cloudera@quickstart Desktop]$
[cloudera@quickstart Desktop]$ hadoop fs -cat /import_database_demo/employee_hobbies/*
1,trekking
1,playing cricket
2,stamp collection
3,singing
4,bike riding
4,cooking
[cloudera@quickstart Desktop]$
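One final note: as the log above warned ("Setting your password on the command-line is insecure. Consider using -P instead."), passing --password in plain text is not recommended outside a sandbox. The -P flag prompts for the password at runtime, and --password-file reads it from a protected file instead. For example:

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" -P --warehouse-dir /import_database_demo -m 1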