Wednesday, 4 May 2022

Sqoop: Import all the tables from a database

The ‘sqoop-import-all-tables’ command imports all the tables from a database into HDFS.

Syntax

sqoop-import-all-tables --connect "{jdbc_connection_url}" --username "{user_name}" --password "{password}" --warehouse-dir {destination_directory}


Example

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera"  --warehouse-dir /import_database_demo

The above command imports all the tables from the import_demo database into the HDFS folder /import_database_demo.
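A couple of optional flags are worth knowing before we run it. Sqoop itself warns that passing the password on the command line is insecure; -P prompts for it interactively instead. --exclude-tables skips the listed tables. For example, the below variant (a sketch reusing the same connection details, not run in this demo) prompts for the password and imports every table except employee_hobbies; the -m 1 flag is explained later in this post.

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" -P --exclude-tables "employee_hobbies" --warehouse-dir /import_database_demo -m 1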

Let me explain it with an example.

Step 1: Create the database and tables.

Let’s create the ‘import_demo’ database by executing the below command.

create database import_demo;

mysql> create database import_demo;
Query OK, 1 row affected (0.00 sec)

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| cm                 |
| firehose           |
| hue                |
| import_demo        |
| metastore          |
| mysql              |
| nav                |
| navms              |
| oozie              |
| retail_db          |
| rman               |
| sentry             |
+--------------------+
13 rows in set (0.00 sec)

Create the employee table in the import_demo database. To create a table, we first need to switch to the database. Execute the below command to get into the database.

use import_demo;

mysql> use import_demo;
Database changed
mysql>

Create the employee table by executing the below command.

CREATE TABLE employee (id INT, name VARCHAR(20));

mysql> CREATE TABLE employee (id INT, name VARCHAR(20));
Query OK, 0 rows affected (0.02 sec)

Insert some data into the employee table by executing the below commands.

INSERT INTO employee VALUES (1, "Hari Krishna");

INSERT INTO employee VALUES (2, "Gopi");

INSERT INTO employee VALUES (3, "Ram");

INSERT INTO employee VALUES (4, "Joel");

mysql> INSERT INTO employee VALUES (1, "Hari Krishna");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee VALUES (2, "Gopi");
Query OK, 1 row affected (0.01 sec)

mysql> INSERT INTO employee VALUES (3, "Ram");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee VALUES (4, "Joel");
Query OK, 1 row affected (0.00 sec)

mysql> SELECT * FROM employee;
+------+--------------+
| id   | name         |
+------+--------------+
|    1 | Hari Krishna |
|    2 | Gopi         |
|    3 | Ram          |
|    4 | Joel         |
+------+--------------+
4 rows in set (0.00 sec)

mysql> 

Create the employee_hobbies table by executing the below command.

CREATE TABLE employee_hobbies(id INT, hobby VARCHAR(20));

mysql> CREATE TABLE employee_hobbies(id INT, hobby VARCHAR(20));
Query OK, 0 rows affected (0.01 sec)

mysql> DESCRIBE employee_hobbies;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | YES  |     | NULL    |       |
| hobby | varchar(20) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Let’s insert some data into the employee_hobbies table by executing the below commands.

INSERT INTO employee_hobbies VALUES (1, "trekking");
INSERT INTO employee_hobbies VALUES (1, "playing cricket");
INSERT INTO employee_hobbies VALUES (2, "stamp collection");
INSERT INTO employee_hobbies VALUES (3, "singing");
INSERT INTO employee_hobbies VALUES (4, "bike riding");
INSERT INTO employee_hobbies VALUES (4, "cooking");

mysql> INSERT INTO employee_hobbies VALUES (1, "playing cricket");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (2, "stamp collection");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (3, "singing");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (4, "bike riding");
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO employee_hobbies VALUES (4, "cooking");
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> SELECT * FROM employee_hobbies;
+------+------------------+
| id   | hobby            |
+------+------------------+
|    1 | trekking         |
|    1 | playing cricket  |
|    2 | stamp collection |
|    3 | singing          |
|    4 | bike riding      |
|    4 | cooking          |
+------+------------------+
6 rows in set (0.00 sec)
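Note that neither employee nor employee_hobbies defines a primary key. This matters in the next step, because Sqoop uses the primary key column to split a table across parallel mappers. Had we created the tables with a primary key, for example the hypothetical variant below, Sqoop could have imported them with its default of four mappers.

CREATE TABLE employee (id INT PRIMARY KEY, name VARCHAR(20));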

Step 2: Import the tables from the import_demo database to HDFS.

Open a terminal and execute the below command.

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera"  --warehouse-dir /import_database_demo -m 1

Since the employee and employee_hobbies tables do not have a primary key, we must set the number of mappers to 1 (-m 1); otherwise, the command will fail with the below error.

Error during import: No primary key could be found for table employee. Please specify one with --split-by or perform a sequential import with '-m 1'
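Although the error message mentions --split-by, that argument is not available with sqoop-import-all-tables, so the practical choices are -m 1 (used above) or --autoreset-to-one-mapper, which keeps the default parallelism for tables that do have a primary key and falls back to one mapper only for those that do not. A sketch of the latter (not run in this demo):

sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera" --warehouse-dir /import_database_demo --autoreset-to-one-mapper

The full output of the -m 1 run is shown below.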

[cloudera@quickstart Desktop]$ sqoop-import-all-tables --connect "jdbc:mysql://quickstart.cloudera:3306/import_demo" --username "root" --password "cloudera"  --warehouse-dir /import_database_demo -m 1
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/03/31 23:23:15 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/03/31 23:23:15 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/03/31 23:23:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/03/31 23:23:16 INFO tool.CodeGenTool: Beginning code generation
22/03/31 23:23:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/03/31 23:23:16 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/03/31 23:23:16 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/31 23:23:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee.jar
22/03/31 23:23:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/03/31 23:23:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/03/31 23:23:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/03/31 23:23:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/03/31 23:23:18 INFO mapreduce.ImportJobBase: Beginning import of employee
22/03/31 23:23:18 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/03/31 23:23:18 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/03/31 23:23:19 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/03/31 23:23:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/31 23:23:20 WARN hdfs.DFSClient: Caught exception 
22/03/31 23:23:21 INFO mapreduce.JobSubmitter: number of splits:1
22/03/31 23:23:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647946797614_0007
22/03/31 23:23:22 INFO impl.YarnClientImpl: Submitted application application_1647946797614_0007
22/03/31 23:23:22 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1647946797614_0007/
22/03/31 23:23:22 INFO mapreduce.Job: Running job: job_1647946797614_0007
22/03/31 23:23:29 INFO mapreduce.Job: Job job_1647946797614_0007 running in uber mode : false
22/03/31 23:23:29 INFO mapreduce.Job:  map 0% reduce 0%
22/03/31 23:23:35 INFO mapreduce.Job:  map 100% reduce 0%
22/03/31 23:23:36 INFO mapreduce.Job: Job job_1647946797614_0007 completed successfully
22/03/31 23:23:36 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=171018
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=35
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=4030
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=4030
		Total vcore-milliseconds taken by all map tasks=4030
		Total megabyte-milliseconds taken by all map tasks=4126720
	Map-Reduce Framework
		Map input records=4
		Map output records=4
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=53
		CPU time spent (ms)=680
		Physical memory (bytes) snapshot=136056832
		Virtual memory (bytes) snapshot=1510182912
		Total committed heap usage (bytes)=60751872
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=35
22/03/31 23:23:36 INFO mapreduce.ImportJobBase: Transferred 35 bytes in 17.3828 seconds (2.0135 bytes/sec)
22/03/31 23:23:36 INFO mapreduce.ImportJobBase: Retrieved 4 records.
22/03/31 23:23:36 INFO tool.CodeGenTool: Beginning code generation
22/03/31 23:23:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee_hobbies` AS t LIMIT 1
22/03/31 23:23:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee_hobbies.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/03/31 23:23:37 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ad430666cf06ebdec1cd0f2393e5aade/employee_hobbies.jar
22/03/31 23:23:37 INFO mapreduce.ImportJobBase: Beginning import of employee_hobbies
22/03/31 23:23:37 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/03/31 23:23:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/03/31 23:23:38 INFO db.DBInputFormat: Using read commited transaction isolation
22/03/31 23:23:38 INFO mapreduce.JobSubmitter: number of splits:1
22/03/31 23:23:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1647946797614_0008
22/03/31 23:23:38 INFO impl.YarnClientImpl: Submitted application application_1647946797614_0008
22/03/31 23:23:38 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1647946797614_0008/
22/03/31 23:23:38 INFO mapreduce.Job: Running job: job_1647946797614_0008
22/03/31 23:23:45 INFO mapreduce.Job: Job job_1647946797614_0008 running in uber mode : false
22/03/31 23:23:45 INFO mapreduce.Job:  map 0% reduce 0%
22/03/31 23:23:51 INFO mapreduce.Job:  map 100% reduce 0%
22/03/31 23:23:51 INFO mapreduce.Job: Job job_1647946797614_0008 completed successfully
22/03/31 23:23:51 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=171051
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=82
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3934
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=3934
		Total vcore-milliseconds taken by all map tasks=3934
		Total megabyte-milliseconds taken by all map tasks=4028416
	Map-Reduce Framework
		Map input records=6
		Map output records=6
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=50
		CPU time spent (ms)=660
		Physical memory (bytes) snapshot=136859648
		Virtual memory (bytes) snapshot=1510182912
		Total committed heap usage (bytes)=60751872
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=82
22/03/31 23:23:51 INFO mapreduce.ImportJobBase: Transferred 82 bytes in 14.2966 seconds (5.7356 bytes/sec)
22/03/31 23:23:51 INFO mapreduce.ImportJobBase: Retrieved 6 records.
[cloudera@quickstart Desktop]$

The log shows that Sqoop ran one MapReduce job per table, each with a single map task. Let’s confirm the import by listing the HDFS folder ‘/import_database_demo’.

[cloudera@quickstart Desktop]$ hadoop fs -ls /import_database_demo
Found 2 items
drwxr-xr-x   - cloudera supergroup          0 2022-03-31 23:23 /import_database_demo/employee
drwxr-xr-x   - cloudera supergroup          0 2022-03-31 23:23 /import_database_demo/employee_hobbies
[cloudera@quickstart Desktop]$ 
[cloudera@quickstart Desktop]$ hadoop fs -ls /import_database_demo/employee
Found 2 items
-rw-r--r--   1 cloudera supergroup          0 2022-03-31 23:23 /import_database_demo/employee/_SUCCESS
-rw-r--r--   1 cloudera supergroup         35 2022-03-31 23:23 /import_database_demo/employee/part-m-00000
[cloudera@quickstart Desktop]$ 
[cloudera@quickstart Desktop]$ hadoop fs -ls /import_database_demo/employee_hobbies
Found 2 items
-rw-r--r--   1 cloudera supergroup          0 2022-03-31 23:23 /import_database_demo/employee_hobbies/_SUCCESS
-rw-r--r--   1 cloudera supergroup         82 2022-03-31 23:23 /import_database_demo/employee_hobbies/part-m-00000
[cloudera@quickstart Desktop]$ 
[cloudera@quickstart Desktop]$ hadoop fs -cat /import_database_demo/employee/*
1,Hari Krishna
2,Gopi
3,Ram
4,Joel
[cloudera@quickstart Desktop]$ 
[cloudera@quickstart Desktop]$ hadoop fs -cat /import_database_demo/employee_hobbies/*
1,trekking
1,playing cricket
2,stamp collection
3,singing
4,bike riding
4,cooking
[cloudera@quickstart Desktop]$
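As a final sanity check, we can compare the record counts in HDFS with the 4 and 6 rows inserted into MySQL earlier. A quick sketch:

hadoop fs -cat /import_database_demo/employee/part-m-* | wc -l

hadoop fs -cat /import_database_demo/employee_hobbies/part-m-* | wc -l

Both counts should match the source tables, confirming that all the rows were imported.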
