The --delete-target-dir option deletes the target directory, if it already exists, before the import runs, so the new import replaces the previous content.
Let me explain it with an example.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /overwrite_demo
Let’s execute the above command to copy the customers table to the directory /overwrite_demo.
[cloudera@quickstart ~]$ sqoop import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password "cloudera" \
> --table "customers" \
> --target-dir /overwrite_demo
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/03 09:28:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/03 09:28:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/03 09:28:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/03 09:28:30 INFO tool.CodeGenTool: Beginning code generation
22/04/03 09:28:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 09:28:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 09:28:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/8f2c1d9b456acc2c526255b25262285e/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/03 09:28:34 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/8f2c1d9b456acc2c526255b25262285e/customers.jar
22/04/03 09:28:34 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/03 09:28:34 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/03 09:28:34 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/03 09:28:34 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/03 09:28:34 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/03 09:28:34 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/03 09:28:34 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/03 09:28:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/03 09:28:35 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/03 09:28:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:28:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:28:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:28:38 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/03 09:28:38 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`customer_id`), MAX(`customer_id`) FROM `customers`
22/04/03 09:28:38 INFO db.IntegerSplitter: Split size: 3108; Num splits: 4 from: 1 to: 12435
22/04/03 09:28:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:28:38 INFO mapreduce.JobSubmitter: number of splits:4
22/04/03 09:28:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0001
22/04/03 09:28:39 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0001
22/04/03 09:28:39 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0001/
22/04/03 09:28:39 INFO mapreduce.Job: Running job: job_1649003113144_0001
22/04/03 09:28:52 INFO mapreduce.Job: Job job_1649003113144_0001 running in uber mode : false
22/04/03 09:28:52 INFO mapreduce.Job: map 0% reduce 0%
22/04/03 09:29:34 INFO mapreduce.Job: map 25% reduce 0%
22/04/03 09:29:37 INFO mapreduce.Job: map 50% reduce 0%
22/04/03 09:29:38 INFO mapreduce.Job: map 75% reduce 0%
22/04/03 09:29:39 INFO mapreduce.Job: map 100% reduce 0%
22/04/03 09:29:39 INFO mapreduce.Job: Job job_1649003113144_0001 completed successfully
22/04/03 09:29:39 INFO mapreduce.Job: Counters: 31
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=685840
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=487
HDFS: Number of bytes written=953525
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Killed map tasks=1
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=166452
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=166452
Total vcore-milliseconds taken by all map tasks=166452
Total megabyte-milliseconds taken by all map tasks=170446848
Map-Reduce Framework
Map input records=12435
Map output records=12435
Input split bytes=487
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=2792
CPU time spent (ms)=7840
Physical memory (bytes) snapshot=568446976
Virtual memory (bytes) snapshot=6043832320
Total committed heap usage (bytes)=243007488
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=953525
22/04/03 09:29:39 INFO mapreduce.ImportJobBase: Transferred 931.1768 KB in 63.8095 seconds (14.5931 KB/sec)
22/04/03 09:29:39 INFO mapreduce.ImportJobBase: Retrieved 12435 records.
[cloudera@quickstart ~]$
Let’s query the directory ‘/overwrite_demo’.
[cloudera@quickstart ~]$ hadoop fs -ls /overwrite_demo
Found 5 items
-rw-r--r-- 1 cloudera supergroup 0 2022-04-03 09:29 /overwrite_demo/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 237145 2022-04-03 09:29 /overwrite_demo/part-m-00000
-rw-r--r-- 1 cloudera supergroup 237965 2022-04-03 09:29 /overwrite_demo/part-m-00001
-rw-r--r-- 1 cloudera supergroup 238092 2022-04-03 09:29 /overwrite_demo/part-m-00002
-rw-r--r-- 1 cloudera supergroup 240323 2022-04-03 09:29 /overwrite_demo/part-m-00003
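The listing shows a _SUCCESS marker and four part-m files, one per map task (the log reports 4 splits). As a quick sanity check, a sketch like the following could confirm the marker and count the imported lines. It assumes the default text output where each record is exactly one line, and the function names import_succeeded and count_imported_records are made up for illustration.

```shell
# Hypothetical helpers (names are illustrative, not part of Sqoop or Hadoop).

# Succeeds if the job left its _SUCCESS marker in the given HDFS directory.
import_succeeded() {
  local target_dir="$1"
  hadoop fs -test -e "${target_dir}/_SUCCESS"
}

# Prints the number of lines across all part-m-* files.
# Assumes plain-text output with exactly one record per line.
count_imported_records() {
  local target_dir="$1"
  # The glob is quoted so that HDFS, not the local shell, expands it.
  hadoop fs -cat "${target_dir}/part-m-*" | wc -l | tr -d '[:space:]'
}
```

For the run above, count_imported_records /overwrite_demo would be expected to match the 12435 records reported by Sqoop, provided no field contains an embedded newline.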
When you run the same ‘sqoop import’ command again, it fails with the error ‘Output directory hdfs://quickstart.cloudera:8020/overwrite_demo already exists’.
[cloudera@quickstart ~]$ sqoop import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password "cloudera" \
> --table "customers" \
> --target-dir /overwrite_demo
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/03 09:32:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/03 09:32:05 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/03 09:32:05 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/03 09:32:05 INFO tool.CodeGenTool: Beginning code generation
22/04/03 09:32:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 09:32:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 09:32:06 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/048abc47d272bcd935bbe79182ed355f/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/03 09:32:08 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/048abc47d272bcd935bbe79182ed355f/customers.jar
22/04/03 09:32:08 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/03 09:32:08 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/03 09:32:08 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/03 09:32:08 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/03 09:32:08 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/03 09:32:08 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/03 09:32:08 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/03 09:32:09 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/03 09:32:09 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/03 09:32:10 WARN security.UserGroupInformation: PriviledgedActionException as:cloudera (auth:SIMPLE) cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/overwrite_demo already exists
22/04/03 09:32:10 ERROR tool.ImportTool: Import failed: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://quickstart.cloudera:8020/overwrite_demo already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:146)
at org.apache.hadoop.mapreduce.JobSubmitter.checkSpecs(JobSubmitter.java:270)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:143)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1304)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1304)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1325)
at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:203)
at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:176)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:273)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:513)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[cloudera@quickstart ~]$
How to resolve the above error?
There are two options.
a. Change the target directory to a folder that does not exist yet.
b. Overwrite the content of the target directory using the --delete-target-dir option.
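Option a can be sketched as follows: derive a target directory name that is unlikely to exist yet by appending a timestamp. This is only an illustration. fresh_target_dir and print_import_cmd are hypothetical helpers, and the timestamp suffix is an assumed naming scheme, not something Sqoop requires. The script prints the resulting command rather than running it.

```shell
# Hypothetical helper: append a timestamp to a base path so that each run
# gets its own target directory. The optional second argument overrides
# the timestamp (handy for reproducible names).
fresh_target_dir() {
  local base="$1"
  local stamp="${2:-$(date +%Y%m%d_%H%M%S)}"
  printf '%s_%s\n' "$base" "$stamp"
}

# Print the import command for the given directory instead of executing it.
# Connection settings are the ones used in this post.
print_import_cmd() {
  local target_dir="$1"
  printf 'sqoop import --connect "%s" --username "%s" --password "%s" --table "%s" --target-dir %s\n' \
    "jdbc:mysql://quickstart.cloudera:3306/retail_db" "root" "cloudera" "customers" "$target_dir"
}

print_import_cmd "$(fresh_target_dir /overwrite_demo)"
```

With this approach every run leaves its own directory behind, so older imports have to be cleaned up separately.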
Let’s overwrite the content of the target directory.
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /overwrite_demo \
--delete-target-dir
[cloudera@quickstart ~]$ sqoop import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password "cloudera" \
> --table "customers" \
> --target-dir /overwrite_demo \
> --delete-target-dir
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/03 09:34:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/03 09:34:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/03 09:34:33 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/03 09:34:33 INFO tool.CodeGenTool: Beginning code generation
22/04/03 09:34:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 09:34:33 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 09:34:33 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/6dfc74dc41da166c48e61c9aa97cbb00/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/03 09:34:36 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/6dfc74dc41da166c48e61c9aa97cbb00/customers.jar
22/04/03 09:34:37 INFO tool.ImportTool: Destination directory /overwrite_demo deleted.
22/04/03 09:34:37 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/03 09:34:37 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/03 09:34:37 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/03 09:34:37 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/03 09:34:37 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/03 09:34:37 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/03 09:34:37 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/03 09:34:37 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/03 09:34:37 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/03 09:34:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:34:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:34:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:34:38 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 09:34:39 INFO mapreduce.JobSubmitter: number of splits:4
22/04/03 09:34:39 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0002
22/04/03 09:34:39 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0002
22/04/03 09:34:39 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0002/
22/04/03 09:34:39 INFO mapreduce.Job: Running job: job_1649003113144_0002
22/04/03 09:34:48 INFO mapreduce.Job: Job job_1649003113144_0002 running in uber mode : false
22/04/03 09:34:48 INFO mapreduce.Job: map 0% reduce 0%
22/04/03 09:35:03 INFO mapreduce.Job: map 25% reduce 0%
22/04/03 09:35:07 INFO mapreduce.Job: map 50% reduce 0%
22/04/03 09:35:08 INFO mapreduce.Job: map 100% reduce 0%
22/04/03 09:35:09 INFO mapreduce.Job: Job job_1649003113144_0002 completed successfully
22/04/03 09:35:09 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=685836
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=487
HDFS: Number of bytes written=953525
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=63022
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=63022
Total vcore-milliseconds taken by all map tasks=63022
Total megabyte-milliseconds taken by all map tasks=64534528
Map-Reduce Framework
Map input records=12435
Map output records=12435
Input split bytes=487
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=746
CPU time spent (ms)=4730
Physical memory (bytes) snapshot=556478464
Virtual memory (bytes) snapshot=6044942336
Total committed heap usage (bytes)=243007488
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=953525
22/04/03 09:35:09 INFO mapreduce.ImportJobBase: Transferred 931.1768 KB in 32.1808 seconds (28.9358 KB/sec)
22/04/03 09:35:09 INFO mapreduce.ImportJobBase: Retrieved 12435 records.
[cloudera@quickstart ~]$
Let’s query the content of the /overwrite_demo folder.
[cloudera@quickstart ~]$ hadoop fs -ls /overwrite_demo
Found 5 items
-rw-r--r-- 1 cloudera supergroup 0 2022-04-03 09:35 /overwrite_demo/_SUCCESS
-rw-r--r-- 1 cloudera supergroup 237145 2022-04-03 09:35 /overwrite_demo/part-m-00000
-rw-r--r-- 1 cloudera supergroup 237965 2022-04-03 09:35 /overwrite_demo/part-m-00001
-rw-r--r-- 1 cloudera supergroup 238092 2022-04-03 09:35 /overwrite_demo/part-m-00002
-rw-r--r-- 1 cloudera supergroup 240323 2022-04-03 09:35 /overwrite_demo/part-m-00003
[cloudera@quickstart ~]$
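The log line ‘Destination directory /overwrite_demo deleted.’ shows that Sqoop removed the directory before starting the import. A rough manual equivalent, sketched under the assumption that the hadoop client is on the PATH, is to remove the directory yourself when it exists. The function name delete_target_dir_if_exists is hypothetical. Note also that hadoop fs -rm -r may move the data to the HDFS trash when trash is enabled.

```shell
# Hypothetical helper (not part of Sqoop): remove an HDFS directory only if
# it exists, so that a following import does not fail with
# FileAlreadyExistsException.
delete_target_dir_if_exists() {
  local target_dir="$1"
  # 'hadoop fs -test -d' exits with 0 when the path is an existing directory.
  if hadoop fs -test -d "$target_dir"; then
    hadoop fs -rm -r "$target_dir"
  fi
}
```

Calling delete_target_dir_if_exists /overwrite_demo and then running the original sqoop import command without --delete-target-dir should give a similar result to the run shown above.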