
How to debug errors while exporting data using Sqoop?

In this post, I am going to explain how to debug the errors that occur while exporting data using Sqoop.

 

Step 1: Let’s create a database ‘debug_demo’ and an employee table in it.

 

Log in to the MySQL shell.

[cloudera@quickstart ~]$ mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 82
Server version: 5.1.73 Source distribution

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> 

 

‘cloudera’ is the password for the Cloudera QuickStart VM instance.

 

Create a database ‘debug_demo’.

mysql> CREATE DATABASE debug_demo;
Query OK, 1 row affected (0.00 sec)

 

Create the employee table.

 

USE debug_demo;
CREATE TABLE employee (id INT, name VARCHAR(20), age INT, PRIMARY KEY (id));

mysql> USE debug_demo;
Database changed
mysql> 
mysql> CREATE TABLE employee (id INT, name VARCHAR(20), age INT, PRIMARY KEY (id));
Query OK, 0 rows affected (0.00 sec)

mysql> DESC employee;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | NO   | PRI | 0       |       |
| name  | varchar(20) | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

 

Step 2: Let’s create the emp.csv file and copy it to HDFS.

 

emp.csv

 

1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29
3,Ramesh,43
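
If you like to create this file straight from the terminal, a heredoc is one way to do it (a minimal sketch; any text editor works just as well):

[cloudera@quickstart ~]$ cat > emp.csv << 'EOF'
> 1,Ram,31
> 2,Krishna,32
> 3,Joel,41
> 4,Shankar,38
> 5,Shanthi,48
> 6,Sameer,29
> 3,Ramesh,43
> EOF
[cloudera@quickstart ~]$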

 

Notice that id 3 appears twice in emp.csv. Since id is the primary key of the employee table, this duplicate row will make the export fail; that is the error we are going to debug.

Copy the emp.csv file to the HDFS location /sqoop_debug.

[cloudera@quickstart ~]$ hadoop fs -mkdir /sqoop_debug
[cloudera@quickstart ~]$ 
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp.csv /sqoop_debug
[cloudera@quickstart ~]$ 
[cloudera@quickstart ~]$ hadoop fs -ls /sqoop_debug
Found 1 items
-rw-r--r--   1 cloudera supergroup         83 2022-04-04 21:05 /sqoop_debug/emp.csv
[cloudera@quickstart ~]$ 
[cloudera@quickstart ~]$ hadoop fs -cat /sqoop_debug/emp.csv
1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29
3,Ramesh,43

 

Step 3: Export the content of the emp.csv file to the employee table.

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/debug_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--export-dir /sqoop_debug/emp.csv
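
A quick side note before running it: as the output below will warn, passing --password on the command line is insecure. The warning itself suggests the alternative: use -P to be prompted for the password interactively. The same export with that change:

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/debug_demo" \
--username "root" \
-P \
--table "employee" \
--export-dir /sqoop_debug/emp.csv

I will stick with --password in this post to keep the demo easy to follow.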

 

[cloudera@quickstart ~]$ sqoop export \
> --connect "jdbc:mysql://quickstart.cloudera:3306/debug_demo" \
> --username "root" \
> --password "cloudera" \
> --table "employee" \
> --export-dir /sqoop_debug/emp.csv
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/04 22:50:30 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/04 22:50:30 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/04 22:50:30 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/04 22:50:30 INFO tool.CodeGenTool: Beginning code generation
22/04/04 22:50:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/04/04 22:50:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/04/04 22:50:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/ead527ef4e63957f145de816df764392/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/04 22:50:33 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/ead527ef4e63957f145de816df764392/employee.jar
22/04/04 22:50:33 INFO mapreduce.ExportJobBase: Beginning export of employee
22/04/04 22:50:33 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/04 22:50:33 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/04 22:50:35 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
22/04/04 22:50:35 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
22/04/04 22:50:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/04 22:50:35 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/04 22:50:36 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
... (the same hdfs.DFSClient InterruptedException warning repeats a few more times; it is harmless and can be ignored) ...
22/04/04 22:50:36 INFO input.FileInputFormat: Total input paths to process : 1
22/04/04 22:50:36 INFO mapreduce.JobSubmitter: number of splits:4
22/04/04 22:50:36 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
22/04/04 22:50:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0012
22/04/04 22:50:37 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0012
22/04/04 22:50:37 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0012/
22/04/04 22:50:37 INFO mapreduce.Job: Running job: job_1649003113144_0012
22/04/04 22:50:44 INFO mapreduce.Job: Job job_1649003113144_0012 running in uber mode : false
22/04/04 22:50:44 INFO mapreduce.Job:  map 0% reduce 0%
22/04/04 22:51:00 INFO mapreduce.Job:  map 100% reduce 0%
22/04/04 22:51:01 INFO mapreduce.Job: Job job_1649003113144_0012 failed with state FAILED due to: Task failed task_1649003113144_0012_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

22/04/04 22:51:01 INFO mapreduce.Job: Counters: 12
	Job Counters 
		Failed map tasks=1
		Killed map tasks=3
		Launched map tasks=4
		Data-local map tasks=4
		Total time spent by all maps in occupied slots (ms)=46010
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=46010
		Total vcore-milliseconds taken by all map tasks=46010
		Total megabyte-milliseconds taken by all map tasks=47114240
	Map-Reduce Framework
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
22/04/04 22:51:01 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
22/04/04 22:51:01 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 26.1929 seconds (0 bytes/sec)
22/04/04 22:51:01 INFO mapreduce.ExportJobBase: Exported 0 records.
22/04/04 22:51:01 ERROR tool.ExportTool: Error during export: 
Export job failed!
	at org.apache.sqoop.mapreduce.ExportJobBase.runExport(ExportJobBase.java:439)
	at org.apache.sqoop.manager.SqlManager.exportTable(SqlManager.java:931)
	at org.apache.sqoop.tool.ExportTool.exportTable(ExportTool.java:80)
	at org.apache.sqoop.tool.ExportTool.run(ExportTool.java:99)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
[cloudera@quickstart ~]$

 

As you can see in the above output, the job failed with the following error.

 

22/04/04 22:51:01 ERROR tool.ExportTool: Error during export: 
Export job failed!

 

But this output alone does not give any meaningful information about the reason for the failure.
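
One thing worth trying is rerunning the command with Sqoop’s --verbose flag, which turns on debug-level logging in the Sqoop client:

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/debug_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--export-dir /sqoop_debug/emp.csv \
--verbose

In this case, though, even the verbose client output does not say why the map task failed; the useful details are in the task logs on the cluster.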

 

How to find the failure cause?

In the log messages, you can see a URL to track the job. Open that URL in a browser and drill down into the failed map task's logs to view the detailed error messages.

 

http://quickstart.cloudera:8088/proxy/application_1649003113144_0012/
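
If you prefer the terminal to the web UI, you can fetch the same logs with the yarn logs command, passing the application id printed in the console output:

[cloudera@quickstart ~]$ yarn logs -applicationId application_1649003113144_0012

In our example, the failed task's log shows a MySQL duplicate-key error (something like: java.sql.SQLException: Duplicate entry '3' for key 'PRIMARY'), because emp.csv contains two rows with id 3.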

 

There is one more problem to address here: even though the export job failed, some records were still copied to the employee table. A Sqoop export is not an atomic operation; each map task commits its inserts separately, so a failed job can leave a partial result behind.

mysql> SELECT * FROM employee;
+----+--------+------+
| id | name   | age  |
+----+--------+------+
|  3 | Ramesh |   43 |
+----+--------+------+
1 row in set (0.01 sec)

We can address this problem using a staging table; I will explain this in detail in my next post.
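
As a quick preview, the staging-table version of the export looks roughly like this (a sketch, assuming you have already created an empty employee_stage table with the same schema as employee):

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/debug_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--staging-table "employee_stage" \
--clear-staging-table \
--export-dir /sqoop_debug/emp.csv

Sqoop first writes all the rows to employee_stage, and only when every map task succeeds does it move them to employee in a single transaction, so a failed job leaves the target table untouched.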

 


