In this post, I explain how to export data from HDFS to a MySQL table using Sqoop’s export tool.
Step 1: Let’s create a database and an employee table.
Log in to the MySQL shell.
[cloudera@quickstart ~]$ mysql -u root -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 75
Server version: 5.1.73 Source distribution
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
‘cloudera’ is the password for the Cloudera QuickStart VM instance.
Create a database ‘export_demo’.
mysql> CREATE DATABASE export_demo;
Query OK, 1 row affected (0.00 sec)
Create an employee table in the export_demo database.
CREATE TABLE employee (id INT, name VARCHAR(20), age INT);
mysql> USE export_demo;
Database changed
mysql>
mysql> CREATE TABLE employee (id INT, name VARCHAR(20), age INT);
Query OK, 0 rows affected (0.01 sec)
mysql>
mysql> DESCRIBE employee;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | YES  |     | NULL    |       |
| name  | varchar(20) | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
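One thing worth noting about this schema: the employee table has no primary key. That works for a one-off export, but if you run the same export twice, every row is inserted again and you end up with duplicates. If you want re-runs to update rows instead, you could add a key first (this ALTER statement is my suggestion, not part of the original walkthrough):

mysql> ALTER TABLE employee ADD PRIMARY KEY (id);

With a key in place, sqoop export can be given --update-key id (and optionally --update-mode allowinsert) so that repeated runs update existing rows rather than insert duplicates.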
Step 2: Let’s create an emp.csv file and copy it to HDFS.
emp.csv
1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29
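If you prefer to create the file from the shell rather than a text editor, a heredoc does the job:

[cloudera@quickstart ~]$ cat > emp.csv <<'EOF'
1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29
EOF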
Copy the emp.csv file to the HDFS location /export_demo.
[cloudera@quickstart ~]$ hadoop fs -mkdir /export_demo
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$ hadoop fs -copyFromLocal emp.csv /export_demo
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$ hadoop fs -ls /export_demo
Found 1 items
-rw-r--r-- 1 cloudera supergroup 70 2022-04-04 09:37 /export_demo/emp.csv
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$ hadoop fs -cat /export_demo/emp.csv
1,Ram,31
2,Krishna,32
3,Joel,41
4,Shankar,38
5,Shanthi,48
6,Sameer,29
Step 3: Export the content of the emp.csv file to the employee table using sqoop export.
sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
--password "cloudera" \
--table "employee" \
--export-dir /export_demo/emp.csv
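Two notes on this command. First, sqoop export parses its input as comma-separated by default, which is why emp.csv needs no delimiter flag; for any other delimiter you would add --input-fields-terminated-by. Second, as the log below warns, passing --password on the command line is insecure. A safer variant prompts for the password at runtime (a sketch, assuming the same connection details):

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
-P \
--table "employee" \
--export-dir /export_demo/emp.csv

For unattended jobs, --password-file pointing to a permission-restricted file is the usual alternative to -P.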
[cloudera@quickstart ~]$ sqoop export \
> --connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
> --username "root" \
> --password "cloudera" \
> --table "employee" \
> --export-dir /export_demo/emp.csv
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/04 09:42:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/04 09:42:37 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/04 09:42:38 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/04 09:42:38 INFO tool.CodeGenTool: Beginning code generation
22/04/04 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/04/04 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employee` AS t LIMIT 1
22/04/04 09:42:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/c9f4658e25d6663f2b3cec91b53fba16/employee.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/04 09:42:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/c9f4658e25d6663f2b3cec91b53fba16/employee.jar
22/04/04 09:42:40 INFO mapreduce.ExportJobBase: Beginning export of employee
22/04/04 09:42:40 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/04 09:42:40 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/04 09:42:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
22/04/04 09:42:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
22/04/04 09:42:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/04 09:42:42 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/04 09:42:43 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/04 09:42:43 INFO input.FileInputFormat: Total input paths to process : 1
22/04/04 09:42:43 INFO input.FileInputFormat: Total input paths to process : 1
22/04/04 09:42:43 INFO mapreduce.JobSubmitter: number of splits:4
22/04/04 09:42:43 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
22/04/04 09:42:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0010
22/04/04 09:42:44 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0010
22/04/04 09:42:44 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0010/
22/04/04 09:42:44 INFO mapreduce.Job: Running job: job_1649003113144_0010
22/04/04 09:42:51 INFO mapreduce.Job: Job job_1649003113144_0010 running in uber mode : false
22/04/04 09:42:51 INFO mapreduce.Job: map 0% reduce 0%
22/04/04 09:43:05 INFO mapreduce.Job: map 25% reduce 0%
22/04/04 09:43:08 INFO mapreduce.Job: map 50% reduce 0%
22/04/04 09:43:09 INFO mapreduce.Job: map 100% reduce 0%
22/04/04 09:43:10 INFO mapreduce.Job: Job job_1649003113144_0010 completed successfully
22/04/04 09:43:11 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=683716
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=799
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=19
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters
        Launched map tasks=4
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=56387
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=56387
        Total vcore-milliseconds taken by all map tasks=56387
        Total megabyte-milliseconds taken by all map tasks=57740288
    Map-Reduce Framework
        Map input records=6
        Map output records=6
        Input split bytes=596
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=615
        CPU time spent (ms)=2250
        Physical memory (bytes) snapshot=502194176
        Virtual memory (bytes) snapshot=6032302080
        Total committed heap usage (bytes)=243007488
    File Input Format Counters
        Bytes Read=0
    File Output Format Counters
        Bytes Written=0
22/04/04 09:43:11 INFO mapreduce.ExportJobBase: Transferred 799 bytes in 29.0627 seconds (27.4923 bytes/sec)
22/04/04 09:43:11 INFO mapreduce.ExportJobBase: Exported 6 records.
[cloudera@quickstart ~]$
Let’s query the employee table to confirm that all the records were exported.
mysql> SELECT * FROM employee;
+------+---------+------+
| id   | name    | age  |
+------+---------+------+
|    6 | Sameer  |   29 |
|    1 | Ram     |   31 |
|    2 | Krishna |   32 |
|    3 | Joel    |   41 |
|    4 | Shankar |   38 |
|    5 | Shanthi |   48 |
+------+---------+------+
6 rows in set (0.00 sec)
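Notice that the rows are not in insertion order: the export ran as four parallel map tasks, each writing its split to MySQL independently, so arrival order is arbitrary. An ORDER BY restores a predictable ordering:

mysql> SELECT * FROM employee ORDER BY id;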
Note
Sqoop jobs, whether export or import, are map-only jobs; there is no reduce phase.
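Since there is no reduce phase, the -m (or --num-mappers) flag directly sets the export parallelism. The run above used four map tasks (Sqoop’s default) for a six-line file, which is more than necessary; for small inputs a single mapper is enough. A variant of the earlier command, not from the original run:

sqoop export \
--connect "jdbc:mysql://quickstart.cloudera:3306/export_demo" \
--username "root" \
-P \
--table "employee" \
--export-dir /export_demo/emp.csv \
-m 1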