When you import data with the ‘sqoop import’ command, Sqoop uses a newline as the row (line) separator by default.
You can change the separator with the ‘--lines-terminated-by’ option.
Example
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /line_separator_demo \
-m 1 \
--where "customer_id < 10" \
--lines-terminated-by ';'
[cloudera@quickstart ~]$ sqoop import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password "cloudera" \
> --table "customers" \
> --target-dir /line_separator_demo \
> -m 1 \
> --where "customer_id < 10" \
> --lines-terminated-by ';'
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/03 22:30:14 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/03 22:30:14 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
22/04/03 22:30:15 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/03 22:30:15 INFO tool.CodeGenTool: Beginning code generation
22/04/03 22:30:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 22:30:15 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/03 22:30:15 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/af8b98018073635bb6a1970ca3018e6e/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/03 22:30:17 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/af8b98018073635bb6a1970ca3018e6e/customers.jar
22/04/03 22:30:17 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/03 22:30:17 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/03 22:30:17 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/03 22:30:17 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/03 22:30:17 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/03 22:30:17 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/03 22:30:17 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/03 22:30:18 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/03 22:30:18 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/03 22:30:19 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 22:30:19 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 22:30:19 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 22:30:19 WARN hdfs.DFSClient: Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1281)
at java.lang.Thread.join(Thread.java:1355)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/03 22:30:20 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/03 22:30:20 INFO mapreduce.JobSubmitter: number of splits:1
22/04/03 22:30:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649003113144_0007
22/04/03 22:30:20 INFO impl.YarnClientImpl: Submitted application application_1649003113144_0007
22/04/03 22:30:20 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649003113144_0007/
22/04/03 22:30:20 INFO mapreduce.Job: Running job: job_1649003113144_0007
22/04/03 22:30:27 INFO mapreduce.Job: Job job_1649003113144_0007 running in uber mode : false
22/04/03 22:30:27 INFO mapreduce.Job: map 0% reduce 0%
22/04/03 22:30:33 INFO mapreduce.Job: map 100% reduce 0%
22/04/03 22:30:33 INFO mapreduce.Job: Job job_1649003113144_0007 completed successfully
22/04/03 22:30:33 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=171816
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=673
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=3307
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=3307
Total vcore-milliseconds taken by all map tasks=3307
Total megabyte-milliseconds taken by all map tasks=3386368
Map-Reduce Framework
Map input records=9
Map output records=9
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=47
CPU time spent (ms)=630
Physical memory (bytes) snapshot=137871360
Virtual memory (bytes) snapshot=1510182912
Total committed heap usage (bytes)=60751872
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=673
22/04/03 22:30:33 INFO mapreduce.ImportJobBase: Transferred 673 bytes in 14.8699 seconds (45.2593 bytes/sec)
22/04/03 22:30:33 INFO mapreduce.ImportJobBase: Retrieved 9 records.
[cloudera@quickstart ~]$
Let’s print the contents of the files under /line_separator_demo to confirm.
[cloudera@quickstart ~]$ hadoop fs -cat /line_separator_demo/*
1,Richard,Hernandez,XXXXXXXXX,XXXXXXXXX,6303 Heather Plaza,Brownsville,TX,78521;2,Mary,Barrett,XXXXXXXXX,XXXXXXXXX,9526 Noble Embers Ridge,Littleton,CO,80126;3,Ann,Smith,XXXXXXXXX,XXXXXXXXX,3422 Blue Pioneer Bend,Caguas,PR,00725;4,Mary,Jones,XXXXXXXXX,XXXXXXXXX,8324 Little Common,San Marcos,CA,92069;5,Robert,Hudson,XXXXXXXXX,XXXXXXXXX,10 Crystal River Mall ,Caguas,PR,00725;6,Mary,Smith,XXXXXXXXX,XXXXXXXXX,3151 Sleepy Quail Promenade,Passaic,NJ,07055;7,Melissa,Wilcox,XXXXXXXXX,XXXXXXXXX,9453 High Concession,Caguas,PR,00725;8,Megan,Smith,XXXXXXXXX,XXXXXXXXX,3047 Foggy Forest Plaza,Lawrence,MA,01841;9,Mary,Perez,XXXXXXXXX,XXXXXXXXX,3616 Quaking Street,Caguas,PR,00725;[cloudera@quickstart ~]$
The output confirms that ‘;’ is used as the line separator: all nine records appear on a single line, and each one ends with a semicolon.
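Because the records are no longer separated by newlines, the raw output is hard to read. As a rough sketch, the records can be put back on separate lines by translating each ‘;’ to a newline. The helper name split_records and the shortened sample data below are assumptions made for illustration; they are not part of Sqoop or of the import above.

```shell
#!/bin/sh
# Shortened, made-up sample in the same shape as the imported data:
# fields separated by ',' and every record terminated by ';'.
sample='1,Richard,Hernandez;2,Mary,Barrett;'

# Hypothetical helper: turn each ';' record terminator into a newline
# so that every record is printed on its own line.
split_records() {
  tr ';' '\n'
}

printf '%s' "$sample" | split_records

# On the cluster, the same filter could be applied to the imported files:
# hadoop fs -cat /line_separator_demo/* | split_records
```

This only works cleanly when no field value contains a ‘;’. If the data may contain the terminator character, it is probably safer to pick a different terminator or to look at Sqoop's --enclosed-by and --escaped-by options.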