Thursday 26 May 2022

Sqoop: File formats

 Sqoop supports several file formats. The following are widely used.

a.   Text file: the default file format.

b.   Sequence file: stores the data in a binary format.

c.    Avro: stores the data in a compact binary format, with the schema described in JSON.

d.   Parquet: a columnar file format.

 

Import data as text file

Text is the default format, so no format option is needed (it can also be requested explicitly with '--as-textfile').

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /text_file_format_demo \
-m 1 \
--where "customer_id < 10"

Print the imported file to confirm that the records are stored as comma-separated text.

[cloudera@quickstart ~]$ hadoop fs -cat /text_file_format_demo/*
1,Richard,Hernandez,XXXXXXXXX,XXXXXXXXX,6303 Heather Plaza,Brownsville,TX,78521
2,Mary,Barrett,XXXXXXXXX,XXXXXXXXX,9526 Noble Embers Ridge,Littleton,CO,80126
3,Ann,Smith,XXXXXXXXX,XXXXXXXXX,3422 Blue Pioneer Bend,Caguas,PR,00725
4,Mary,Jones,XXXXXXXXX,XXXXXXXXX,8324 Little Common,San Marcos,CA,92069
5,Robert,Hudson,XXXXXXXXX,XXXXXXXXX,10 Crystal River Mall ,Caguas,PR,00725
6,Mary,Smith,XXXXXXXXX,XXXXXXXXX,3151 Sleepy Quail Promenade,Passaic,NJ,07055
7,Melissa,Wilcox,XXXXXXXXX,XXXXXXXXX,9453 High Concession,Caguas,PR,00725
8,Megan,Smith,XXXXXXXXX,XXXXXXXXX,3047 Foggy Forest Plaza,Lawrence,MA,01841
9,Mary,Perez,XXXXXXXXX,XXXXXXXXX,3616 Quaking Street,Caguas,PR,00725

Import data as sequence file

Using the '--as-sequencefile' option, you can ask Sqoop to store the data as a binary sequence file.

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /sequence_file_format_demo \
-m 1 \
--where "customer_id < 10" \
--as-sequencefile
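
A quick way to confirm that the output really is a sequence file is to look at its first bytes: Hadoop sequence files begin with the magic bytes "SEQ". The sketch below is only an illustration. The helper name is_sequence_file is hypothetical, and the part file name in the usage note assumes the single-mapper import above wrote its output to part-m-00000.

```shell
# Hypothetical helper: succeeds when the HDFS file at $1 starts with
# the SequenceFile magic bytes "SEQ".
is_sequence_file() {
  local magic
  # Read only the first three bytes of the file from HDFS.
  magic="$(hadoop fs -cat "$1" 2>/dev/null | head -c 3)"
  [ "$magic" = "SEQ" ]
}
```

For example, after the import above: is_sequence_file /sequence_file_format_demo/part-m-00000 && echo "sequence file"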




Import data as Avro file

Using the '--as-avrodatafile' option, you can ask Sqoop to store the data in Avro format.

sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password "cloudera" \
--table "customers" \
--target-dir /avro_file_format_demo \
-m 1 \
--where "customer_id < 10" \
--as-avrodatafile




Import data as Parquet format

Using the '--as-parquetfile' option, you can ask Sqoop to store the data in Parquet format.
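
Following the pattern of the earlier imports, a Parquet import might look like the sketch below. It assumes the same connection settings as the other examples. The target directory /parquet_file_format_demo and the wrapper function name import_customers_as_parquet are hypothetical, chosen only to match the naming used above.

```shell
# Hypothetical wrapper around the import; only the sqoop flags themselves
# come from Sqoop. Connection details mirror the earlier examples.
import_customers_as_parquet() {
  sqoop import \
    --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
    --username "root" \
    --password "cloudera" \
    --table "customers" \
    --target-dir /parquet_file_format_demo \
    -m 1 \
    --where "customer_id < 10" \
    --as-parquetfile
}
```

Calling import_customers_as_parquet runs the import. The body of the function can also be pasted directly at the prompt, like the other commands in this post.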

