Saturday, 4 June 2022

How to supply the password file to a sqoop job

In this post, I am going to explain how to supply a password file to a sqoop job.

 

You can supply the password file at the time of job creation using the --password-file option.

 

Example

--password-file file://{FILE_ABSOLUTE_PATH}

 

Step 1: Let’s create a password file by executing the command below.

echo -n "cloudera">> .mySQLPassword

 

The above command writes the string "cloudera" to the hidden file .mySQLPassword. The -n option suppresses the trailing newline; Sqoop reads the entire content of the password file as the password, so a stray newline character would become part of it.

[cloudera@quickstart ~]$ echo -n "cloudera">> .mySQLPassword
[cloudera@quickstart ~]$ 
[cloudera@quickstart ~]$ cat .mySQLPassword 
[cloudera@quickstart ~]$ pwd
/home/cloudera
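
Since this file holds the credential in plain text, it is a good idea to make it readable only by its owner. The commands below are an optional hardening step (not part of the original walkthrough); they assume the file sits at /home/cloudera/.mySQLPassword as created above.

# Allow only the owner to read the password file.
chmod 400 /home/cloudera/.mySQLPassword

# Verify the permissions; the mode should show as -r--------.
ls -l /home/cloudera/.mySQLPassword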

 


Step 2: Let’s create a job by supplying the password file /home/cloudera/.mySQLPassword.

sqoop job \
--create customer_import_supply_pwd_via_file \
-- import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password-file file:///home/cloudera/.mySQLPassword \
--table "customers" \
--warehouse-dir /job-demo-2 \
--incremental append \
--check-column customer_id \
--last-value 0

Let’s execute the above command to create the job ‘customer_import_supply_pwd_via_file’.

 

[cloudera@quickstart ~]$ sqoop job \
> --create customer_import_supply_pwd_via_file \
> -- import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password-file file:///home/cloudera/.mySQLPassword \
> --table "customers" \
> --warehouse-dir /job-demo-2 \
> --incremental append \
> --check-column customer_id \
> --last-value 0 
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/08 19:27:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0

Let’s list all the saved jobs.

[cloudera@quickstart ~]$ sqoop job --list
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/08 19:28:07 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
Available jobs:
  customer_import
  customer_import_supply_pwd_via_file
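
Before executing the job, you can optionally inspect the parameters it was saved with by using the --show option. This check is an extra step, not part of the original walkthrough.

sqoop job --show customer_import_supply_pwd_via_file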

Let’s execute the job ‘customer_import_supply_pwd_via_file’.

 

sqoop job --exec customer_import_supply_pwd_via_file

 

Run the above command in the terminal.

[cloudera@quickstart ~]$ sqoop job --exec customer_import_supply_pwd_via_file
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/08 19:28:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/08 19:29:00 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/08 19:29:00 INFO tool.CodeGenTool: Beginning code generation
22/04/08 19:29:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/08 19:29:00 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/08 19:29:00 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/1f442024b3782a936a38433b3d512adf/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/08 19:29:02 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/1f442024b3782a936a38433b3d512adf/customers.jar
22/04/08 19:29:02 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`customer_id`) FROM `customers`
22/04/08 19:29:02 INFO tool.ImportTool: Incremental import based on column `customer_id`
22/04/08 19:29:02 INFO tool.ImportTool: Lower bound value: 0
22/04/08 19:29:02 INFO tool.ImportTool: Upper bound value: 12440
22/04/08 19:29:02 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/08 19:29:02 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/08 19:29:02 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/08 19:29:02 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/08 19:29:02 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/08 19:29:02 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/08 19:29:02 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/08 19:29:02 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/08 19:29:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/08 19:29:03 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/08 19:29:03 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/08 19:29:04 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/08 19:29:04 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`customer_id`), MAX(`customer_id`) FROM `customers` WHERE ( `customer_id` > 0 AND `customer_id` <= 12440 )
22/04/08 19:29:04 INFO db.IntegerSplitter: Split size: 3109; Num splits: 4 from: 1 to: 12440
22/04/08 19:29:04 INFO mapreduce.JobSubmitter: number of splits:4
22/04/08 19:29:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649172504056_0009
22/04/08 19:29:05 INFO impl.YarnClientImpl: Submitted application application_1649172504056_0009
22/04/08 19:29:05 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649172504056_0009/
22/04/08 19:29:05 INFO mapreduce.Job: Running job: job_1649172504056_0009
22/04/08 19:29:12 INFO mapreduce.Job: Job job_1649172504056_0009 running in uber mode : false
22/04/08 19:29:12 INFO mapreduce.Job:  map 0% reduce 0%
22/04/08 19:29:27 INFO mapreduce.Job:  map 25% reduce 0%
22/04/08 19:29:30 INFO mapreduce.Job:  map 50% reduce 0%
22/04/08 19:29:31 INFO mapreduce.Job:  map 100% reduce 0%
22/04/08 19:29:32 INFO mapreduce.Job: Job job_1649172504056_0009 completed successfully
22/04/08 19:29:32 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=690844
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=487
		HDFS: Number of bytes written=953915
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters 
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=58628
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=58628
		Total vcore-milliseconds taken by all map tasks=58628
		Total megabyte-milliseconds taken by all map tasks=60035072
	Map-Reduce Framework
		Map input records=12440
		Map output records=12440
		Input split bytes=487
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=808
		CPU time spent (ms)=4370
		Physical memory (bytes) snapshot=542535680
		Virtual memory (bytes) snapshot=6044942336
		Total committed heap usage (bytes)=243007488
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=953915
22/04/08 19:29:32 INFO mapreduce.ImportJobBase: Transferred 931.5576 KB in 29.8469 seconds (31.2112 KB/sec)
22/04/08 19:29:32 INFO mapreduce.ImportJobBase: Retrieved 12440 records.
22/04/08 19:29:32 INFO util.AppendUtils: Creating missing output directory - customers
22/04/08 19:29:32 INFO tool.ImportTool: Saving incremental import state to the metastore
22/04/08 19:29:32 INFO tool.ImportTool: Updated data for job: customer_import_supply_pwd_via_file
[cloudera@quickstart ~]$
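
To confirm where the records landed, we can list the import output in HDFS. The path below is derived from the --warehouse-dir (/job-demo-2) and the table name (customers) used when the job was created.

# List the files written by the import.
hdfs dfs -ls /job-demo-2/customers

# Peek at the first few imported records.
hdfs dfs -cat /job-demo-2/customers/part-m-* | head -5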

That’s it, you are done. However, storing a database password as plain text is not recommended. To address this, we need to store the password in encrypted form. I will explain this more secure approach in my next post.
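
As a brief preview of one common approach (covered in detail in the next post): Sqoop 1.4.6 (the version used here) can read the password from a Hadoop credential provider keystore via the --password-alias option, so no plain-text file is needed. The alias name, keystore path, and warehouse directory below are only illustrative assumptions, not values from this walkthrough.

# Store the password in an encrypted JCEKS keystore on HDFS (you will be prompted for the value).
hadoop credential create mysql.retail_db.password \
-provider jceks://hdfs/user/cloudera/mysql.password.jceks

# Reference the alias instead of a password file.
sqoop import \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/cloudera/mysql.password.jceks \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password-alias mysql.retail_db.password \
--table "customers" \
--warehouse-dir /password-alias-demo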

 

