Saturday, 4 June 2022

How to supply a password securely to a Sqoop job?

In this post, I am going to explain how to supply a password securely to a Sqoop job.

 

Step 1: Create a credential using the ‘hadoop credential create’ command.

 

Open a terminal and execute the below command.

 

hadoop credential create retailDBPassword -provider jceks://hdfs/user/cloudera/retail.db.password.jceks

[cloudera@quickstart ~]$ hadoop credential create retailDBPassword -provider jceks://hdfs/user/cloudera/retail.db.password.jceks
WARNING: You have accepted the use of the default provider password
by not configuring a password in one of the two following locations:
    * In the environment variable HADOOP_CREDSTORE_PASSWORD
    * In a file referred to by the configuration entry
      hadoop.security.credstore.java-keystore-provider.password-file.
Please review the documentation regarding provider passwords in
the keystore passwords section of the Credential Provider API
Continuing with the default provider password.

Enter alias password: 
Enter alias password again: 
retailDBPassword has been successfully created.
Provider jceks://hdfs/user/cloudera/retail.db.password.jceks has been updated.

 

The above command prompts you for a password. Once the command completes successfully, the password is encrypted and persisted to the file ‘retail.db.password.jceks’ on HDFS.
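The warning in the output above appears because the keystore file itself is protected with Hadoop’s default keystore password. If you prefer to protect it with your own password, a minimal sketch is below; ‘MyKeystorePassword’ is just a placeholder value. Keep in mind that the same HADOOP_CREDSTORE_PASSWORD must be available whenever the keystore is read (for example, when the Sqoop job runs), otherwise the credential cannot be retrieved.

# placeholder keystore password; export it before creating and before using the credential
export HADOOP_CREDSTORE_PASSWORD='MyKeystorePassword'
hadoop credential create retailDBPassword -provider jceks://hdfs/user/cloudera/retail.db.password.jceks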

 

Let’s query the file ‘retail.db.password.jceks’ and see whether we can read any data from it.

 

Open a terminal and execute the command ‘hadoop fs -cat /user/cloudera/retail.db.password.jceks’. The output is the raw binary content of the Java keystore, so the password is not readable in plain text.
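If you just want to confirm that the alias was stored, without dumping the keystore bytes, you can point the ‘hadoop credential list’ command at the same provider:

hadoop credential list -provider jceks://hdfs/user/cloudera/retail.db.password.jceks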

 


 

Step 2: Create a Sqoop job that reads the password from the .jceks file. The -Dhadoop.security.credential.provider.path option points Hadoop at the keystore created in Step 1, and --password-alias refers to the alias stored inside it.

 

sqoop job \
-Dhadoop.security.credential.provider.path=jceks://hdfs/user/cloudera/retail.db.password.jceks \
--create customer_import_supply_pwd_via_jceks \
-- import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username "root" \
--password-alias retailDBPassword  \
--table "customers" \
--warehouse-dir /job-demo-3 \
--incremental append \
--check-column customer_id \
--last-value 0

 

[cloudera@quickstart ~]$ sqoop job \
> -Dhadoop.security.credential.provider.path=jceks://hdfs/user/cloudera/retail.db.password.jceks \
> --create customer_import_supply_pwd_via_jceks \
> -- import \
> --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
> --username "root" \
> --password-alias retailDBPassword  \
> --table "customers" \
> --warehouse-dir /job-demo-3 \
> --incremental append \
> --check-column customer_id \
> --last-value 0 
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/08 21:29:39 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$
[cloudera@quickstart ~]$ sqoop job --list
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/08 21:30:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
Available jobs:
  customer_import
  customer_import_supply_pwd_via_file
  customer_import_supply_pwd_via_jceks
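Optionally, you can inspect the saved job definition with ‘sqoop job --show’. It prints the stored parameters, and you should see the password alias rather than any plain-text password:

sqoop job --show customer_import_supply_pwd_via_jceks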

 

Let’s execute the job ‘customer_import_supply_pwd_via_jceks’ by running the below command.

sqoop job --exec customer_import_supply_pwd_via_jceks

[cloudera@quickstart ~]$ sqoop job --exec customer_import_supply_pwd_via_jceks
Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
22/04/08 21:30:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.13.0
22/04/08 21:30:57 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
22/04/08 21:30:57 INFO tool.CodeGenTool: Beginning code generation
22/04/08 21:30:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/08 21:30:57 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `customers` AS t LIMIT 1
22/04/08 21:30:57 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-mapreduce
Note: /tmp/sqoop-cloudera/compile/900d8087c6412427dd55246d3c95aa50/customers.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
22/04/08 21:30:59 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/900d8087c6412427dd55246d3c95aa50/customers.jar
22/04/08 21:30:59 INFO tool.ImportTool: Maximal id query for free form incremental import: SELECT MAX(`customer_id`) FROM `customers`
22/04/08 21:30:59 INFO tool.ImportTool: Incremental import based on column `customer_id`
22/04/08 21:30:59 INFO tool.ImportTool: Lower bound value: 0
22/04/08 21:30:59 INFO tool.ImportTool: Upper bound value: 12440
22/04/08 21:30:59 WARN manager.MySQLManager: It looks like you are importing from mysql.
22/04/08 21:30:59 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
22/04/08 21:30:59 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
22/04/08 21:30:59 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
22/04/08 21:30:59 INFO mapreduce.ImportJobBase: Beginning import of customers
22/04/08 21:30:59 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
22/04/08 21:30:59 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
22/04/08 21:30:59 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
22/04/08 21:30:59 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
22/04/08 21:31:00 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/08 21:31:00 INFO db.DBInputFormat: Using read commited transaction isolation
22/04/08 21:31:00 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`customer_id`), MAX(`customer_id`) FROM `customers` WHERE ( `customer_id` > 0 AND `customer_id` <= 12440 )
22/04/08 21:31:00 INFO db.IntegerSplitter: Split size: 3109; Num splits: 4 from: 1 to: 12440
22/04/08 21:31:00 WARN hdfs.DFSClient: Caught exception 
java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Thread.join(Thread.java:1281)
	at java.lang.Thread.join(Thread.java:1355)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:967)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.endBlock(DFSOutputStream.java:705)
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:894)
22/04/08 21:31:00 INFO mapreduce.JobSubmitter: number of splits:4
22/04/08 21:31:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1649172504056_0010
22/04/08 21:31:01 INFO impl.YarnClientImpl: Submitted application application_1649172504056_0010
22/04/08 21:31:01 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1649172504056_0010/
22/04/08 21:31:01 INFO mapreduce.Job: Running job: job_1649172504056_0010
22/04/08 21:31:08 INFO mapreduce.Job: Job job_1649172504056_0010 running in uber mode : false
22/04/08 21:31:08 INFO mapreduce.Job:  map 0% reduce 0%
22/04/08 21:31:24 INFO mapreduce.Job:  map 25% reduce 0%
22/04/08 21:31:27 INFO mapreduce.Job:  map 50% reduce 0%
22/04/08 21:31:29 INFO mapreduce.Job:  map 100% reduce 0%
22/04/08 21:31:30 INFO mapreduce.Job: Job job_1649172504056_0010 completed successfully
22/04/08 21:31:30 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=691652
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=487
		HDFS: Number of bytes written=953915
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters 
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=59956
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=59956
		Total vcore-milliseconds taken by all map tasks=59956
		Total megabyte-milliseconds taken by all map tasks=61394944
	Map-Reduce Framework
		Map input records=12440
		Map output records=12440
		Input split bytes=487
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=765
		CPU time spent (ms)=4640
		Physical memory (bytes) snapshot=560345088
		Virtual memory (bytes) snapshot=6044925952
		Total committed heap usage (bytes)=243007488
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=953915
22/04/08 21:31:30 INFO mapreduce.ImportJobBase: Transferred 931.5576 KB in 30.6614 seconds (30.3821 KB/sec)
22/04/08 21:31:30 INFO mapreduce.ImportJobBase: Retrieved 12440 records.
22/04/08 21:31:30 INFO util.AppendUtils: Creating missing output directory - customers
22/04/08 21:31:30 INFO tool.ImportTool: Saving incremental import state to the metastore
22/04/08 21:31:30 INFO tool.ImportTool: Updated data for job: customer_import_supply_pwd_via_jceks
[cloudera@quickstart ~]$
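The job retrieved the password from the keystore and imported 12440 records. To verify the output (a minimal sketch, assuming Sqoop’s default text-file layout under the warehouse directory), list the part files and count the rows:

hadoop fs -ls /job-demo-3/customers
hadoop fs -cat /job-demo-3/customers/part-m-* | wc -l

Because the job was created with --incremental append and the new last-value (12440) was saved to the metastore, re-running the same job will import only customers with customer_id greater than 12440.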


 
