Tuesday 22 March 2022

Set up Hadoop using the Cloudera QuickStart VM

In this post, I explain how to work with Hadoop using the Cloudera QuickStart VM. The QuickStart VM has everything you need to experiment with Hadoop. I prefer this approach to installing and configuring everything on my own system.

 

Follow the step-by-step procedure below to install the Cloudera QuickStart VM.

 

Step 1: Download and install Oracle VirtualBox.

Go to the location below and download the VirtualBox package that is compatible with your operating system.

https://www.virtualbox.org/wiki/Downloads

 

Step 2: Download the Cloudera QuickStart VM bundle from the location below.

https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip
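If you prefer the command line, the download and extraction can be sketched as below. This is only an illustration: it assumes `curl` and `unzip` are installed on the host, and the function name `fetch_quickstart_vm` is made up for this example.

```shell
#!/bin/sh
# URL taken from this post; the local file name is derived from its last path segment.
VM_URL="https://downloads.cloudera.com/demo_vm/virtualbox/cloudera-quickstart-vm-5.13.0-0-virtualbox.zip"
VM_ZIP="${VM_URL##*/}"

# Hypothetical helper: download the bundle, then extract it in the current directory.
fetch_quickstart_vm() {
    # -L follows redirects, -C - resumes a partial download, -o sets the output file
    curl -L -C - -o "$VM_ZIP" "$VM_URL" &&
    # -o overwrites files left over from an earlier extraction without prompting
    unzip -o "$VM_ZIP"
}

# Nothing is downloaded until you call the function:
#   fetch_quickstart_vm
```

The bundle is large, so resuming with `-C -` can save time if the connection drops midway.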

 

Unzip the downloaded zip file. It contains two files, one of which is the .ovf file used in the next step.

 

 


Right-click the .ovf file -> Open With -> VirtualBox.

 

 


This opens the ‘VirtualBox Manager’ window. Configure the CPU and RAM according to your needs and click the Import button (I went with the default options).
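The same import can also be done from a terminal with VirtualBox's `VBoxManage` tool. The sketch below is an alternative to the GUI step, not what I did for this post. The function name `import_quickstart_vm` and the default values (2 CPUs, 4096 MB) are illustrative assumptions, and the path to the .ovf file must be supplied by you.

```shell
#!/bin/sh
# Hypothetical helper: import an appliance with an explicit CPU count and memory size.
#   $1 = path to the extracted .ovf file (required)
#   $2 = number of CPUs (assumed default: 2)
#   $3 = memory in MB   (assumed default: 4096)
import_quickstart_vm() {
    ovf="$1"
    cpus="${2:-2}"
    mem="${3:-4096}"
    # --vsys 0 selects the first (here: only) virtual system described in the .ovf
    VBoxManage import "$ovf" --vsys 0 --cpus "$cpus" --memory "$mem"
}

# Example call (the path is a placeholder):
#   import_quickstart_vm /path/to/your-file.ovf 2 4096
```

Leaving out `--cpus` and `--memory` keeps the values stored in the appliance, which corresponds to accepting the defaults in the import dialog.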

 

Once the import succeeds, ‘cloudera quickstart vm’ is listed in ‘Oracle VM VirtualBox Manager’.




Click the Start button.
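Starting the VM can be scripted as well. The following is a minimal sketch that assumes `VBoxManage` is on your PATH. The helper names are hypothetical, and the VM name you pass must be the exact name printed by `VBoxManage list vms` on your machine.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a VM with exactly this name is registered.
# 'VBoxManage list vms' prints one line per VM in the form: "name" {uuid}
vm_is_registered() {
    VBoxManage list vms | grep -qF -- "\"$1\""
}

# Hypothetical helper: start the VM in a normal window if it is registered.
start_vm() {
    if vm_is_registered "$1"; then
        VBoxManage startvm "$1" --type gui
    else
        echo "VM '$1' not found; check the output of 'VBoxManage list vms'" >&2
        return 1
    fi
}

# Example call (replace the name with the one shown on your machine):
#   start_vm "your-vm-name"
```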

 

Once the VM starts successfully, the Cloudera VM desktop opens.

 

 


Open a terminal and execute the command ‘ps -eaf | grep hadoop’ to see the running Hadoop processes.
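One small refinement, shown here as a sketch: a plain `grep hadoop` can also match the grep command itself in the process list. Writing the pattern as `[h]adoop` avoids that, because the pattern still matches "hadoop" while grep's own command line (which contains the literal text `[h]adoop`) does not. The function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list Hadoop-related processes without the grep line itself.
list_hadoop_processes() {
    # -i makes the match case-insensitive; the fallback message is printed
    # when grep finds nothing and exits with a non-zero status.
    ps -eaf | grep -i '[h]adoop' || echo "No Hadoop processes found"
}

list_hadoop_processes
```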


You can execute the command 'hadoop fs' with no arguments to list all the available HDFS commands.

[cloudera@quickstart ~]$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-x] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] [-x] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[cloudera@quickstart ~]$
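To try a few of the commands from the listing above, here is a small round trip that creates a directory in HDFS, uploads a file, lists it, and prints it back. Treat it as a sketch: the function name `hdfs_demo` and the default directory /user/cloudera/demo are assumptions for illustration, so pick any HDFS path you have write access to.

```shell
#!/bin/sh
# Hypothetical helper: a tiny HDFS round trip using options shown in the usage listing.
#   $1 = HDFS directory to use (assumed default: /user/cloudera/demo)
hdfs_demo() {
    dir="${1:-/user/cloudera/demo}"
    tmp="$(mktemp)"                       # local scratch file to upload
    echo "hello hadoop" > "$tmp"

    hadoop fs -mkdir -p "$dir" &&                 # create the directory (and parents)
    hadoop fs -put -f "$tmp" "$dir/hello.txt" &&  # upload, overwriting if present
    hadoop fs -ls "$dir" &&                       # list the directory contents
    hadoop fs -cat "$dir/hello.txt"               # print the uploaded file
    status=$?

    rm -f "$tmp"                          # clean up the local scratch file
    return $status
}

# Example call, run inside the VM terminal:
#   hdfs_demo /user/cloudera/demo
```

For details on any single command, 'hadoop fs -help ls' (or any other command name) prints its full description.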



