User:Lindenb/Notebook/UMR915/20110610

Hadoop

download & unzip hadoop-0.20.203.0rc1.tar.gz

Single node setup

http://hadoop.apache.org/common/docs/current/single_node_setup.html
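
Note: besides exporting JAVA_HOME in the current shell (as below), the daemons launched by the bin/start-*.sh scripts read conf/hadoop-env.sh, so the same JDK path can also be set there (a small sketch using the path from this setup):

# conf/hadoop-env.sh
export JAVA_HOME=/usr/local/package/jdk1.6.0_26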

export JAVA_HOME=/usr/local/package/jdk1.6.0_26
cd hadoop-0.20.203.0
mkdir input
cp conf/*.xml input
bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
11/06/10 10:59:36 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-lindenb/mapred/staging/lindenb-1423012718/.staging/job_local_0001
java.net.UnknownHostException: srv-clc-04.u915.irt.univ-nantes.prive3: srv-clc-04.u915.irt.univ-nantes.prive3
	at java.net.InetAddress.getLocalHost(InetAddress.java:1354)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:815)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
	at java.security.AccessController.doPrivileged(Native Method)
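
The job fails because java.net.InetAddress.getLocalHost() cannot resolve the machine's hostname (srv-clc-04.u915.irt.univ-nantes.prive3). A possible workaround, assuming no DNS entry exists and root access is available, is to map the hostname to an address in /etc/hosts (a sketch; use the machine's real IP instead of the loopback address if the node must be reachable by that name):

# /etc/hosts -- make the local hostname resolvable
127.0.0.1   localhost   srv-clc-04.u915.irt.univ-nantes.prive3   srv-clc-04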

Change the configuration files:

conf/core-site.xml:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
     </property>
</configuration>


conf/hdfs-site.xml:

<configuration>
     <property>
         <name>dfs.replication</name>
         <value>1</value>
     </property>
</configuration>


conf/mapred-site.xml:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>localhost:9001</value>
     </property>
</configuration>

Set up ssh for passwordless login


$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

# Important: fix the permissions on ~/.ssh

$ chmod 700 ~/.ssh/
$ chmod 640 ~/.ssh/authorized_keys
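
Quick sanity check: an ssh to localhost should now log in without asking for a password (the start/stop scripts rely on this):

$ ssh localhost echo "passwordless ssh OK"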

Format a new distributed-filesystem

[lindenb@srv-clc-04 hadoop-0.20.203.0]$ bin/hadoop namenode -format
11/06/10 12:19:03 INFO namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = java.net.UnknownHostException: srv-clc-04.u915.irt.univ-nantes.prive3: srv-clc-04.u915.irt.univ-nantes.prive3
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.203.0
STARTUP_MSG:   build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
************************************************************/
Re-format filesystem in /tmp/hadoop-lindenb/dfs/name ? (Y or N) Y
11/06/10 12:19:07 INFO util.GSet: VM type       = 64-bit
11/06/10 12:19:07 INFO util.GSet: 2% max memory = 19.1675 MB
11/06/10 12:19:07 INFO util.GSet: capacity      = 2^21 = 2097152 entries
11/06/10 12:19:07 INFO util.GSet: recommended=2097152, actual=2097152
11/06/10 12:19:07 INFO namenode.FSNamesystem: fsOwner=lindenb
11/06/10 12:19:07 INFO namenode.FSNamesystem: supergroup=supergroup
11/06/10 12:19:07 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/06/10 12:19:07 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
11/06/10 12:19:07 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
11/06/10 12:19:07 INFO namenode.NameNode: Caching file names occuring more than 10 times 
11/06/10 12:19:07 INFO common.Storage: Image file of size 113 saved in 0 seconds.
11/06/10 12:19:08 INFO common.Storage: Storage directory /tmp/hadoop-lindenb/dfs/name has been successfully formatted.
11/06/10 12:19:08 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at java.net.UnknownHostException: srv-clc-04.u915.irt.univ-nantes.prive3: srv-clc-04.u915.irt.univ-nantes.prive3
************************************************************/
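
Note that with this configuration everything (name dir, data dir) lives under /tmp/hadoop-lindenb and may be wiped at the next reboot. The base directory can be moved somewhere persistent via hadoop.tmp.dir in conf/core-site.xml (a sketch; the path is only an example, and the namenode must be re-formatted after changing it):

     <property>
         <name>hadoop.tmp.dir</name>
         <value>/home/lindenb/hadoop-tmp</value>
     </property>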

Start the server

[lindenb@srv-clc-04 hadoop-0.20.203.0]$ ./bin/start-all.sh 
namenode running as process 23788. Stop it first.
localhost: starting datanode, logging to /home/lindenb/package/hadoop-0.20.203.0/bin/../logs/hadoop-lindenb-datanode-srv-clc-04.u915.irt.univ-nantes.prive3.out
localhost: starting secondarynamenode, logging to /home/lindenb/package/hadoop-0.20.203.0/bin/../logs/hadoop-lindenb-secondarynamenode-srv-clc-04.u915.irt.univ-nantes.prive3.out
starting jobtracker, logging to /home/lindenb/package/hadoop-0.20.203.0/bin/../logs/hadoop-lindenb-jobtracker-srv-clc-04.u915.irt.univ-nantes.prive3.out
localhost: starting tasktracker, logging to /home/lindenb/package/hadoop-0.20.203.0/bin/../logs/hadoop-lindenb-tasktracker-srv-clc-04.u915.irt.univ-nantes.prive3.out
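
To check that the five daemons are actually up, jps (shipped with the JDK) should list them; the NameNode and JobTracker also expose web interfaces (the ports below are the 0.20 defaults):

$ jps
# expected entries (pids will differ): NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker
# web UIs: http://localhost:50070 (NameNode), http://localhost:50030 (JobTracker)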

Copy cdina's data

server1:

scp Axiom_GW_Hu_SNP.r2.na31.annot.csv lindenb@172.18.254.164:
The authenticity of host '172.18.254.164 (172.18.254.164)' can't be established.
RSA key fingerprint is ad:67:03:8d:c1:20:d6:70:04:aa:c2:c8:9b:26:62:8f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '172.18.254.164' (RSA) to the list of known hosts.
lindenb@172.18.254.164's password: 
Axiom_GW_Hu_SNP.r2.na31.annot.csv             100%  765MB  40.3MB/s   00:19

Create a directory on HDFS:

 bin/hadoop fs  -mkdir myfolder
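
The annotation file copied above can then be pushed into that folder and listed (assuming the scp left it in the home directory):

 bin/hadoop fs -put ~/Axiom_GW_Hu_SNP.r2.na31.annot.csv myfolder/
 bin/hadoop fs -ls myfolder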