Hadoop (HDFS) version upgrade without losing data
We successfully upgraded the following versions:
Hadoop 0.20.205.0 ==> Hadoop 1.0.3
HBase 0.90.4 ==> HBase 0.94.1
We followed the steps below and hope they help.
Before upgrading HDFS, make sure the existing cluster is working properly and the filesystem is healthy.
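One way to confirm that the filesystem is healthy is to look for the status line that fsck prints at the end of its report. The following is a minimal sketch, assuming the hadoop command of the running version is on the PATH; the helper name hdfs_is_healthy is hypothetical and not part of Hadoop.

```shell
#!/bin/sh
# Sketch only: hdfs_is_healthy is a hypothetical helper, not a Hadoop command.
# Assumes `hadoop` is on the PATH.
# fsck ends its report with a status line such as
#   The filesystem under path '/' is HEALTHY
# so the function succeeds only when that status appears in the output.
hdfs_is_healthy() {
    hadoop fsck / | grep "is HEALTHY" > /dev/null
}
```

It could be used as a guard before continuing, for example: hdfs_is_healthy || echo "fix the filesystem before upgrading"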
1. Stop all client applications running on the MapReduce cluster, then stop the MapReduce daemons:
stop-mapred.sh
2. Kill any orphaned task processes on the TaskTrackers.
3. Perform a filesystem check:
hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log
4. Save a complete listing of the HDFS namespace to a local file.
hadoop dfs -lsr / > dfs-v-old-lsr-1.log
5. Create a list of DataNodes participating in the cluster.
hadoop dfsadmin -report > dfs-v-old-report-1.log
6. Stop and restart the HDFS cluster (to create a checkpoint of the old version):
stop-dfs.sh
start-dfs.sh
7. Before stopping DFS, back up the directories that store the image and other HDFS files
(the values of the <name>dfs.name.dir</name> and <name>dfs.data.dir</name> properties in conf/hdfs-site.xml).
8. Stop the HDFS cluster:
stop-dfs.sh
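The state-capturing steps above (3 to 5) and the backup in step 7 can be collected into two small shell functions. This is a sketch under assumptions, not the procedure we ran verbatim: it assumes the old-version hadoop command is on the PATH, and the names snapshot_hdfs_state and backup_dir are hypothetical.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names, not Hadoop commands.

# Record the pre-upgrade state of HDFS in the three log files used in
# steps 3 to 5. Assumes the old-version `hadoop` command is on the PATH.
# The && chain stops at the first command that fails.
snapshot_hdfs_state() {
    hadoop fsck / -files -blocks -locations > dfs-v-old-fsck-1.log &&
    hadoop dfs -lsr / > dfs-v-old-lsr-1.log &&
    hadoop dfsadmin -report > dfs-v-old-report-1.log
}

# Archive one local directory into a gzip-compressed tarball.
#   $1 = directory to back up, $2 = tarball to create
# -C makes the archive paths relative to the directory's parent.
backup_dir() {
    src="$1"
    dest="$2"
    tar -czf "$dest" -C "$(dirname "$src")" "$(basename "$src")"
}
```

For example, backup_dir /path/to/name/dir /backup/name-dir.tar.gz would archive one metadata directory; both paths here are placeholders to be replaced with the values from your own conf/hdfs-site.xml.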
After you have installed the new Hadoop version
1. Update the following configuration files in the new installation so that they point to the existing cluster's hosts and directories:
conf/slaves, conf/masters, conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml
2. Start the actual HDFS upgrade process:
hadoop-daemon.sh start namenode -upgrade
3. Check the status of the upgrade:
hadoop dfsadmin -upgradeProgress status
This should report:
Upgrade for version -(new version_no) has been completed.
Upgrade is not finalized.
4. Save a new listing of the HDFS namespace:
hadoop dfs -lsr / > dfs-v-new-lsr-0.log
Compare it with the old listing (dfs-v-old-lsr-1.log).
5. Perform a filesystem check:
hadoop fsck / -files -blocks -locations > dfs-v-new-fsck-1.log
Compare it with the old report (dfs-v-old-fsck-1.log).
6. Create a list of DataNodes participating in the cluster:
hadoop dfsadmin -report > dfs-v-new-report-1.log
Compare it with the old report (dfs-v-old-report-1.log).
7. Start the HDFS cluster
start-dfs.sh
8. Start the MapReduce cluster
start-mapred.sh
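9. Finalize the upgrade once you are satisfied with the checks above (after finalizing, you can no longer roll back to the previous version):
hadoop dfsadmin -finalizeUpgrade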
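The status check in step 3 and the old-versus-new comparisons in steps 4 to 6 can be scripted along these lines. This is a minimal sketch, assuming the new-version hadoop command is on the PATH and that the status report contains the "has been completed" line quoted in step 3; the helper names upgrade_is_complete and compare_logs are hypothetical.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical names, not Hadoop commands.

# Succeeds when the upgrade status report contains the
# "has been completed" line quoted in step 3.
# Assumes the new-version `hadoop` command is on the PATH.
upgrade_is_complete() {
    hadoop dfsadmin -upgradeProgress status | grep "has been completed" > /dev/null
}

# Diff a pre-upgrade log against its post-upgrade counterpart.
#   $1 = old log, $2 = new log
# The differences are written to "<new log>.diff"; the function returns 0
# when the files are identical and 1 otherwise.
compare_logs() {
    old="$1"
    new="$2"
    if diff "$old" "$new" > "$new.diff"; then
        echo "same: $old $new"
    else
        echo "differs: $old $new (see $new.diff)"
        return 1
    fi
}
```

For example: compare_logs dfs-v-old-lsr-1.log dfs-v-new-lsr-0.log. The fsck and dfsadmin reports may contain timestamps and other run-specific lines, so for those files expect to review the .diff output instead of getting an exact match.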