Version 3 (Vladimir Ipatov, 24.10.2012 21:13) → Version 4/6 (Vladimir Ipatov, 24.10.2012 21:13)

{{>toc}}

h1. FAILOVER management

Notation:
<pre>
gnt# - command executed on the master node
gntX# - command executed on an ordinary node
gntY# - command executed on the other node
# - command executed on any node
</pre>

h2. Starting instances on one node while the other is down

When a node boots and cannot reach the other node, the cluster management daemon ganeti-masterd does not start automatically, even on the master node.
The reason is that it cannot tell whether the other node is really down or whether only the link has failed while instances on the other node are still running.
To start the cluster management daemon manually:
<pre>
gnt# ganeti-masterd --no-voting
</pre>
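
Because the daemon cannot tell a dead peer from a broken link, it may help to probe the other node over more than one path before forcing the start. The sketch below is only an illustration: the function name @peer_unreachable_everywhere@, the @PROBE@ variable and the example addresses are hypothetical, and a failed ping is still not proof that the instances on the peer have stopped.

```shell
#!/bin/sh
# Hypothetical pre-check before "ganeti-masterd --no-voting".
# PROBE is the command used to test one address; override it as needed.
PROBE="${PROBE:-ping -c 3 -W 2}"

# Returns 0 only if NO address of the peer answers.
# Returns 1 if at least one address answers, or if no address was given.
peer_unreachable_everywhere() {
    [ "$#" -gt 0 ] || return 1
    for addr in "$@"; do
        if $PROBE "$addr" >/dev/null 2>&1; then
            return 1
        fi
    done
    return 0
}

# Example (placeholder addresses: LAN and replication link of the peer):
#   peer_unreachable_everywhere 192.0.2.12 198.51.100.12 && ganeti-masterd --no-voting
```

If any address still answers, the function fails and the forced start is skipped, which leaves the decision to the administrator.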

h2. Master node change in normal mode

Both nodes are online and the master role is moved in the normal way.
On the master candidate (gntX):
<pre>
gntX# gnt-cluster master-failover
</pre>

h2. Failure of master node

The master node (gnt1 in this example) is down because of a hardware failure.

Start the management daemon on the master candidate (gntX):
<pre>
gntX# ganeti-masterd --no-voting
</pre>

Activate the new master node:
<pre>
gntX# gnt-cluster master-failover --no-voting
</pre>

Set the broken node offline so that the master node does not try to connect to it.
-C = master-candidate
-O = offline
<pre>
gnt# gnt-node modify -C no -O yes gntY
</pre>

Start all instances of the broken node on the backup node:
<pre>
gnt# gnt-node failover --ignore-consistency gnt1
</pre>
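
The steps of this section can be collected into a dry-run helper that only prints the commands in order, so they can be reviewed before anything is executed. This is a sketch: the function name @master_failure_plan@ is hypothetical, and it assumes the same failed node name is used for both the offline and the failover step.

```shell
#!/bin/sh
# Prints (does not run) the recovery commands from this section, in order.
# $1 = name of the failed master node, e.g. gnt1.
# All commands are meant to be run on the surviving master candidate.
master_failure_plan() {
    failed="$1"
    if [ -z "$failed" ]; then
        echo "usage: master_failure_plan FAILED_NODE" >&2
        return 1
    fi
    cat <<EOF
ganeti-masterd --no-voting
gnt-cluster master-failover --no-voting
gnt-node modify -C no -O yes $failed
gnt-node failover --ignore-consistency $failed
EOF
}

# Example:
#   master_failure_plan gnt1
```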

h2. Bringing the broken node back online

The old master node will not start the management daemon on boot:
* if it cannot find the other node,
* if it finds the other node running as master.

If the data on this node is intact, re-add it to the cluster.
Copy the new configuration to it from the new master node:
<pre>
gnt# gnt-cluster redist-conf
</pre>

Restart the Ganeti daemons:
<pre>
gntX# /etc/init.d/ganeti restart
</pre>

h2. Planned node shutdown for maintenance

Migrate every instance from this node to the other one (repeat for each INSTANCE):
<pre>
gnt# gnt-instance migrate INSTANCE
</pre>
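
When the node hosts many instances, the list of instances to migrate can be derived from @gnt-instance list@. The sketch below assumes the @name@ and @pnode@ output fields and node names exactly as the cluster reports them (possibly fully qualified); the helper name @instances_on_node@ is hypothetical.

```shell
#!/bin/sh
# Reads "instance primary_node" pairs on stdin and prints the names of
# the instances whose primary node is $1.
instances_on_node() {
    awk -v node="$1" '$2 == node { print $1 }'
}

# Example (run on the master node; NODE is the node to be emptied):
#   for inst in $(gnt-instance list --no-headers -o name,pnode | instances_on_node NODE); do
#       gnt-instance migrate "$inst"
#   done
```

The filter is kept separate from the migration loop so that the list can be inspected before any instance is moved.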

If the node being shut down is the master, you must first assign a new cluster master (see *Master node change in normal mode* above).

Set the node offline and remove its master-candidate status:
<pre>
gnt# gnt-node modify -C no -O yes NODE
</pre>

Now the node can simply be powered off.

h3. Returning the node online

After boot, set the node online and make it a master candidate again:
<pre>
gnt# gnt-node modify -C yes -O no NODE
</pre>

However, if you have any doubt about the node's health, re-add it instead:
<pre>
gnt# gnt-node add --readd NODE
</pre>

In either case, wait about 5 minutes until the watcher daemon sets up the DRBD resources, or trigger the setup by hand:
<pre>
gnt# gnt-cluster verify-disks
</pre>
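
To see whether the DRBD resources have finished resynchronizing, the device states can be read from @/proc/drbd@. The sketch below assumes the DRBD 8.x status format, where each device line carries a @ds:local/peer@ field; the function name @drbd_all_synced@ is hypothetical.

```shell
#!/bin/sh
# Reads /proc/drbd-style text on stdin.
# Returns 0 if at least one device is listed and every device reports
# ds:UpToDate/UpToDate; returns 1 otherwise.
drbd_all_synced() {
    awk '
        /ds:/ { seen = 1; if ($0 !~ /ds:UpToDate\/UpToDate/) bad = 1 }
        END   { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Example: poll every 10 seconds until all devices are in sync.
#   until drbd_all_synced < /proc/drbd; do sleep 10; done
```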

h2. Replacing a node with a new one

Add the node to the cluster:
<pre>
gnt# gnt-node add --readd gntX
</pre>

For every instance that has the new node as its secondary:
<pre>
gnt# gnt-instance replace-disks --submit -s INSTANCE
</pre>
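
The affected instances can be picked out of @gnt-instance list@ instead of being typed by hand. The sketch below assumes the @name@ and @snodes@ output fields and a single secondary node per instance (the DRBD case); the helper name @replace_disks_commands@ is hypothetical. It only prints the commands.

```shell
#!/bin/sh
# Reads "instance secondary_node" pairs on stdin and prints one
# replace-disks command per instance whose secondary node is $1.
replace_disks_commands() {
    awk -v node="$1" '$2 == node { print "gnt-instance replace-disks --submit -s " $1 }'
}

# Example (run on the master node, review the output, then pipe it to sh):
#   gnt-instance list --no-headers -o name,snodes | replace_disks_commands gntX
```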

Re-add the node to Puppet:
<pre>
gnt# gnt-instance console sci
sci# puppetca --clean gnt1.fqdn
</pre>

<pre>
gntX# rm -r /var/lib/puppet/ssl/*
gntX# /etc/init.d/puppet restart
</pre>

h2. Hard disk replacement

Copy the partition table from an existing disk (only valid for disks of the same model):
<pre>
# sfdisk -d /dev/sda|sfdisk /dev/sdX
</pre>
Check:
<pre>
# fdisk -l
</pre>
Add the new partitions to the RAID arrays:
<pre>
# mdadm --manage /dev/md0 --add /dev/sdX1
# mdadm --manage /dev/md1 --add /dev/sdX2
# mdadm --manage /dev/md2 --add /dev/sdX3
</pre>
Check:
<pre>
# cat /proc/mdstat
</pre>
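
The final check can also be scripted. The sketch below assumes the usual @/proc/mdstat@ layout, where each array is followed by a member map such as @[UU]@ and an underscore marks a missing or not yet rebuilt member; the function name @md_degraded@ is hypothetical.

```shell
#!/bin/sh
# Reads /proc/mdstat-style text on stdin and prints the name of every
# md array whose member map (e.g. [U_]) shows a missing device.
md_degraded() {
    awk '
        /^md[0-9]+ :/                    { name = $1 }
        /\[[U_]*_[U_]*\]/ && name != ""  { print name; name = "" }
    '
}

# Example: list the arrays that are still degraded or rebuilding.
#   md_degraded < /proc/mdstat
```

An empty output means every array reports all of its members as up.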