FAILOVER management¶
Designations:
gnt# - command executed on the master node
gntX# - command executed on an ordinary node
gntY# - command executed on the other node
# - command executed on any node
Starting instances on one (master) node when the other is down¶
When a node starts and cannot find the other node, the cluster management daemon ganeti-masterd does not start automatically, even on the master node.
This is because it cannot determine whether the second node is really down or whether there is just a link problem while the instances on the other node are still running.
Cluster management daemon start:¶
gnt# ganeti-masterd --no-voting
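To confirm that the daemon is up and that this node now acts as the master, you can query the cluster (gnt-cluster getmaster simply prints the current master's name):
gnt# gnt-cluster getmaster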
Transfer instances from a failed (or offline) node¶
This operation is performed only once: it changes the primary node of the instance.
By default the transferred instances are started automatically.
It is better to migrate or fail over the instances in the regular way before their node goes offline.
But if the failure happens unexpectedly, issue:
gnt# gnt-node failover --ignore-consistency gntX
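Afterwards you can check that the instances have moved and are running; a quick look using standard gnt-instance list output fields:
gnt# gnt-instance list -o name,pnode,snodes,status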
Master node change in normal mode¶
Both nodes are online; the master role is changed in the normal way.
On the master candidate (gntX):
gntX# gnt-cluster master-failover
Failure of master node¶
The master node (gnt1 in this example) is down due to a hardware failure.
Start the management daemon on the master candidate (gntX):
gntX# ganeti-masterd --no-voting
Activate the new master node:
gntX# gnt-cluster master-failover --no-voting
Set the broken node offline so that the master node does not try to connect to it.
-C = master-candidate
-O = offline
gnt# gnt-node modify -C no -O yes gntY
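You can verify the resulting flags; master_candidate and offline are standard gnt-node list output fields (names as in the gnt-node manpage):
gnt# gnt-node list -o name,master_candidate,offline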
Start all instances from the broken node on the backup node:
gnt# gnt-node failover --ignore-consistency gnt1
Set broken node to online¶
The old master node will not start the management daemon on boot:
- if it cannot find the other node,
- if it finds another node that is in master mode.
If the data on this node is intact, re-add it to the cluster.
Copy the new configuration to it from the new master node:
gnt# gnt-cluster redist-conf
Restart the Ganeti daemons:
gntX# /etc/init.d/ganeti restart
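After the restart it is worth checking overall cluster health; gnt-cluster verify reports any remaining configuration or connectivity problems:
gnt# gnt-cluster verify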
Planned node shutdown for maintenance¶
Migrate all instances from this node to another (repeat for each instance, or use the bulk variant shown below):
gnt# gnt-instance migrate INSTANCE
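To move every primary instance off the node in one step, Ganeti also provides a per-node command (gntX below stands for the node being drained):
gnt# gnt-node migrate gntX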
If the node being turned off is the master, you must first assign a new cluster master (see Master node change in normal mode above).
Set the node to offline and not master-candidate:
gnt# gnt-node modify -C no -O yes NODE
Now you can simply turn off this node
Returning node to online¶
After boot, set the node online and master-candidate:
gnt# gnt-node modify -C yes -O no NODE
However, if you have any doubts about the node's health, re-add it instead:
gnt# gnt-node add --readd NODE
In any case, you must wait about 5 minutes until the watcher daemon sets up the DRBD resources, or initiate the setup by hand:
gnt# gnt-cluster verify-disks
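You can also inspect the DRBD state directly on a node; /proc/drbd shows the connection and disk status of each resource:
gntX# cat /proc/drbd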
Replacing a node with a new one¶
Add node to cluster:
gnt# gnt-node add --readd gntX
For all instances for which the new node is the secondary:
gnt# gnt-instance replace-disks --submit -s INSTANCE
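Because --submit returns immediately with a job ID, you can queue a rebuild for every affected instance and then watch the jobs; a sketch using standard gnt-instance and gnt-job commands (the awk filter on gntX is illustrative):
gnt# for i in $(gnt-instance list --no-headers -o name,snodes | awk '$2 == "gntX" {print $1}'); do gnt-instance replace-disks --submit -s $i; done
gnt# gnt-job list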
Re-add the node to puppet:
gnt# gnt-instance console sci
sci# puppetca --clean gnt1.fqdn
gntX# rm -rf /var/lib/puppet/ssl/*
gntX# /etc/init.d/puppet restart
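If certificate autosigning is not enabled on the puppetmaster, the node's new certificate request must also be signed by hand (a sketch using the same puppetca CLI; the host name is illustrative):
sci# puppetca --list
sci# puppetca --sign gntX.fqdn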
Hard disk replacement¶
Copy the partition table from the existing disk (allowed only for disks of the same model):
# sfdisk -d /dev/sda | sfdisk /dev/sdX
Check:
# fdisk -l
Add the new partitions to the RAID arrays:
# mdadm --manage /dev/md0 --add /dev/sdX1
# mdadm --manage /dev/md1 --add /dev/sdX2
# mdadm --manage /dev/md2 --add /dev/sdX3
Check:
# cat /proc/mdstat
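The rebuild runs in the background; to follow the resync progress you can poll mdstat (a convenience sketch, not required):
# watch -n 5 cat /proc/mdstat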