Setting up I/O Fencing
To provide high availability, the cluster must be capable
of taking corrective action when a node fails. In this situation, VCS
configures its components to reflect the altered membership.
Problems arise when the mechanism that detects the failure
breaks down because symptoms appear identical to those of a failed node. For
example, if a system in a two-node cluster fails, the system stops sending heartbeats
over the private interconnect and the remaining node takes corrective action.
However, the failure of private interconnects (instead of the actual nodes)
would present identical symptoms and cause each node to determine its peer has
departed.
This situation typically results in data corruption
because both nodes attempt to take control of data storage in an uncoordinated
manner.
In addition to a broken set of private networks, other scenarios
can generate this situation. If a system is so busy that it appears to stop
responding or “hang,” the other nodes could declare it as dead. This
declaration may also occur for nodes using hardware that supports a “break” and
“resume” function. When a node drops to PROM level with a break and
subsequently resumes operations, the other nodes may declare the system dead
even though the system later returns and begins write operations.
VCS uses a technology called I/O fencing to remove the
risk associated with split brain. I/O fencing allows write access for members
of the active cluster and blocks access to storage from non-members; even a
node that is alive is unable to cause damage.
SCSI-3 persistent reservations
SCSI-3 Persistent Reservations (SCSI-3 PR) are required
for I/O fencing and resolve the issues of using SCSI reservations in a
clustered SAN environment.
SCSI-3 PR enables access for multiple nodes to a device
and simultaneously blocks access for other nodes.
SCSI-3 PR uses a concept of registration and reservation.
Each system registers its own “key” with a SCSI-3 device. Multiple systems
registering keys form a membership and establish a reservation, typically set
to "Write Exclusive Registrants Only" (WERO). The WERO setting enables only registered
systems to perform write operations. For a given disk, only one reservation can
exist among numerous registrations.
With SCSI-3 PR technology, blocking write access is as
simple as removing a registration from a device. Only registered members can
“eject” the registration of another member. A member wishing to eject another
member issues a “preempt and abort” command. Ejecting a node is final and
atomic; an ejected node cannot eject another node. In VCS, a node registers the
same key for all paths to the device. A single preempt and abort command ejects
a node from all paths to the storage device.
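To see the registration concept in practice, the vxfenadm utility can display the keys currently registered on a disk. The option letter shown below is an assumption (older VCS releases document -g for reading keys, newer ones use -s), so confirm it against the vxfenadm manual page for your release before relying on it:
# vxfenadm -g /dev/rdsk/c2t13d0
(the -g option and the device path are only an example; substitute your own disk)
Each registered cluster node contributes one key; a node that has been ejected with preempt and abort no longer appears in the output.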
I/O fencing components
Fencing in VCS involves coordinator disks and data disks.
Each component has a unique purpose and uses different physical disk devices. The fencing driver is vxfen.
Data disks
Data disks are standard disk devices for data storage and
are either physical disks or RAID Logical Units (LUNs). These disks must
support SCSI-3 PR and are part of standard VxVM or CVM disk groups.
CVM is responsible for fencing data disks on a disk group
basis. Disks added to a disk group are automatically fenced, as are new paths
discovered to a device.
Coordinator disks
Coordinator disks are three standard disks or LUNs set
aside for I/O fencing during cluster reconfiguration. Coordinator disks do not
serve any other storage purpose in the VCS configuration. Users cannot store
data on these disks or include the disks in a disk group for user data. The
coordinator disks can be any three disks that support SCSI-3 PR.
Symantec recommends using the smallest possible LUNs for
coordinator disks. Because coordinator disks do not store any data, cluster
nodes need only register with them and do not need to reserve them.
These disks provide a lock mechanism to determine which
nodes get to fence off data drives from other nodes. A node must eject a peer
from the coordinator disks before it can fence the peer from the data drives.
This concept of racing for control of the coordinator disks to gain the ability
to fence data disks is key to understanding prevention of split brain through
fencing.
Dynamic Multipathing devices with I/O fencing
You can configure coordinator disks to use the Veritas Volume Manager Dynamic
Multipathing (DMP) feature. DMP allows coordinator disks to take advantage of
the path failover and the dynamic adding and removal capabilities of DMP.
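As an optional cross-check (not part of the original procedure), DMP can show how many paths lead to a coordinator disk. The device name below is only an example, so substitute your own:
# vxdmpadm getsubpaths dmpnodename=c2t13d0
Each enabled path listed here is a path the fencing driver can use when the disk is accessed through /dev/vx/rdmp.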
I/O fencing operations
I/O fencing, provided by the kernel-based fencing module
(vxfen), performs identically on node failures and communication failures.
When the fencing module on a node is informed of a change in cluster membership
by the GAB module, it immediately begins the fencing operation. The node
attempts to eject the key for departed nodes from the coordinator disks using the
preempt and abort command. When the node successfully ejects the departed nodes
from the coordinator disks, it ejects the departed nodes from the data disks.
In a split brain scenario, both sides of the split would race for control of
the coordinator disks. The side winning the majority of the coordinator disks
wins the race and fences the loser. The loser then panics and reboots the
system.
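Once fencing is configured and running, the relationship between GAB and the fencing module is visible in the GAB port summary (a general VCS check, not a step from this procedure): port a is GAB membership itself, port b is the I/O fencing driver, and port h is the VCS engine.
# gabconfig -a
All cluster nodes should appear in the port b membership when vxfen is active on each of them.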
Preparing to configure I/O fencing
Make sure you have performed the following tasks before
configuring I/O fencing for VCS:
- Install the correct operating system.
- Install the VRTSvxfen depot when you installed VCS.
- Install a version of Veritas Volume Manager (VxVM) that supports SCSI-3 persistent reservations (SCSI-3 PR), for example VxVM 4.0 or 5.0.
The shared storage that you add for use with VCS software
must support SCSI-3 persistent reservations, a functionality that enables the
use of I/O fencing.
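On HP-UX, which is the platform the device paths in this article assume, a rough way to confirm these prerequisites is to list the installed software. The grep patterns below are assumptions; the actual depot or bundle names can differ by release:
# swlist -l product | grep -i vxfen
# swlist -l product | grep -i vxvm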
Step 1:
Identify three SCSI-3 PR compliant shared disks as
coordinator disks.
List the disks on each node and pick three disks as coordinator
disks.
For example, execute the following commands to list the
disks:
# ioscan -nfC disk
# insf -e
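If VxVM is already installed, an optional cross-check is to list the devices as VxVM sees them, including any existing disk group membership, so that disks already holding data are not picked as coordinators:
# vxdisk -o alldgs list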
Step 2:
If the Array Support Library (ASL) for the array you are
adding is not installed, obtain and install it on each node before proceeding.
The ASL for the supported storage device you are adding is available from the
disk array vendor or Symantec technical support.
Verify that the ASL for the disk array is installed on
each of the nodes. Run the following command on each node and examine the output
to verify the installation of ASL. The following output is a sample:
# vxddladm listsupport all
LIBNAME                          VID
==============================================================
libvxautoraid.sl                 HP
libvxCLARiiON.sl                 DGC
libvxemc.sl                      EMC
Step 3:
Verifying that the nodes see the same disk by using vxfenadm
To confirm whether a disk (or LUN) supports SCSI-3
persistent reservations, two nodes must
simultaneously have access to the same disks. Because a shared disk is
likely to have a different name on each node, check the serial number to verify the identity of the disk. Use the
vxfenadm command with the -i option to
verify that the same serial number for the LUN is returned on all paths to
the LUN.
For example, an EMC
disk is accessible by the /dev/rdsk/c2t13d0 path on node A and by the
/dev/rdsk/c2t11d0 path on node B.
From node A, enter:
# vxfenadm -i /dev/rdsk/c2t13d0
Vendor id       : EMC
Product id      : SYMMETRIX
Revision        : 5567
Serial Number   : 42031000a
The same serial number information should appear
when you enter the equivalent command on node B using the /dev/rdsk/c2t11d0
path.
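For example, from node B you would run the equivalent command against its own path and compare the serial number with the output shown above:
# vxfenadm -i /dev/rdsk/c2t11d0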
Step 4:
Check whether the disks support SCSI-3 PR and I/O fencing by using the vxfentsthdw script.
Testing the disks using the vxfentsthdw script
This procedure uses the /dev/rdsk/c2t13d0 disk in the steps.
If the utility does not show a message stating a disk is
ready, verification has failed. Failure of verification can be the result of an
improperly configured disk array. It can also be caused by a bad disk.
If the failure is due to a bad disk, remove and replace
it. The vxfentsthdw utility indicates a disk can be used for I/O fencing with a
message resembling:
The disk /dev/rdsk/c2t13d0 is ready to be configured for I/O Fencing on node coesun1
To test the disks using the vxfentsthdw script:
- Make sure system-to-system communication is functioning properly.
- From one node, start the utility. Do one of the following:
If you use ssh for communication:
# /opt/VRTSvcs/vxfen/bin/vxfentsthdw
If you use remsh for communication:
# /opt/VRTSvcs/vxfen/bin/vxfentsthdw -n
- After reviewing the overview and warning that the tests overwrite data on the disks, confirm to continue the process and enter the node names.
******** WARNING!!!!!!!! ********
THIS UTILITY WILL DESTROY THE DATA ON THE DISK!!
Do you still want to continue : [y/n] (default: n) y
Enter the first node of the cluster: coesun1
Enter the second node of the cluster: coesun2
- Enter the names of the disks you are checking. For each node, the same disk may be known by a different name:
Enter the disk name to be checked for SCSI-3 PGR on node coesun1 in the format: /dev/rdsk/cxtxdx
/dev/rdsk/c2t13d0
Enter the disk name to be checked for SCSI-3 PGR on node coesun2 in the format: /dev/rdsk/cxtxdx
Make sure it is the same disk as seen by nodes coesun1 and coesun2
/dev/rdsk/c2t13d0
If the disk names are not identical, the test terminates.
- Review the output as the utility performs the checks and reports its activities.
- If a disk is ready for I/O fencing on each node, the utility reports success:
The disk is now ready to be configured for I/O Fencing on node coesun1
ALL tests on the disk /dev/rdsk/c2t13d0 have PASSED
The disk is now ready to be configured for I/O Fencing on node coesun1
- Run the vxfentsthdw utility for each disk (coordinator and data) that you intend to verify for SCSI-3 PR support. The -r option runs the utility in read-only mode; with -g you can test all the disks in a disk group:
# vxfentsthdw -r -g temp_DG
Step 5:
To initialize the
disks as VxVM disks, use one of the following methods:
- Use the interactive vxdiskadm utility to initialize
the disks as VxVM disks.
- Use the vxdisksetup command to initialize
a disk as a VxVM disk.
# vxdisksetup -i device_name format=cdsdisk
The example specifies the CDS format:
# vxdisksetup -i c1t1d0 format=cdsdisk
# vxdisksetup -i c2t1d0 format=cdsdisk
# vxdisksetup -i c3t1d0 format=cdsdisk
Requirements for coordinator disks
After adding and initializing
disks for use as coordinator disks, make sure coordinator disks meet the
following requirements:
- You must have three coordinator disks.
- Each of the coordinator disks must use a
physically separate disk or LUN.
- Each of the coordinator disks should
exist on a different disk array, if possible.
- You must initialize each disk as a VxVM
disk.
- The coordinator disks must support SCSI-3
persistent reservations.
- The coordinator disks must exist in a
disk group (for example, vxfencoorddg).
- Symantec recommends using hardware-based
mirroring for coordinator disks.
To create the vxfencoorddg disk group
- On any node, create the disk group by specifying the device name of the disks:
# vxdg -o coordinator=on init vxfencoorddg c1t1d0 c2t1d0 c3t1d0
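Optionally, before continuing you can confirm that the disk group exists and is imported on that node:
# vxdg list vxfencoorddg
# vxdisk -o alldgs list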
Before configuring the coordinator disk group for use, you must stop VCS on all nodes.
To stop VCS on all nodes
- On one node, enter:
# hastop -all
Configuring /etc/vxfendg disk group for I/O fencing
After setting up the coordinator disk group, configure it
for use.
To configure the disk group for fencing
- Deport the disk group:
# vxdg deport vxfencoorddg
- Import the disk group with the -t option to avoid automatically importing it when the nodes restart:
# vxdg -t import vxfencoorddg
- Deport the disk group. Deporting the disk group prevents the coordinator disks from serving other purposes:
# vxdg deport vxfencoorddg
- On all nodes, type:
# echo "vxfencoorddg" > /etc/vxfendg
Do not use spaces between the quotes in the “vxfencoorddg”
text.
This command creates the /etc/vxfendg file, which includes the name of the coordinator disk
group.
Based on the contents of the /etc/vxfendg and /etc/vxfenmode
files, the rc script creates the /etc/vxfentab file for use by the vxfen
driver when the system starts. The rc
script also invokes the vxfenconfig
command, which configures the vxfen driver to start and use the coordinator
disks listed in /etc/vxfentab. The /etc/vxfentab file is a generated file;
do not modify this file.
Example /etc/vxfentab file
The /etc/vxfentab file gets created when you start the I/O fencing driver.
An example of the /etc/vxfentab file on one node resembles:
- Raw disk:
/dev/rdsk/c1t1d0
/dev/rdsk/c2t1d0
/dev/rdsk/c3t1d0
- DMP disk:
/dev/vx/rdmp/c1t1d0
/dev/vx/rdmp/c2t1d0
/dev/vx/rdmp/c3t1d0
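After the fencing driver has started, a quick sanity check (optional, not part of the original steps) is to confirm that the disk group name and the generated disk list are in place:
# cat /etc/vxfendg
vxfencoorddg
# cat /etc/vxfentab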
In some cases you must remove disks from or add disks to
an existing coordinator disk group.
Updating /etc/vxfenmode file
You must update the /etc/vxfenmode
file to operate in SCSI-3 mode. You can configure the vxfen module to use
either DMP devices or the underlying raw character devices. Note that you must
use the same SCSI-3 disk policy, either raw or dmp, on all the nodes.
To update the /etc/vxfenmode file
- On all cluster nodes, depending on the SCSI-3 mechanism you have chosen, type:
For DMP configuration:
# cp /etc/vxfen.d/vxfenmode_scsi3_dmp /etc/vxfenmode
For raw device configuration:
# cp /etc/vxfen.d/vxfenmode_scsi3_raw /etc/vxfenmode
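To confirm which mode and disk policy the copied file selects, view its non-comment lines. The key names vxfen_mode and scsi3_disk_policy are what these template files typically contain, but verify against your own copy:
# grep -v '^#' /etc/vxfenmode
You should see vxfen_mode=scsi3 and a scsi3_disk_policy entry of dmp or raw that matches your choice.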
Starting I/O fencing
You now need to start I/O fencing on each node. VxFEN, the
I/O fencing driver, may already be running, so you need to restart the driver
for the new configuration to take effect.
To stop I/O fencing on a node
- Stop the I/O fencing driver:
# /sbin/init.d/vxfen stop
To start I/O fencing on a node
- Start the I/O fencing driver:
# /sbin/init.d/vxfen start
Modifying VCS configuration to use I/O fencing
After adding coordinator disks and configuring I/O
fencing, add the UseFence = SCSI3
cluster attribute to the VCS configuration file, /etc/VRTSvcs/conf/config/main.cf. If you reset this attribute to UseFence = None, VCS does not make use of I/O fencing abilities while failing over
service groups. However, I/O fencing needs to be disabled separately.
To modify VCS configuration to enable I/O fencing
- Save the existing configuration:
# haconf -dump -makero
- Stop VCS on all nodes:
# hastop -all
- Make a backup copy of the main.cf file:
# cd /etc/VRTSvcs/conf/config
# cp main.cf main.orig
- On one node, use vi or another text editor to edit the main.cf file. Modify the list of cluster attributes by adding the UseFence attribute and assigning its value of SCSI3.
cluster rac_cluster101 (
UserNames = { admin = "cDRpdxPmHpzS." }
Administrators = { admin }
HacliUserLevel = COMMANDROOT
CounterInterval = 5
UseFence = SCSI3
)
- Save and close the file.
- Verify the syntax of the file /etc/VRTSvcs/conf/config/main.cf:
# hacf -verify /etc/VRTSvcs/conf/config
- Using rcp or another utility, copy the VCS configuration file from a node (for example, north) to the remaining cluster nodes.
For example, on each remaining node, enter:
# rcp north:/etc/VRTSvcs/conf/config/main.cf /etc/VRTSvcs/conf/config
- On each node, enter the following command to bring up the VCS processes:
# /opt/VRTS/bin/hastart
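Once hastart has been run everywhere, a general check (not specific to fencing) is to confirm that every system reaches the RUNNING state:
# hastatus -summary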
Verifying I/O fencing configuration
Verify from the vxfenadm output that the SCSI-3 disk
policy reflects the configuration in the /etc/vxfenmode file.
To verify I/O fencing configuration
- On one of the nodes, type:
# vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: raw
Cluster Members:
* 0 (north)
  1 (south)
RFSM State Information:
  node 0 in state 8 (running)
  node 1 in state 8 (running)
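You can also confirm that the cluster picked up the fencing attribute from main.cf; on a fencing-enabled cluster the following should return SCSI3:
# haclus -value UseFence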
Thank you for reading.
For other articles, visit https://sites.google.com/site/unixwikis/