Monday, September 15, 2014

Setting up I/O Fencing

To provide high availability, the cluster must be capable of taking corrective action when a node fails. In this situation, VCS configures its components to reflect the altered membership.
Problems arise when the mechanism that detects the failure breaks down because symptoms appear identical to those of a failed node. For example, if a system in a two-node cluster fails, the system stops sending heartbeats over the private interconnect and the remaining node takes corrective action. However, the failure of private interconnects (instead of the actual nodes) would present identical symptoms and cause each node to determine its peer has departed.
This situation typically results in data corruption because both nodes attempt to take control of data storage in an uncoordinated manner.
In addition to a broken set of private networks, other scenarios can generate this situation. If a system is so busy that it appears to stop responding or “hang,” the other nodes could declare it as dead. This declaration may also occur for nodes using hardware that supports a “break” and “resume” function. When a node drops to PROM level with a break and subsequently resumes operations, the other nodes may declare the system dead even though the system later returns and begins write operations.
VCS uses a technology called I/O fencing to remove the risk associated with split brain. I/O fencing allows write access for members of the active cluster and blocks access to storage from non-members; even a node that is alive is unable to cause damage.

SCSI-3 persistent reservations

SCSI-3 Persistent Reservations (SCSI-3 PR) are required for I/O fencing and resolve the issues of using SCSI reservations in a clustered SAN environment.
SCSI-3 PR enables access for multiple nodes to a device and simultaneously blocks access for other nodes.
SCSI-3 PR uses a concept of registration and reservation. Each system registers its own “key” with a SCSI-3 device. Multiple systems registering keys form a membership and establish a reservation, typically set to “Write Exclusive Registrants Only” (WERO). The WERO setting enables only registered systems to perform write operations. For a given disk, only one reservation can exist amidst numerous registrations.
With SCSI-3 PR technology, blocking write access is as simple as removing a registration from a device. Only registered members can “eject” the registration of another member. A member wishing to eject another member issues a “preempt and abort” command. Ejecting a node is final and atomic; an ejected node cannot eject another node. In VCS, a node registers the same key for all paths to the device. A single preempt and abort command ejects a node from all paths to the storage device.
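Once fencing is configured and running, you can see these registrations for yourself. As a small illustration (assuming the vxfenadm syntax shipped with VCS 4.x/5.0), the following command reads the keys registered on all coordinator disks listed in /etc/vxfentab; each cluster node should show up with the same key on every path:

# vxfenadm -g all -f /etc/vxfentab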

I/O fencing components

Fencing in VCS involves coordinator disks and data disks. Each component has a unique purpose and uses different physical disk devices. The fencing driver is vxfen.

Data disks

Data disks are standard disk devices for data storage and are either physical disks or RAID Logical Units (LUNs). These disks must support SCSI-3 PR and are part of standard VxVM or CVM disk groups.
CVM is responsible for fencing data disks on a disk group basis. Disks added to a disk group are automatically fenced, as are new paths discovered to a device.

Coordinator disks

Coordinator disks are three standard disks or LUNs set aside for I/O fencing during cluster reconfiguration. Coordinator disks do not serve any other storage purpose in the VCS configuration. Users cannot store data on these disks or include the disks in a disk group for user data. The coordinator disks can be any three disks that support SCSI-3 PR.
Symantec recommends using the smallest possible LUNs for coordinator disks. Because coordinator disks do not store any data, cluster nodes need only register with them and do not need to reserve them.
These disks provide a lock mechanism to determine which nodes get to fence off data drives from other nodes. A node must eject a peer from the coordinator disks before it can fence the peer from the data drives. This concept of racing for control of the coordinator disks to gain the ability to fence data disks is key to understanding prevention of split brain through fencing.

Dynamic Multipathing devices with I/O fencing

You can configure coordinator disks to use the Veritas Volume Manager Dynamic Multipathing (DMP) feature. DMP allows the coordinator disks to take advantage of its path failover and dynamic disk addition and removal capabilities.

I/O fencing operations

I/O fencing, provided by the kernel-based fencing module (vxfen), performs identically on node failures and communications failures. When the fencing module on a node is informed of a change in cluster membership by the GAB module, it immediately begins the fencing operation. The node attempts to eject the key for departed nodes from the coordinator disks using the preempt and abort command. When the node successfully ejects the departed nodes from the coordinator disks, it ejects the departed nodes from the data disks. In a split brain scenario, both sides of the split would race for control of the coordinator disks. The side winning the majority of the coordinator disks wins the race and fences the loser. The loser then panics and reboots the system.

Preparing to configure I/O fencing

Make sure you performed the following tasks before configuring I/O fencing for VCS:
  • Install the correct operating system.
  • Install the VRTSvxfen depot when you installed VCS.
  • Install a version of Veritas Volume Manager (VxVM) that supports SCSI-3 persistent reservations (SCSI-3 PR), for example VxVM 4.0 or 5.0.
The shared storage that you add for use with VCS software must support SCSI-3 persistent reservations, a functionality that enables the use of I/O fencing.

Step 1:

Identify three SCSI-3 PR compliant shared disks as coordinator disks.
List the disks on each node and pick three disks as coordinator disks.
For example, execute the following commands to list the disks:

# ioscan -nfC disk
# insf -e
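If VxVM is already installed, it can also help to cross-check the candidate disks from the volume manager's side. The following standard VxVM commands (nothing fencing-specific is assumed here) rescan for devices and list every disk together with any disk group it belongs to, which makes it easier to avoid picking disks that are already in use:

# vxdctl enable
# vxdisk -o alldgs list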

Step 2:

If the Array Support Library (ASL) for the array you are adding is not installed, obtain and install it on each node before proceeding. The ASL for the supported storage device you are adding is available from the disk array vendor or Symantec technical support.

Verify that the ASL for the disk array is installed on each of the nodes. Run the following command on each node and examine the output to verify the installation of ASL. The following output is a sample:

# vxddladm listsupport all
LIBNAME              VID
==============================================================
libvxautoraid.sl HP
libvxCLARiiON.sl DGC
libvxemc.sl      EMC
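To drill down into one of the libraries from the sample output, you can (assuming the vxddladm options available in this VxVM release) query a single ASL by name, for example:

# vxddladm listsupport libname=libvxemc.sl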

Step 3:

Verifying that the nodes see the same disk using vxfenadm

To confirm whether a disk (or LUN) supports SCSI-3 persistent reservations, two nodes must simultaneously have access to the same disks. Because a shared disk is likely to have a different name on each node, check the serial number to verify the identity of the disk. Use the vxfenadm command with the -i option to verify that the same serial number for the LUN is returned on all paths to the LUN.

For example, an EMC disk is accessible by the /dev/rdsk/c2t13d0 path on node A and by the /dev/rdsk/c2t11d0 path on node B.

From node A, enter:
# vxfenadm -i /dev/rdsk/c2t13d0
Vendor id               : EMC
Product id      : SYMMETRIX
Revision                : 5567
Serial Number   : 42031000a
The same serial number information should appear when you enter the equivalent command on node B using the /dev/rdsk/c2t11d0 path.
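A quick way to compare serial numbers across several paths or nodes is to loop over the device names with the same vxfenadm -i command. The paths below are only placeholders for your own device names:

# for DISK in /dev/rdsk/c2t13d0 /dev/rdsk/c2t11d0; do echo "== $DISK =="; vxfenadm -i $DISK | grep "Serial Number"; done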

Step 4:

Check whether the disks support SCSI-3 PR and I/O fencing using the vxfentsthdw script.

Testing the disks using the vxfentsthdw script
This procedure uses the /dev/rdsk/c2t13d0 disk in the steps.
If the utility does not show a message stating a disk is ready, verification has failed. Failure of verification can be the result of an improperly configured disk array. It can also be caused by a bad disk.
If the failure is due to a bad disk, remove and replace it. The vxfentsthdw utility indicates a disk can be used for I/O fencing with a message resembling:

The disk /dev/rdsk/c2t13d0 is ready to be configured for I/O Fencing on node coesun1

To test the disks using the vxfentsthdw script:

  1. Make sure system-to-system communication is functioning properly.
  2. From one node, start the utility. Do one of the following:
If you use ssh for communication:
# /opt/VRTSvcs/vxfen/bin/vxfentsthdw

If you use remsh for communication:
# /opt/VRTSvcs/vxfen/bin/vxfentsthdw -n

  3. After reviewing the overview and the warning that the tests overwrite data on the disks, confirm that you want to continue and enter the node names.

******** WARNING!!!!!!!! ********
THIS UTILITY WILL DESTROY THE DATA ON THE DISK!!
Do you still want to continue : [y/n] (default: n) y
Enter the first node of the cluster: coesun1
Enter the second node of the cluster: coesun2

  4. Enter the names of the disks you are checking. For each node, the same disk may be known by a different name:
Enter the disk name to be checked for SCSI-3 PGR on node coesun1 in
the format: /dev/rdsk/cxtxdx
/dev/rdsk/c2t13d0
Enter the disk name to be checked for SCSI-3 PGR on node coesun2 in
the format: /dev/rdsk/cxtxdx
Make sure it’s the same disk as seen by nodes coesun1 and coesun2
/dev/rdsk/c2t13d0
If the disk names are not identical, then the test terminates.

  5. Review the output as the utility performs the checks and reports its activities.
  6. If a disk is ready for I/O fencing on each node, the utility reports success:
The disk is now ready to be configured for I/O Fencing on node coesun1
ALL tests on the disk /dev/rdsk/c2t13d0 have PASSED
The disk is now ready to be configured for I/O Fencing on node coesun1

  7. Run the vxfentsthdw utility for each disk (coordinator and data) for which you want to verify SCSI-3 PR support. To run the checks in read-only, non-destructive mode against all the disks in a disk group, use the -r option together with -g:
# vxfentsthdw -r -g temp_DG
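If you do not already have a disk group to run the read-only test against, you can build a throwaway group from disks that are already initialized as VxVM disks and remove it afterwards. The group name temp_DG and the device names below are only placeholders:

# vxdg init temp_DG c4t0d0 c4t1d0
# vxfentsthdw -r -g temp_DG
# vxdg destroy temp_DG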

Step 5:

To initialize the disks as VxVM disks, use one of the following methods:
  • Use the interactive vxdiskadm utility to initialize the disks as VxVM disks.
  • Use the vxdisksetup command to initialize a disk as a VxVM disk.
# vxdisksetup -i device_name format=cdsdisk
The example specifies the CDS format:
# vxdisksetup -i c1t1d0 format=cdsdisk
# vxdisksetup -i c2t1d0 format=cdsdisk
# vxdisksetup -i c3t1d0 format=cdsdisk
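After initialization, you can confirm that each disk was set up in the expected format; the detailed vxdisk listing for a disk should report cdsdisk:

# vxdisk list c1t1d0 | grep -i format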

Requirements for coordinator disks

After adding and initializing disks for use as coordinator disks, make sure coordinator disks meet the following requirements:
  • You must have three coordinator disks.
  • Each of the coordinator disks must use a physically separate disk or LUN.
  • Each of the coordinator disks should exist on a different disk array, if possible.
  • You must initialize each disk as a VxVM disk.
  • The coordinator disks must support SCSI-3 persistent reservations.
  • The coordinator disks must exist in a disk group (for example, vxfencoorddg).
  • Symantec recommends using hardware-based mirroring for coordinator disks.

To create the vxfencoorddg disk group


  • On any node, create the disk group by specifying the device name of the disks:
# vxdg -o coordinator=on init vxfencoorddg c1t1d0 c2t1d0 c3t1d0
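You can confirm that the group was created and that all three disks joined it before moving on, for example:

# vxdg list vxfencoorddg
# vxdisk -o alldgs list | grep fencoord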


Before configuring the coordinator disk for use, you must stop VCS on all nodes.
To stop VCS on all nodes
  • On one node, enter:
# hastop -all

Configuring /etc/vxfendg disk group for I/O fencing

After setting up the coordinator disk group, configure it for use.
To configure the disk group for fencing

  1. Deport the disk group:
# vxdg deport vxfencoorddg

  2. Import the disk group with the -t option to avoid automatically importing it when the nodes restart:
# vxdg -t import vxfencoorddg

  3. Deport the disk group. Deporting the disk group prevents the coordinator disks from serving other purposes:
# vxdg deport vxfencoorddg

  4. On all nodes, type:
# echo "vxfencoorddg" > /etc/vxfendg

Do not use spaces between the quotes in the “vxfencoorddg” text.

This command creates the /etc/vxfendg file, which includes the name of the coordinator disk group.
Based on the contents of the /etc/vxfendg and /etc/vxfenmode files, the rc script creates the /etc/vxfentab file for use by the vxfen driver when the system starts. The rc script also invokes the vxfenconfig command, which configures the vxfen driver to start and use the coordinator disks listed in /etc/vxfentab. The /etc/vxfentab file is a generated file; do not modify this file.

Example /etc/vxfentab file
The /etc/vxfentab file gets created when you start the I/O fencing driver.
An example of the /etc/vxfentab file on one node resembles:

  • Raw disk
/dev/rdsk/c1t1d0
/dev/rdsk/c2t1d0
/dev/rdsk/c3t1d0

  • DMP disk
/dev/vx/rdmp/c1t1d0
/dev/vx/rdmp/c2t1d0
/dev/vx/rdmp/c3t1d0
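Once the fencing driver has been started (see below), a quick sanity check is to confirm that the generated files line up with what you configured; this only reads the files described above:

# cat /etc/vxfendg
# cat /etc/vxfenmode
# cat /etc/vxfentab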


In some cases you must remove disks from or add disks to an existing coordinator disk group.


Updating /etc/vxfenmode file

You must update the /etc/vxfenmode file to operate in SCSI-3 mode. You can configure the vxfen module to use either DMP devices or the underlying raw character devices. Note that you must use the same SCSI-3 disk policy, either raw or dmp, on all the nodes.
To update /etc/vxfenmode file

  •  On all cluster nodes, depending on the SCSI-3 mechanism you have chosen, type:
  • For DMP configuration:
# cp /etc/vxfen.d/vxfenmode_scsi3_dmp /etc/vxfenmode

  • For raw device configuration:
# cp /etc/vxfen.d/vxfenmode_scsi3_raw /etc/vxfenmode
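To confirm that every node ended up with the same settings, you can display the active lines of the copied file on each node; the vxfen_mode and scsi3_disk_policy keywords assumed here are the ones used in the shipped vxfenmode templates:

# grep -v "^#" /etc/vxfenmode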

Starting I/O fencing

You now need to start I/O fencing on each node. VxFEN, the I/O fencing driver, may already be running, so you need to restart the driver for the new configuration to take effect.
To stop I/O fencing on a node
  • Stop the I/O fencing driver.
# /sbin/init.d/vxfen stop

To start I/O fencing on a node
  • Start the I/O fencing driver.
# /sbin/init.d/vxfen start
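Putting the two together, a typical restart on each node looks like this, with vxfenadm -d (covered in the verification section below) used to confirm that the driver came back up with the new configuration:

# /sbin/init.d/vxfen stop
# /sbin/init.d/vxfen start
# vxfenadm -d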


Modifying VCS configuration to use I/O fencing

After adding coordinator disks and configuring I/O fencing, add the UseFence = SCSI3 cluster attribute to the VCS configuration file, /etc/VRTSvcs/conf/config/main.cf. If you reset this attribute to UseFence = None, VCS does not make use of I/O fencing abilities while failing over service groups. However, I/O fencing needs to be disabled separately.

To modify VCS configuration to enable I/O fencing
  1. Save the existing configuration:
# haconf -dump -makero

  2. Stop VCS on all nodes:
# hastop -all

  3. Make a backup copy of the main.cf file:
# cd /etc/VRTSvcs/conf/config
# cp main.cf main.orig

  4. On one node, use vi or another text editor to edit the main.cf file. Modify the list of cluster attributes by adding the UseFence attribute and assigning its value of SCSI3.

cluster rac_cluster101 (
UserNames = { admin = "cDRpdxPmHpzS." }
Administrators = { admin }
HacliUserLevel = COMMANDROOT
CounterInterval = 5
UseFence = SCSI3
)

  5. Save and close the file.
  6. Verify the syntax of the file /etc/VRTSvcs/conf/config/main.cf:
# hacf -verify /etc/VRTSvcs/conf/config

  7. Using rcp or another utility, copy the VCS configuration file from a node (for example, north) to the remaining cluster nodes.
For example, on each remaining node, enter:
# rcp north:/etc/VRTSvcs/conf/config/main.cf /etc/VRTSvcs/conf/config

  8. On each node, enter the following command to bring up the VCS processes:
# /opt/VRTS/bin/hastart
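Once VCS is running on all nodes, you can check the overall cluster and service group state with:

# hastatus -summary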

Verifying I/O fencing configuration

Verify from the vxfenadm output that the SCSI-3 disk policy reflects the configuration in the /etc/vxfenmode file.
To verify I/O fencing configuration

  • On one of the nodes, type:
# vxfenadm -d
I/O Fencing Cluster Information:
================================
Fencing Protocol Version: 201
Fencing Mode: SCSI3
Fencing SCSI3 Disk Policy: raw
Cluster Members:
* 0 (north)
1 (south)
RFSM State Information:
node 0 in state 8 (running)
node 1 in state 8 (running)
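You can also confirm at the GAB level that the fencing module has joined its membership; port b is the GAB port used by the vxfen driver, so seeing port b with all node IDs in the output indicates that fencing is active cluster-wide:

# gabconfig -a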

Thank you for reading.
To read other articles, visit https://sites.google.com/site/unixwikis/
