UNIX Journey of Indrajit: 01/01/2014

Friday, January 31, 2014

Intermittent disk failures

Intermittent disk failures occur off and on and involve problems that cannot be consistently reproduced. These failures are the most difficult for the operating system to handle, and they can slow the system down considerably while the operating system attempts to determine the nature of the problem. If you encounter intermittent failures, move the data off the disk and remove the disk from the system to avoid an unexpected failure later. Fortunately, intermittent disk failures are also very rare.

With intermittent disk failures, VxVM sometimes labels a disk as failing (see the STATUS column in the vxdisk list output below). If Volume Manager experiences occasional I/O failures on a disk but can still access the private region of the disk, it marks the disk as failing.

Note: If the failing flag is set on a disk, it is not turned off until the administrator executes the following command:

# vxedit -g diskgroup set failing=off dm_name

1. List the disks under Veritas Volume Manager control to determine which disk is marked as failing:

# vxdisk list

DEVICE       TYPE      DISK         GROUP        STATUS
c0t1d0s2     sliced    disk01       rootdg       online failing
c0t3d0s2     sliced    rootdisk     rootdg       online
........(more)

2. Clear the failing flag for the disk that is marked as failing:

# vxedit -g rootdg set failing=off disk01

3. Verify that the flag has been cleared:

# vxdisk list

DEVICE       TYPE      DISK         GROUP        STATUS
c0t1d0s2     sliced    disk01       rootdg       online
c0t3d0s2     sliced    rootdisk     rootdg       online
........(more)
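The check in step 1 can be scripted. What follows is only a minimal sketch, assuming the column layout shown above (disk name in the third column, disk group in the fourth, status flags from the fifth column on). The function name failing_disks is hypothetical, and the sample text stands in for live vxdisk list output. The script prints the vxedit commands for review and does not run them.

```shell
#!/bin/sh
# failing_disks (hypothetical helper): read "vxdisk list" output on stdin and
# print "<disk> <group>" for every disk whose status includes "failing".
failing_disks() {
    awk 'NR > 1 {
        for (i = 5; i <= NF; i++)
            if ($i == "failing") { print $3, $4; break }
    }'
}

# Sample output copied from the listing above; on a live system you would
# pipe the real command instead:  vxdisk list | failing_disks
sample_output='DEVICE TYPE DISK GROUP STATUS
c0t1d0s2 sliced disk01 rootdg online failing
c0t3d0s2 sliced rootdisk rootdg online'

# Print (do not execute) the command that would clear each flag.
printf '%s\n' "$sample_output" | failing_disks | while read -r disk group; do
    echo "vxedit -g $group set failing=off $disk"
done
```

Printing the commands instead of executing them leaves the decision with the administrator, which fits the advice above to move data off a disk that keeps failing before clearing its flag.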

Checking Free Memory Slots on an HP-UX Host

# echo "selclass qualifier memory;info;wait;infolog" | /usr/sbin/cstm

or

root> cstm
cstm>map
(2 is the dev num for my memory so I will select 2 in next step)
cstm>sel dev 2
cstm>info
cstm>il

or

stm

Preventing users from logging in

If you want to prevent users from logging in to the system without changing the run level to single-user mode, there is another option. The file /etc/default/security has a variable called NOLOGIN. Setting it to 1 (in practice, uncommenting that line) gives you a way to block new logins. When it is set to 1, every application that uses session management with pam_hpsec (such as ssh) checks for the presence of /etc/nologin. If /etc/nologin exists, no more users can log in; every user who attempts to log in is shown the contents of that file instead. root is exempt, so you cannot lock yourself out of the system. For example:

# echo "System Maintenance until 4am - logins disallowed" > /etc/nologin

The shutdown process uses the same mechanism. When the system reboots, the file is erased automatically, whether you created it manually or a shutdown process created it.
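Creating and removing the file can be wrapped in small helpers. This is a sketch only: the function names block_logins, allow_logins and logins_blocked are hypothetical, and the NOLOGIN_FILE variable is an assumption added so that the helpers can be tried against a scratch path instead of the real /etc/nologin. As described above, the file only takes effect when NOLOGIN=1 is set in /etc/default/security.

```shell
#!/bin/sh
# NOLOGIN_FILE defaults to /etc/nologin; point it at a scratch file to
# experiment without affecting real logins (assumption, not an HP-UX setting).
NOLOGIN_FILE=${NOLOGIN_FILE:-/etc/nologin}

# block_logins MESSAGE: create the file; MESSAGE is shown to users who try
# to log in while the file exists.
block_logins() {
    printf '%s\n' "$1" > "$NOLOGIN_FILE"
}

# allow_logins: remove the file so that logins work again.
allow_logins() {
    rm -f "$NOLOGIN_FILE"
}

# logins_blocked: succeed if the file currently exists.
logins_blocked() {
    [ -f "$NOLOGIN_FILE" ]
}
```

A maintenance script could call block_logins at the start and allow_logins at the end, rather than relying on a reboot to clear the file.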

lvcreate can return the error: "Argument out of domain"

PROBLEM
lvsplit and lvcreate can return the error: "Argument out of domain".
How can this error be resolved?
Example 1:

# lvsplit /dev/vg01/lvol4
lvsplit: The logical volume "/dev/vg01/lvol4b" could not be created:
Argument out of domain
lvsplit: Couldn't delete logical volume "/dev/vg01/lvol4b":
The supplied lv number refers to a non-existent logical volume.

Example 2:

# lvcreate -L 528 -n lvol5 /dev/vg05
lvcreate: the logical volume /dev/vg01/lvol05 could not be created:
Argument out of domain

or

# lvcreate -D y -s g -L 361200 -n lvol9 /dev/vg201
Warning: rounding up logical volume size to extent boundary at size "361216" MB.
lvcreate: The logical volume "/dev/vg201/lvol9" could not be created:
Argument out of domain

RESOLUTION
The formal definition of "Argument out of domain" is "You probably specified an argument a command does not support, or you specified a value to an argument that lies outside the acceptable range. Examine the syntax for the command, make adjustments, and try again." Both of the example commands above have valid arguments, which indicates the volume group configuration should be checked.

# vgdisplay /dev/vg01
Max LV 5
Cur LV 5

This output indicates that the volume group has reached its maximum number of logical volumes.
max_lv is set with the -l option when vgcreate creates the volume group. From the man page: "-l max_lv Set the maximum number of logical volumes that the volume group is allowed to contain. The default value for max_lv is 255. The maximum number of logical volumes can be a value in the range 1 to 255."
Because this number cannot be changed on the fly at 11.x, the volume group has to be recreated with a higher -l setting.
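The comparison of Max LV and Cur LV can be automated before lvcreate or lvsplit is run. The following is a rough sketch, assuming that vgdisplay prints the two values on lines beginning with "Max LV" and "Cur LV" as in the excerpt above; the function name lv_headroom is hypothetical.

```shell
#!/bin/sh
# lv_headroom (hypothetical helper): read "vgdisplay" output on stdin and
# print how many more logical volumes the volume group can hold.
lv_headroom() {
    awk '$1 == "Max" && $2 == "LV" { max = $3 }
         $1 == "Cur" && $2 == "LV" { cur = $3 }
         END {
             if (max == "" || cur == "") { print "unknown"; exit 1 }
             print max - cur
         }'
}

# Sample taken from the vgdisplay excerpt above; on a live system:
#   vgdisplay /dev/vg01 | lv_headroom
sample='Max LV 5
Cur LV 5'

remaining=$(printf '%s\n' "$sample" | lv_headroom)
if [ "$remaining" -eq 0 ]; then
    echo "max_lv reached: lvcreate and lvsplit will fail in this volume group"
fi
```

A result of 0 corresponds to the situation in the examples: the arguments are valid, but the volume group has no room for one more logical volume.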

Tuesday, January 28, 2014

Significance of ignoreeof

ignoreeof: Preventing [Ctrl-d] from logging you out. Users often press [Ctrl-d] intending to terminate standard input, but end up logging out of the system instead. The ignoreeof feature provides a safety lock to prevent this.

How do you prevent your terminal from closing, or yourself from being logged out, when you press Control-D?
The answer is IGNOREEOF / ignoreeof. Let us see what IGNOREEOF is and how IGNOREEOF and ignoreeof differ.

Control-D:
Many of us have pressed a key only to find that the terminal closed or that we were logged out of the account. A shell exits whenever it reads the EOF character, which is Control-D. So if you are in your login shell, Ctrl-D logs you out and typically closes the terminal as well.

Bash:
In Bash, logging out of the user account with Ctrl-D can be prevented by using the variable IGNOREEOF.

export IGNOREEOF=2

Here IGNOREEOF is set to 2, which means the shell ignores Ctrl-D two times in a row. Each time, it displays a warning message; on the third consecutive Ctrl-D, the user is logged out. You can set any value you like, and in this way protect yourself from being logged out of the shell by accident. Put this setting in your profile file to make it permanent.

export IGNOREEOF

[localhost 09:19 AM ~]# export IGNOREEOF=2
[localhost 09:19 AM ~]# export IGNOREEOF
[localhost 09:19 AM ~]# Use "logout" to leave the shell.
[localhost 09:19 AM ~]# Use "logout" to leave the shell.
[localhost 09:20 AM ~]# logout

If the variable is set but has no value (or a non-numeric one), the shell ignores Ctrl-D 10 times, since 10 is the default.

To unset the variable IGNOREEOF:

unset IGNOREEOF

Ksh:
The environment variable IGNOREEOF does not work in the Korn shell. Instead, ksh uses one of its set command options, "ignoreeof". This is the difference between IGNOREEOF and ignoreeof.
To set the 'ignoreeof' option in ksh:

set -o ignoreeof

This means ignoreeof is set. Once set, the shell ignores Ctrl-D 20 times in ksh93 and 11 times in older ksh versions; these are the built-in defaults. Unlike IGNOREEOF in bash, ksh does not accept a user-defined value.

To unset the ignoreeof in ksh:

set +o ignoreeof
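Because bash and ksh use different mechanisms, a profile file that is shared between them can choose at login time. The snippet below is a sketch under the assumption that BASH_VERSION is set only when the file is read by bash; the count of 2 is just the example value used earlier.

```shell
# Sketch for a profile file read by both bash and ksh.
if [ -n "${BASH_VERSION:-}" ]; then
    # bash: IGNOREEOF takes a count of consecutive Ctrl-D presses to ignore.
    IGNOREEOF=2
else
    # ksh and other shells with the option: the limit is built in.
    set -o ignoreeof
fi
```

Either branch only has a visible effect in an interactive shell, since that is where Ctrl-D at the prompt would otherwise end the session.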

Significance of noclobber

noclobber: Protecting files from accidental overwriting. Once noclobber is set, the shell's > redirection no longer overwrites an existing file.

This option is designed to keep you from accidentally destroying existing files by redirecting output onto them. The set noclobber and >! forms shown here are C shell (csh/tcsh) syntax; ksh and bash use set -o noclobber and >| instead.

set noclobber #No more overwriting files with >

If you now redirect command output to an existing file foo, the shell will retort with a message:

foo: File exists.

To override this protection, put a ! after the >:

head -5 emp.lst >! foo

To list the current option settings (ksh and bash):

set -o

To turn the "noclobber" option off in ksh or bash (in the C shell, use unset noclobber):

set +o noclobber
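The behaviour can be tried safely on a scratch file. This is a small sketch in ksh/bash (POSIX sh) syntax, so it uses set -o noclobber and the >| override, not the C shell forms; the variable names are made up for the illustration.

```shell
#!/bin/sh
# Demonstrates noclobber on a scratch file created with mktemp.
scratch=$(mktemp)
echo "original" > "$scratch"        # noclobber is still off here

set -o noclobber
# A plain > on an existing file is now refused. Run it in a subshell so the
# failed redirection cannot abort the calling script.
if (echo "overwritten" > "$scratch") 2>/dev/null; then
    plain_redirect="overwrote the file"
else
    plain_redirect="refused"
fi
after_plain=$(cat "$scratch")       # contents after the refused write

echo "forced" >| "$scratch"         # >| overrides noclobber on purpose
after_forced=$(cat "$scratch")      # contents after the forced write

set +o noclobber
rm -f "$scratch"
echo "plain >: $plain_redirect; contents: $after_plain, then $after_forced"
```

The first write succeeds only because noclobber is not yet set; once it is on, the explicit >| is the only redirection that replaces the file.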

Sunday, January 5, 2014

Corrections for troubleshooting I/O fencing procedures

How vxfen driver checks for pre-existing split-brain condition

Replace this topic in the Veritas Cluster Server User's Guide for 5.0 MP3 with the following content.

The vxfen driver functions to prevent an ejected node from rejoining the cluster after the failure of the private network links and before the private network links are repaired.

For example, suppose the cluster of system 1 and system 2 is functioning normally when the private network links are broken. Also suppose system 1 is the ejected system. When system 1 restarts before the private network links are restored, its membership configuration does not show system 2; however, when it attempts to register with the coordinator disks, it discovers system 2 is registered with them. Given this conflicting information about system 2, system 1 does not join the cluster and returns an error from vxfenconfig that resembles:

vxfenconfig: ERROR: There exists the potential for a preexisting
split-brain. The coordinator disks list no nodes which are in
the current membership. However, they also list nodes which are
not in the current membership.

I/O Fencing Disabled!

Also, the following information is displayed on the console:

<date> <system name> vxfen: WARNING: Potentially a preexisting
<date> <system name> split-brain.
<date> <system name> Dropping out of cluster.
<date> <system name> Refer to user documentation for steps
<date> <system name> required to clear preexisting split-brain.
<date> <system name> I/O Fencing DISABLED!
<date> <system name> gab: GAB:20032: Port b closed

However, the same error can occur when the private network links are working and both systems go down, system 1 restarts, and system 2 fails to come back up. From system 1's view of the cluster, system 2 may still have its registrations on the coordinator disks.

<date> 07:29:25 VCS CRITICAL V-16-1-10029 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing
<date> 07:30:12 VCS NOTICE V-16-1-11022 VCS engine (had) started
<date> 07:30:12 VCS NOTICE V-16-1-11050 VCS engine version=4.1
<date> 07:30:12 VCS NOTICE V-16-1-11051 VCS engine join version=4.1000
<date> 07:30:12 VCS NOTICE V-16-1-11052 VCS engine pstamp=4.1 04/27/07-03:15:00
<date> 07:30:12 VCS NOTICE V-16-1-10114 Opening GAB library
<date> 07:30:12 VCS NOTICE V-16-1-10619 'HAD' starting on: system2
<date> 07:30:12 VCS WARNING V-16-1-11030 HAD not ready to receive this command. Message was: 0xc31
<date> 07:30:12 VCS INFO V-16-1-10125 GAB timeout set to 15000 ms
<date> 07:30:17 VCS INFO V-16-1-10077 Received new cluster membership
<date> 07:30:17 VCS NOTICE V-16-1-10080 System (system2) - Membership: 0x1, Jeopardy: 0x0
<date> 07:30:17 VCS NOTICE V-16-1-10086 System system2 (Node '0') is in Regular Membership - Membership: 0x1
<date> 07:30:17 VCS NOTICE V-16-1-10322 System system2 (Node '0') changed state from CURRENT_DISCOVER_WAIT to LOCAL_BUILD
<date> 07:30:18 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
<date> 07:30:20 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
<date> 07:30:20 VCS CRITICAL V-16-1-10029 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing
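The fencing-related entries can be picked out of a long engine log by their message IDs. The sketch below assumes only the two IDs that appear in the log above (V-16-1-10037 for the retries and V-16-1-10029 for the stop); the function name fencing_log_summary is hypothetical, and the sample is three lines copied from that log. On a live system you would feed it the VCS engine log file, whose name and location are not given here.

```shell
#!/bin/sh
# fencing_log_summary (hypothetical helper): read VCS engine log text on
# stdin and count the "VxFEN driver not configured" messages by message ID.
fencing_log_summary() {
    awk '/V-16-1-10037/ { retries++ }
         /V-16-1-10029/ { stops++ }
         END { printf "retries=%d stops=%d\n", retries, stops }'
}

# Three lines copied from the log excerpt above.
sample_log='<date> 07:30:18 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
<date> 07:30:20 VCS CRITICAL V-16-1-10037 VxFEN driver not configured. Retrying...
<date> 07:30:20 VCS CRITICAL V-16-1-10029 VxFEN driver not configured. VCS Stopping. Manually restart VCS after configuring fencing'

printf '%s\n' "$sample_log" | fencing_log_summary
```

A non-zero stops count means the engine gave up waiting for fencing and has to be restarted manually once fencing is configured, as the message itself says.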

To resolve actual and apparent potential split-brain conditions

Depending on the split-brain condition that you encountered, do the following:

Actual potential split-brain condition - system 2 is up and system 1 is ejected

1. Determine whether system 1 is up.

2. If system 1 is up and running, shut it down and repair the private network links to remove the split-brain condition.

3. Restart system 1.

Apparent potential split-brain condition - system 2 is down and system 1 is ejected

1. Physically verify that system 2 is down.

2. Verify the systems currently registered with the coordinator disks. Use the following command:

# vxfenadm -g all -f /etc/vxfentab

The output of this command identifies the keys registered with the coordinator disks.

3. Clear the keys on the coordinator disks as well as the data disks using the vxfenclearpre command:

# /opt/VRTSvcs/rac/bin/vxfenclearpre
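Before clearing keys, it can help to summarize how many registrations each coordinator disk still holds. The following is a sketch that assumes the vxfenadm -g output contains "Device Name:" and "Total Number Of Keys:" lines in the form shown in the sample session later in this section; nothing else about the output format is assumed, and the function name registered_key_counts is hypothetical.

```shell
#!/bin/sh
# registered_key_counts (hypothetical helper): read the output of
# "vxfenadm -g all -f /etc/vxfentab" on stdin and print
# "<device> <key count>" for each coordinator disk.
registered_key_counts() {
    awk -F': *' '/Device Name:/          { dev = $2 }
                 /Total Number Of Keys:/ { print dev, $2 }'
}

# Two lines in the form shown in the sample session; on a live system:
#   vxfenadm -g all -f /etc/vxfentab | registered_key_counts
sample='Device Name: /dev/sde
Total Number Of Keys: 2'

printf '%s\n' "$sample" | registered_key_counts
```

A disk that still reports keys after the other node has been verified as down is the apparent split-brain case that vxfenclearpre is meant to clean up.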

To check whether the current host is using I/O fencing, display the fencing information:

system2:root:/var/VRTSvcs/log # vxfenadm -d

I/O Fencing Cluster Information:

================================

VCS FEN vxfenadm ERROR V-11-2-1115 Local node is not a member of cluster!

To check whether a split-brain condition exists:

system2:root:/var/VRTSvcs/log # /sbin/vxfenconfig -c

VCS FEN vxfenconfig NOTICE Driver will use SCSI-3 compliant disks.
VCS FEN vxfenconfig ERROR V-11-2-1016 There exists the potential for a preexisting split-brain
The coordinator disks list no nodes which,
are in the current membership. However, they,
also list nodes which are not in the,
current membership.

I/O Fencing Disabled!

Verify the systems currently registered with the coordinator disks. Use the following command:

system2:root:/var/VRTSvcs/log # vxfenadm -g all -f /etc/vxfentab

Device Name: /dev/sde

Total Number Of Keys: 2

key[0]: