Multipathing is the ability of a server to communicate with a storage device over multiple physical connections between the host bus adapters in the server and the storage controllers, typically in Fibre Channel (FC) or iSCSI SAN environments, or with direct-attached storage when multiple channels are available.
The most typical usage is in a SAN. Here the term "multipath" simply means that a host can access a LUN over multiple paths. Each connection from the server through a Host Bus Adapter (HBA) to the storage controller is referred to as a path. Multipath connectivity is possible wherever multiple connection paths exist between a server and a logical unit (LUN) within a storage subsystem. Such a "dual path" configuration can be used to provide redundancy or increased bandwidth. Note that multipathing protects against the failure of a path, not against the failure of the storage device itself. Another advantage of multipath connectivity is increased throughput by way of load balancing.
When the multipath driver detects I/O errors on an active path, it fails the traffic over to the device's designated secondary path. When the preferred path becomes healthy again, control can be returned to it (failback).
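The failback behavior itself is tunable. A minimal sketch of the relevant /etc/multipath.conf setting (assumed here to go in the defaults section; the option and its values are described in the annotated sample configuration shipped with multipath-tools):

defaults {
    failback immediate    # return to the preferred path as soon as it is healthy again
                          # (other possible values: "manual", or a delay in seconds)
}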
The most common use is with a Fibre Channel (FC) Storage Area Network (SAN); see the compatibility list. The multipathing layer sits above the transport protocols (FCP or iSCSI) and determines whether the devices discovered on the target represent separate devices or are just separate paths to the same device. Device Mapper (DM) is the multipathing layer that performs this function in Linux.
Linux multipathing provides both IO failover and load sharing for multipathed block devices. The multipath IO support is based on two components: the Device Mapper (DM) multipath module in the kernel, and the multipath-tools user-space package.
To determine which SCSI devices/paths correspond to the same LUN, DM issues a SCSI INQUIRY to each device. The inquiry response carries, among other things, the LUN serial number. Regardless of the number of paths a LUN is reached through, its serial number is always the same. This is how the multipathing software determines which and how many paths are associated with each LUN.
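You can verify this yourself by querying the serial number/WWID of two candidate paths and comparing them. A minimal sketch (device names are examples; the scsi_id location and options vary between distributions and releases):

# print the serial number of two suspected paths to the same LUN (sg3_utils package)
sg_inq --page=0x80 /dev/sdc      # Unit Serial Number VPD page
sg_inq --page=0x80 /dev/sdag
# or, on newer udev-based systems (path and flags differ per release):
/lib/udev/scsi_id --whitelisted --device=/dev/sdc
/lib/udev/scsi_id --whitelisted --device=/dev/sdag
# identical output means the two sd devices are paths to the same LUN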
While the general concepts are easy to understand, the details are not well documented. There are only a few useful documents about the details of multipath installation. The three most useful are (for more information see Recommended Links below):
This section describes how to manage failover and path load balancing for multiple paths between the servers and block storage devices.
- Section 5.1, Understanding Multipathing
- Section 5.2, Planning for Multipathing
- Section 5.3, Multipath Management Tools
- Section 5.4, Configuring the System for Multipathing
- Section 5.5, Enabling and Starting Multipath I/O Services
- Section 5.6, Configuring Path Failover Policies and Priorities
- Section 5.7, Tuning the Failover for Specific Host Bus Adapters
- Section 5.8, Configuring Multipath I/O for the Root Device
- Section 5.9, Configuring Multipath I/O for an Existing Software RAID
- Section 5.10, Scanning for New Devices without Rebooting
- Section 5.11, Scanning for New Partitioned Devices without Rebooting
- Section 5.12, Using Multipathed Devices
- Section 5.13, Viewing Multipath I/O Status
In the case of a SAN-connected storage device, usually two fibre interfaces from the host are connected to the switch, and several interfaces from the storage device are connected to the same switch. The storage controller can then be reached through either Host Bus Adapter (HBA), giving multipath connectivity.
Device mapping multipath IO supports partitions (with limitations) and LVM2. Software RAID is also supported, but automatic discovery is not available and generally it does not make much sense.
Device mapping multipath IO is not available for the boot partition, because the boot loader cannot handle multipath IO. You need to set up a separate boot partition when using multipath IO.
It requires installation of two packages:
If LVM is used, the device-mapper package is already installed, since it is needed by the Logical Volume Manager (LVM). If not, it will be installed as one of the dependencies of the multipath-tools package.
You can also install the multipath-tools package directly from the distribution DVD, but it is better to get the most recent version. The RPM names differ between SLES and RHEL.
On SLES:
rpm -ivh multipath-tools-0.4.7-75.7.i586.rpm
On Red Hat:
rpm -ivh device-mapper-multipath-0.4.5-31.el4.x86_64.rpm
To check if the package is installed you can use the command
rpm -qa | grep multipath
multipath-tools-0.4.7...
If you are using a storage subsystem that is automatically detected, no further configuration of the multipath-tools is required. Otherwise you need to create /etc/multipath.conf and add an appropriate device entry for your storage subsystem. See /usr/share/doc/packages/multipath-tools/multipath.conf.annotated for a template with extensive comments.
The system must be manually configured to automatically load the device drivers for the controllers to which the multipath IO devices are connected within the INITRD. Therefore add the needed driver modules to the variable INITRD_MODULES in the file /etc/sysconfig/kernel.
For SUSE 10:
INITRD_MODULES="piix megaraid_sas mptspi siimage processor thermal fan jbd ext3 dm_mod edd dm-multipath qla2xxx"
or on HP:
INITRD_MODULES="ata_piix cciss processor thermal fan reiserfs dm_mod edd dm-multipath qla2xxx"
Then rebuild the initrd and enable the multipath services at boot:
/sbin/mkinitrd
chkconfig multipathd on
chkconfig boot.multipath on
chkconfig --list | grep multipath
multipathd    0:off  1:off  2:off  3:on  4:off  5:on  6:off
cp /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic /etc/multipath.conf
After the configuration is set up, you can load the multipath module manually:
# load the multipath module
modprobe dm-multipath
Now you can perform a dry run with
multipath -v2 -d
which only scans the devices and prints what the setup would look like. The output is similar to the following:
3600601607cf30e00184589a37a31d911
[size=127 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [first]
  \_ 1:0:1:2 sdav 66:240 [ready ]
  \_ 0:0:1:2 sdr  65:16  [ready ]
\_ round-robin 0
  \_ 1:0:0:2 sdag 66:0   [ready ]
  \_ 0:0:0:2 sdc  8:32   [ready ]
Paths are grouped into priority groups. There is only ever one priority group in active use. To model an active/active configuration, all paths end up in the same group. To model active/passive, the paths that should not be active in parallel are placed in several distinct priority groups. This normally happens completely automatically on device discovery.
For each priority group, the output shows its order, the scheduling policy used to balance IO within the group, and its paths. For each path, the physical address (host:bus:target:lun), device node name, major:minor number, and state are shown.
Querying the multipath IO status outputs the current status of the multipath maps. The command is multipath -l.
# multipath -l
360060160f390250024e60845be67df11 dm-2 DGC,RAID 5
[size=600G][features=1 queue_if_no_path][hwhandler=1 emc]
\_ round-robin 0 [prio=-2][active]
  \_ 4:0:1:1 sdn 8:208 [active][undef]
  \_ 3:0:1:1 sdf 8:80  [active][undef]
\_ round-robin 0 [prio=-2][enabled]
  \_ 4:0:0:1 sdj 8:144 [active][undef]
  \_ 3:0:0:1 sdb 8:16  [active][undef]
360060160f3902500acd08f4dbc67df11 dm-1 DGC,RAID 5
[size=1000G][features=1 queue_if_no_path][hwhandler=1 emc]
\_ round-robin 0 [prio=-2][active]
  \_ 4:0:0:2 sdk 8:160 [active][undef]
  \_ 3:0:0:2 sdc 8:32  [active][undef]
\_ round-robin 0 [prio=-2][enabled]
  \_ 4:0:1:2 sdo 8:224 [active][undef]
  \_ 3:0:1:2 sdg 8:96  [active][undef]
...
Host bus adapter time-outs are typically set up for non-multipath IO environments, because the only alternative would be to error out the IO and propagate the error to the application. However, with Multipath IO, some faults (like cable failures) should be propagated upwards as fast as possible so that the multipath IO layer can quickly take action and redirect the IO to another, healthy path.
To configure time-outs for your host bus adapter, add the appropriate options to /etc/modprobe.conf.local. For the QLogic 2xxx family of host bus adapters, for example, the following settings are recommended:
options qla2xxx qlport_down_retry=1 ql2xfailover=0 ql2xretrycount=5
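After editing /etc/modprobe.conf.local it is worth confirming that the options are actually picked up and that they will also apply at boot. A hedged sketch (parameter names are the QLogic ones recommended above; the sysfs path only exists once the module is loaded):

# show what modprobe will pass to the driver
modprobe -c | grep qla2xxx
# if the module is already loaded, inspect the live value of one parameter
cat /sys/module/qla2xxx/parameters/qlport_down_retry
# rebuild the initrd so the options also apply to drivers loaded at boot (SLES)
/sbin/mkinitrd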
Note: modifying /etc/multipath.conf is one of the easiest ways to make a server inoperative or to make it crash on reboot. If the server becomes unbootable or panics, one way to recover is to disable the dm-multipath module on the GRUB command line:
dm-multipath.disable=1
Device names change when multipath is running: the device mapper remaps them to dm-0, dm-1, and so on. Amusingly, the aliases built on top of them (mpath0, mpath1, ...) are what the configuration calls "user_friendly_names". This is the default in SLES, but in general it is controlled by the following setting in /etc/multipath.conf:
defaults { user_friendly_names yes }
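With user_friendly_names enabled, the WWID-to-alias mapping is kept in the bindings file (mentioned again in the multipath.conf comments further down). A quick way to see which mpathN corresponds to which WWID, assuming the default bindings location:

# map of user-friendly aliases to WWIDs
cat /var/lib/multipath/bindings
# cross-check against the live maps (alias and WWID appear on the first line of each map)
multipath -ll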
In older releases of Linux, multipath tried to create a multipath device for every path that was not explicitly blacklisted, which made it necessary to maintain a blacklist to exclude local devices.
In SLES 11 and RHEL 6, however, if the find_multipaths configuration parameter is set to yes, multipath creates a device only if one of three conditions is met:
- two paths that are not blacklisted have the same WWID;
- the user manually forces creation of the device by specifying it with the multipath command;
- a path has the same WWID as a multipath device that was created previously.
This allows multipath to create the correct multipath devices automatically, without having to edit the blacklist.
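A minimal /etc/multipath.conf defaults stanza using this behavior might look like the following (a sketch; availability of the option depends on the multipath-tools version shipped with your distribution):

defaults {
    find_multipaths      yes
    user_friendly_names  yes
}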
In certain scenarios where the driver, the host bus adapter, or the fabric experiences errors leading to loss of all paths, all IO should be queued instead of being propagated upwards.
To achieve this, put the following setting in /etc/multipath.conf:
defaults {
    default_features "1 queue_if_no_path"
}
Because this leads to IO being queued forever unless a path is reinstated, make sure that multipathd is running and works for your scenario. Otherwise, IO might be stalled forever on the affected MPIO device until reboot or until you manually issue
dmsetup message <NAME> 0 fail_if_no_path
This immediately causes all queued IO to fail (replace <NAME> with the correct map name). You can reactivate queueing by issuing the following command:
dmsetup message <NAME> 0 queue_if_no_path
You can also use these two commands to switch between modes for testing before committing the command to /etc/multipath.conf.
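To check which mode a map is currently in, you can inspect its device-mapper table; the features field shows whether queue_if_no_path is active. A sketch (the map name is a placeholder):

# show the current multipath table for one map; look for "queue_if_no_path"
dmsetup table <NAME>
# the same information appears in the [features=...] field of:
multipath -ll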
MPIO devices can be used directly, with LVM, and with mdadm.
If you want to use the entire LUNs directly
(for example, if you are using the SAN features to partition your storage),
you can simply use the /dev/disk/by-name/xxx names directly for mkfs, fstab,
your application, etc.
Volume managers such as LVM2 and EVMS run on top of multipathing. You must configure multipathing for a device before you use LVM2 or EVMS to create segment managers and file systems on it.
To make LVM2 recognize the MPIO devices as possible physical volumes, you must modify /etc/lvm/lvm.conf. It is important to modify it in a way that it does not scan and use the physical paths, but only accesses the multipath IO storage through the multipath IO layer. To do so, change the filter and types entry in /etc/lvm/lvm.conf as follows:
filter = [ "a|/dev/disk/by-name/.*|", "r|.*|" ] types = [ "device-mapper", 253 ]
This allows LVM2 to scan only the by-name paths and reject everything else.
If you are also using LVM2 on non-multipath IO devices, make the necessary adjustments
to suit your setup.
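After changing lvm.conf it is worth confirming that LVM now sees only the multipath devices as candidate physical volumes. A minimal check (output will of course depend on your setup):

# rescan block devices with the new filter in place
lvmdiskscan
# list physical volumes; only /dev/disk/by-name/... entries should appear
pvs -o pv_name,vg_name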
The same logic as for LVM2 applies to mdadm as well—the devices must be accessed by name rather than by physical path. Therefore the DEVICE entry in /etc/mdadm.conf must be modified:
DEVICE /dev/disk/by-name/*
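As a hedged illustration, creating a software RAID on top of two multipathed LUNs would then reference the by-name device nodes rather than the underlying sdX paths (device names below are placeholders):

# hypothetical example: RAID1 across two multipathed LUNs
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/disk/by-name/<lun1> /dev/disk/by-name/<lun2>
# record the array so it is assembled at boot
mdadm --detail --scan >> /etc/mdadm.conf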
Currently it is not possible to partition multipath IO devices themselves. If the underlying physical device is already partitioned, the multipath IO device reflects those partitions and the layer provides
/dev/disk/by-name/<name>p1 ... pN
devices so you can access the partitions through the multipath IO layer.
If not, you can partition the underlying device with fdisk.
As a consequence, the devices need to be partitioned prior to enabling multipath IO. If you change the partitioning in the running system, Multipath IO does not automatically detect and reflect these changes. It must be reinitialized, which usually requires a reboot.
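The partition mappings themselves are created by the kpartx tool listed in the table below. A sketch of manual use (the map name is a placeholder; behavior varies by release):

# add (or re-read) partition mappings for a multipath map
kpartx -a /dev/mapper/<mpath-device>
# the partitions then appear as <mpath-device>p1, <mpath-device>p2, ...
ls /dev/mapper/<mpath-device>*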
The multipathing drivers and tools in SUSE support all seven of the supported processor architectures: IA32, AMD64/EM64T, IPF/IA64, p-Series (32-bit/64-bit), z-Series (31-bit and 64-bit). They also support most storage arrays. The storage array that houses the multipathed device must support multipathing too. Some storage array vendors provide their own multipathing management tools. Consult the vendor’s hardware documentation to determine what settings are required.
The multipath-tools user-space package takes care of automatic path discovery and grouping. It automatically tests the path periodically, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes the need for administrator attention in a production environment.
| Tool | Description |
|---|---|
| multipath | Scans the system for multipathed devices and assembles them. |
| multipathd | Waits for map events, then executes multipath. |
| devmap-name | Provides a meaningful device name to udev for device maps (devmaps). |
| kpartx | Maps linear devmaps to partitions on the multipathed device, which makes it possible to create multipath monitoring for partitions on the device. |
For a list of files included in this package, see the multipath-tools Package Description.
Ensure that the multipath-tools package is installed by entering the following at a terminal console prompt:
rpm -qa | grep multipath
If it is installed, the response repeats the package name and provides the version information, such as:
multipath-tools-04.7-34.23
In SUSE Linux Enterprise Server 10 and 11, Udev is the default device handler, and devices are automatically known to the system by the Worldwide ID instead of by the device node name. This resolves problems in previous releases where mdadm.conf and lvm.conf did not properly recognize multipathed devices.
As with LVM2, mdadm requires that the devices be accessed by ID rather than by device node path. Therefore, the DEVICE entry in /etc/mdadm.conf should be set as follows:
DEVICE /dev/disk/by-id/*
This is the default setting for SUSE Linux Enterprise Server 10 and 11.
To verify that mdadm is installed, enter the following at a terminal console prompt:
rpm -q mdadm
If it is installed, the response shows the package version, for example:
mdadm-2.6-0.11
Otherwise, the response reads:
package mdadm is not installed
For additional information about modifying the /etc/lvm/lvm.conf file, see Using LVM2.
For additional information about multipath command see multipath command and News items below.
LazySystemAdmin
Procedure for configuring the system with DM-Multipath:
- Install device-mapper-multipath rpm
- Edit the multipath.conf configuration file:
- comment out the default blacklist
- change any of the existing defaults as needed
- Start the multipath daemons
- Create the multipath device with the multipath command
Install Device Mapper Multipath
# rpm -ivh device-mapper-multipath-0.4.7-8.el5.i386.rpm
warning: device-mapper-multipath-0.4.7-8.el5.i386.rpm: Header V3 DSA signature:
Preparing...                ########################################### [100%]
   1:device-mapper-multipath########################################### [100%]
Initial Configuration
Set user_friendly_names. The devices will be created as /dev/mapper/mpath[n]. Uncomment the blacklist.
# vim /etc/multipath.conf
#blacklist {
#    devnode "*"
#}
defaults {
    user_friendly_names yes
    path_grouping_policy multibus
}
Load the needed module and start the service.
# modprobe dm-multipath
# /etc/init.d/multipathd start
# chkconfig multipathd on
Print out the multipathed devices.
# multipath -v2
or
# multipath -v3
Novell User Communities
Add a new disk
This assumes you have created an array on the SAN and allocated space to a logical volume on it, you have mapped a LUN pointing to that logical volume to this host, and the host is correctly zoned to see the SAN in the Fibre Channel fabric.
- Before anything, run multipath -ll to see what is there currently.
- See how many HBAs are connected (and zoned) to the SAN - you need to repeat the commands for each one. For example:
echo 1 > /sys/class/fc_host/host0/issue_lip
echo 1 > /sys/class/fc_host/host1/issue_lip
echo "- - -" > /sys/class/scsi_host/host0/scan
echo "- - -" > /sys/class/scsi_host/host1/scan
- After running those commands, check that something happened by using dmesg and /var/log/messages.
- Run multipath -v2 to get multipath to pick it up - you can then compare the listing to the previously run command.
Note the SCSI devices for the new disk; they will be sdX and sdY or similar.
- Edit /etc/lvm/lvm.conf and make sure these are being filtered to remove duplicates - use vgdisplay -vv to show what LVM considers duplicate.
FYI: device mapper / multipath creates multiple device handles for the same device; this can cause delays with LVM2 and severely impact throughput.
- Now you can pvcreate /dev/dm-XX, vgextend VolGroup /dev/dm-XX, etc. (see the sketch below).
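A hedged sketch of the typical follow-on steps, assuming a volume group named VolGroup and a logical volume lv_data (both names are placeholders, as is the dm-XX device):

pvcreate /dev/dm-XX
vgextend VolGroup /dev/dm-XX
# grow an existing logical volume onto the new space, then the filesystem
lvextend -L +100G /dev/VolGroup/lv_data
resize2fs /dev/VolGroup/lv_data    # ext3 in this example; other filesystems differ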
Remove a disk
- Run the multipath -ll command and note the UUID (the big hex number), LUN and sdX devices of the disk; e.g. in the example below it is LUN 2, and the devices are /dev/sdf and /dev/sde. You will need this info for the procedure. Confirm this is in fact the one you want to remove; cross-check the LUN and size of the volume on the SAN before proceeding...
3600a0b80000fb6e50000000e487b02f5 dm-10 IBM,1742
[size=1.6T][features=1 queue_if_no_path][hwhandler=1 rdac]
\_ round-robin 0 [prio=6][active]
  \_ 1:0:0:2 sdf 8:80 [active][ready]
\_ round-robin 0 [prio=1][enabled]
  \_ 0:0:0:2 sde 8:64 [active][ghost]
Note: the dm-XX name is not permanent and may change when you add or remove disks, so don't rely on old information; check each time.
- Also cat /proc/scsi/scsi to match up the kernel SCSI devices with the SCSI IDs and LUNs of the SAN disks.
- First you need to remove the disk from the volume group.
- If the disk is in use, either delete what is on it (if there is a logical volume limited to that disk), or use pvmove. (this of course assumes you have sufficient space to move everything off the original disk)
NB: with pvmove on SUSE 10 SP2 there is a bug where, if you are moving from a bigger disk to smaller disk(s), it may complain there isn't enough space. Just move as many extents as fit on the first smaller disk, then you can move the rest onto the second, e.g.: pvmove /dev/dm-1:0-20000 /dev/dm-2.
- Once everything is deleted/removed: vgreduce VolGroup /dev/dm-XX and pvremove /dev/dm-XX.
- Using the disk ID for the next command (the dm-xx isn't recognised), from multipath -ll:
dmsetup remove 3600a0b80000f7b270000000b47b15c26
(of course you need to use your own disk ID)
- Now, finally, you remove the SCSI devices from the kernel:
echo 1 > /sys/block/sdX/device/delete
echo 1 > /sys/block/sdY/device/delete
You should now have all traces removed; you can run multipath -ll and cat /proc/scsi/scsi to cross-check. You can now remove the mapping from the SAN and delete the logical volume if required.
DANiEL: Sorry it's taken a second to get back to you. Our high-load issues may be associated with other problems. FYI, you should read this link:
https://lists.linux-foundation.org/pipermail/bugme-new/2007-June/016354.html
I'm currently in communication with Sun/Oracle as well as LSI (the actual manufacturer of the ST2540) and their official position is that DM-MP is not supported. I believe it to be due to how the ST2540 is asymmetrical. Here is another link that talks about it somewhat. The ST6140 is more or less the same as the ST2540:
http://southbrain.com/south/2009/10/qla-linux-and-rdac---sun-6140.html
I'd be curious to know what your take on all this is. We started observing controller resets for no known reason back in February and have been diagnosing since. I'm of the opinion that it is due to AVT.
I'm currently testing a multipath configuration where the second controller is disabled; multiple paths, but no failover. Failover is not essential in our environment, but would be nice to have.
We are unfortunately unable to use mppRDAC, as far as we can tell, as it does not actually support our HBAs. Sun is somewhat confused as to what is and is not supported hardware: their documents conflict.
Officially, our HBAs (QLE2560s) and switches (QLogic 5802s) are not compatible with the ST2540. Needless to say, we are very happy with our reseller...
We use in servers also these cards:
QLogic Fibre Channel HBA Driver: 8.03.00.1.05.05-k
QLogic QLE2560 - Sun StorageTek 8Gb FC PCIe HBA, single port
ISP2532: PCIe (2.5Gb/s x8) @ 0000:03:00.0 hdma+, host#=8, fw=4.04.09 (85)
Yes, we talked recently to the LSI guys, so I know their position about not supporting dm-multipath. I was really surprised. I have a few things for you if you want to investigate further: are you using the latest fcode (firmware) for your HBAs? Can you paste into pastebin.org all the configs I mentioned in the original article (multipath.conf, lvm.conf, ...) plus "uname -a" and maybe logs? We are not observing controller resets. On one cluster we use about 10 LUNs from the disk array (all owned by one controller) and on the second cluster also about 10 LUNs (owned by the second controller).
What servers exactly are you using? For example, in X4270 servers you should put HBAs only in PCIe slots 0 and 3!
We're using the same HBAs, I don't have the fw on hand, but it should be the recent as of this fall. We upgraded fw on everything for all of our systems. SANs are latest as well.
We've got a mix of x4140s and x4450s. I don't remember what PCIe slot the HBA is in, however, those were installed by our reseller, not by us.
We are not using LVM, not needed for our environment.
Your use is considerably more than ours. But, regardless, I can create controller resets on a single 250g volume (FS agnostic, used both ext3 and OCFS2), with nothing else using the SAN, no other volumes, and only a single host with an initiator. The behavior only manifests itself when both controllers are fibre attached. If we unplug one controller (remove failover possibility), the configuration is stable. We know for a fact that the controller is resetting: I've become very familiar with the SAN service adviser dumps.
I pasted multipath.conf, multipath -v2 -ll, uname -a, modprobe.conf and logs from one of our machines that was observing the behavior. We are using a rebuilt initrd that preloads scsi_dh_rdac.
http://pastebin.org/149331
Do both of your hosts see both controllers? ie, multipath -v2 -ll shows 2 active paths and 2 ghost paths per LUN?
I appreciate your help on this. Sounds like we have a lot of similarities in our setup, I'm jealous of yours though ;)
Interim resolution
An interim MPIO setup procedure can be derived from the Novell SUSE Linux Enterprise Server (SLES) 10 Storage Administration Guide as follows:
- Locate the Storage Administration Guide on the SLES 10 SP1 distribution CD at: /usr/share/doc/manual/sles-stor_evms_en/SLES-stor_evms_en.pdf
- Begin at Section 5.5, "Adding multipathd to the Boot Sequence" of the Storage Administration Guide, and configure MPIO as specified unless directed otherwise by this issue document.
- Before proceeding into Section 5.5.1 "YaST" the multipath.conf file must be set up. On this matter, Novell documentation states, in contradiction to IBM documentation, that "If you are using a storage subsystem that is automatically detected ... no further configuration of the /etc/multipath.conf file is required." (See Section 5.7, "Adding Support for the Storage Subsystem to /etc/multipath.conf", and Section 5.3, which identifies DS6000 and DS8000 as supported hardware that requires no further configuration.) IBM provides information that supersedes this statement on its "Subsystem Device Driver for Linux" Web page found at: http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000107&loc=en_US&cs=utf-8&lang=en
Table "SLES 10 Device Mapper Multipath Configuration File for DS8000 / DS6000 / ESS / SVC 4.2" specifies a "Configuration File Version 1.03" that is to be downloaded to the /etc directory. This version of multipath.conf is specially configured for the hardware noted, and the additional configuring specified in Section 5.7 is to be ignored.
- If you boot over SAN, be aware that Boot Over SAN does not work with User Friendly Names. Before executing the instructions of Section 5.10, "Configuring Multipath I/O for the Root Device", you must disable the User Friendly Names feature. Open the IBM-provided /etc/multipath.conf file that you downloaded in Step 4 of this issue document, and locate the following:
user_friendly_names yes
Modify the entry as follows:
user_friendly_names no
- Continue using the SLES 10 Storage Administration Guide until MPIO on SAN shared storage devices is fully configured.
- Upon completion of MPIO configuration, verify that multipath is enabled with the command:
multipath -l
If multipath is enabled, a list of active disk statistics is displayed.
Setup procedure
- Install the multipath-tools package.
- Edit /etc/init.d/multipathd: add iscsi to Required-Start line. [SUSE]
- Activate on boot [SUSE]:
chkconfig boot.multipath on chkconfig multipathd on
- Activate now:
service multipathd start
- find out WWIDs:
multipath -v2 -d
- Set up aliases in /etc/multipath.conf:
multipaths {
    multipath {
        wwid  averybiglonghexstringthatsimpossibletoread
        alias something_readable
    }
    multipath {
        wwid  anotherverybiglonghexstringequallyhardtoread
        alias something_else_readable
    }
}
- activate multipaths:
multipath -v2
- verify:
multipath -l
- verify (you should see sensible device names in the output):
ls -la /dev/mapper
Steps
- up2date device-mapper-multipath
- Edit /etc/multipath.conf
For detailed information, see: "SAN Persistent Binding and Multipathing in the 2.6 Kernel"
- modprobe dm-multipath
- modprobe dm-round-robin
- service multipathd start
- multipath -v2
Will show multipath LUNs and groups. Look for the multipath group number; this is the dm-# listed in /proc/partitions. The multipath LUN is accessed via the /dev/dm-# device entry.
- Format each SCSI device:
- sfdisk /dev/sdX
- (Optional) Create multipath devices for each partition:
(not needed if using LVM, since you will just mount the logical volume device)
- kpartx -a /dev/dm-#
- Enable multipath to start on bootup:
- chkconfig multipathd on
Other useful cmds
- multipath -F
- Clear all multipath bindings. Useful when making changes to /etc/multipath.conf (clear multipath map, then run multipath to read the config file and build a new map).
- multipath -v3 -ll
- List lots of information about all known disks, what multipath groups they belong to, settings, etc...
(NOTE: the -ll option also seems to force multipathd to pick up (rescan for) new devices that have been added to the system but not yet recognized by multipathd.)
- dmsetup ls --target=multipath
- Determine multipath device assignments on a system.
Links & References
- Device Mapper Resource Page
- List of sources and related packages, etc...
- How do I setup device-mapper multipathing in Red Hat Enterprise Linux 4?
- Redhat Knowledge Base, article id: 7170
Basically the same as this page, but more generic.
- How do I make device mapper multipath ignore my local disks?
- Redhat Knowledge Base, article id: 7319
- How can I add more products into the multipathing database?
- Redhat Knowledge Base, article id: 8131
Date: Tue, 23 May 2006 14:57:52 +0200
Message-ID: <CA1361E9F77A4243A99E04D98F5CC724023E8E9C@ZARDPEXCH001.medscheme.com>
From: "Stephen Hughes" <[email protected]>
Subject: RE: [suse-sles-e] Adding Disk on the fly

Hi Group,
Thanks for all your help with this matter. I managed to use the
rescan-scsi-bus.sh command on one of my servers to add a SAN attached
disk, but now I've assigned more disk to another server I have running
SLES9. I run the rescan-scsi-bus.sh command with the various switches
but it still does not pick up my new disk. Below is the output from my
lsscsi command as well as the command I ran to try and pick up the disk.
The new disk according to my Navisphere client should come in after
/dev/sdbp.
I also looked at some of the other replies but I don't have the rescan
file to echo a "1" to as suggested in one of the replies:
"# echo 1 > /sys/bus/scsi/devices/0:0:0:0/rescan"
mamba:/usr/local/bin # lsscsi
[0:0:6:0] process DELL 1x4 U2W SCSI BP 1.27 -
[0:2:0:0] disk MegaRAID LD0 RAID0 69356R 161J /dev/sda
[1:0:0:0] disk EMC SYMMETRIX 5671 /dev/sdb
[1:0:0:1] disk EMC SYMMETRIX 5671 /dev/sdc
[1:0:0:2] disk EMC SYMMETRIX 5671 /dev/sdd
[1:0:0:3] disk EMC SYMMETRIX 5671 /dev/sde
[1:0:0:4] disk EMC SYMMETRIX 5671 /dev/sdf
[1:0:0:5] disk EMC SYMMETRIX 5671 /dev/sdg
[1:0:0:6] disk EMC SYMMETRIX 5671 /dev/sdh
[1:0:0:7] disk EMC SYMMETRIX 5671 /dev/sdi
... ... ...
Command I executed: (26,27,28,29 because I'm adding 4 LUNS)
mamba:/usr/local/bin # rescan-scsi-bus.sh --hosts=3 --ids=5
--luns=26,27,28,29
Thanks
Stephen
-----Original Message-----
From: Matt Gillard [mailto:[email protected]]
Sent: 04 May 2006 08:06 AM
To: Stephen Hughes; Denis Brown; [email protected]
Subject: RE: [suse-sles-e] Adding Disk on the fly

/bin/rescan-scsi-bus.sh is what you are after.
SUSE Linux Enterprise in the Americas
Customers are always looking for ways to lower the cost of their Linux deployments and make management easier for their staff. One of the best options they have, at least in my opinion, is to get rid of third-party multipath IO solutions for SAN and disk management.
I was at one of my customers the other day helping them set up MPIO that is built into SLES 10. While I was there I took a few notes for what we did to get things working for their environment. These same instructions should work with other SAN's that can handle multi path IO.
SLES 10 supports a lot of SANs right out of the box and automatically detects them, so you don't really need an /etc/multipath.conf. My customer likes to be able to change the blacklist for the various types of hardware they use and wanted user-friendly names. To do this I created a multipath.conf for them that looked like the following:
## /etc/multipath.conf file for SLES 10
## You may find a full copy of this file, with comments, here:
## /usr/share/doc/packages/multipath-tools/multipath.conf

# Setup user friendly names
# name    : user_friendly_names
# scope   : multipath
# desc    : If set to "yes", use the bindings file
#           /var/lib/multipath/bindings to assign a persistent and
#           unique alias to the multipath, in the form of mpath<n>.
#           If set to "no", use the WWID as the alias. In either case
#           this will be overridden by any specific aliases in this file.
# values  : yes|no
# default : no
defaults {
    user_friendly_names yes
}

# Setup the blacklisted devices
# name    : blacklist
# scope   : multipath & multipathd
# desc    : list of device names that are not multipath candidates
# default : cciss, fd, hd, md, dm, sr, scd, st, ram, raw, loop
#
blacklist {
    devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
    devnode "^hd[a-z][[0-9]*]"
    devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
}
If you're curious about which platforms SLES 10 supports out of the box, a list is in the SLES documentation.
This assumes you already have your LUNs assigned to the host. Once you have the /etc/multipath.conf file in place, there are a few services that need to be started to make all this work.
# service boot.multipth start
# service multipathd start
That should start the daemons and load the kernel modules that you need. To check, run lsmod and see whether dm_multipath and multipath are listed. Once that is done you can check your setup to see if it is correct:
# multipath -v2 -d
create: mpath10 (360080480000290100601544032363831) EMC,SYMMETRIX
[size=200G][features=0][hwhandler=0]
\_ round-robin 0 [prio=4][undef]
\_ 11:0:0:39 sdbr 68:80 [undef][ready]
\_ 11:0:1:39 sdcc 69:0 [undef][ready]
\_ 10:0:0:39 sdl 8:176 [undef][ready]
\_ 10:0:1:39 sdw 65:96 [undef][ready]
create: mpath11 (360080480000290100601544032363832) EMC,SYMMETRIX
[size=400G][features=0][hwhandler=0]
\_ round-robin 0 [prio=4][undef]
\_ 11:0:0:40 sdbs 68:96 [undef][ready]
\_ 11:0:1:40 sdcd 69:16 [undef][ready]
\_ 10:0:0:40 sdm 8:192 [undef][ready]
\_ 10:0:1:40 sdx 65:112 [undef][ready]
That is what it looks like for the EMC Symmetrix I was working with so your mileage may vary.
Once you have the devices showing up correctly you need to make sure the multi path modules load on reboot. To do that run the following commands…
# chkconfig multipathd on
# chkconfig boot.multipath on
The next thing is to configure LVM to scan these devices so you can use them in your volume groups. To do this you will need to edit /etc/lvm/lvm.conf in the following places:
filter = [ "a|/dev/disk/by-id/.*|", "r|.*|" ]types = [ "device-mapper", 253 ]Above limits the devices that LVM will scan to only devices that show up by-id. If your using LVM to manage other disks that are not in that directory, think local scsi drives, you will need to make sure those are still available by adjusting your filter more like this…
filter = [ "a|/dev/disk/by-id/.*|", "a|/dev/sda1$/", "r|.*|" ]Once that is done do a lvmdiskscan to get LVM to see the new drives.
Another thing that customers often ask for is how to have SLES scan for new LUNs on the SAN without rebooting. With SLES 10 it's as simple as passing a few parameters to the sys file system.
# echo 1 > /sys/class/fc_host/host<number>/issue_lip
That will make the kernel aware of the new devices at a very low level, but the devices are not yet usable. To make them usable do the following…
# echo "- - -" > /sys/class/scsi_host/host<number>/scan
That will scan all devices and add the new ones for you. All of this information is in the SLES 10 Storage Administration Guide, including various ways to recover from issues.
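If there are several FC HBAs, the same two writes have to be repeated for each hostN entry. A small sketch that loops over whatever hosts are present (assumes the standard sysfs layout shown above and a root shell):

# trigger a loop initialization (LIP) on every FC host
for h in /sys/class/fc_host/host*; do
    echo 1 > "$h/issue_lip"
done
# then rescan every SCSI host for new devices
for h in /sys/class/scsi_host/host*; do
    echo "- - -" > "$h/scan"
done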
Also, since SP1, SLES 10 has been able to boot an MPIO device from the SAN. The doc for doing that in SP1 is located here.
Have fun and enjoy..
- Fun with your SAN and Multi-path - May 6th, 2008
2 Responses to " Fun with your SAN and Multi-path "
- kkhenson says:
July 17th, 2008 at 11:52 am
I would just add, for IBM DS8000, DS6000, ESS, or SVC 4.2 disk systems, you need to use the multipath.conf file located in a table here: http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D430&uid=ssg1S4000107&loc=en_US&cs=utf-8&lang=en
- bmcleod says:
July 24th, 2008 at 10:58 am
Daniel - Nice article, helped me with MPIO. I'm trying MPIO with EMC CX-300 / SLES 10 SP2 and Xen. Thanks -Bruce
Think you have one typo:
# service boot.multipth start
should be
# service boot.multipath start
Preamble: The procedure described within this article is only supported on SLES9 SP2 level and higher. Earlier releases may not work as expected.
The Multipath IO (MPIO) support in SLES9 (SP2) is based on the Device Mapper (DM) multipath module of the Linux kernel, and the multipath-tools user-space package. These have been enhanced and integrated into SLES9 SP2 by SUSE Development.
DM MPIO is the preferred form of MPIO on SLES9 and the only option completely supported by Novell/SUSE.
DM MPIO features automatic configuration of the MPIO subsystem for a large variety of setups. Active/passive or active/active (with round-robin load balancing) configurations of up to 8 paths to each device are supported.
The framework is extensible both via specific hardware handlers (see below) or via more sophisticated load balancing algorithms than round-robin.
The user-space component takes care of automatic path discovery and grouping, as well as automated path retesting, so that a previously failed path is automatically reinstated when it becomes healthy again. This minimizes, if not obviates, the need for administrator attention in a production environment.
2. Supported configurations
- Supported hardware: Architectures
MPIO is supported on all seven architectures: IA32, AMD64/EM64T, IPF/IA64, p-Series (32-bit/64-bit), z-Series (31-bit and 64-bit).
- Supported hardware: Storage subsystems
The multipath-tools package is currently aware of the following storage subsystems:
- 3Pardata VV
- Compaq HSV110 / MSA1000
- DDN SAN MultiDirector
- DEC HSG80
- EMC CLARiiON CX
- FSC CentricStor
- HP HSV110 / A6189A / Open-
- Hitachi DF400 / DF500 / DF600
- IBM 3542 / ProFibre 4000R
- NETAPP
- SGI TP9100 / TP9300 / TP9400 / TP9500
- STK OPENstorage DS280
- SUN StorEdge 3510 / T4
In general, most other storage subsystems should work; however, the ones above will be detected automatically. Others might require an appropriate entry in the /etc/multipath.conf devices section.
Storage arrays which require special commands on fail-over from one path to the other, or require special non-standard error handling, might require more extensive support; however, the DM framework has hooks for hardware handlers, and one such handler for the EMC CLARiiON CX family of arrays is already provided.
- Hardware support: Host bus adapters
- Qlogic
- Emulex
- LSI
In general, all Fibre Channel / SCSI cards should work, as our MPIO implementation is above the device layer.
- Supported software configurations summary
Currently, DM MPIO is not available for either the root or the boot partition, as the boot loader does not know how to handle MPIO.
All auxiliary data partitions such as /home or application data can be placed on an MPIO device.
LVM2 is supported on top of DM MPIO. See the setup notes.
Partitions are supported in combination with DM MPIO, but have limitations. See the setup notes.
Software RAID on top of DM MPIO is also supported; however, note that auto-discovery is not available and that you will need to set up /etc/raidtab (if using raidtools) or /etc/mdadm.conf (if using mdadm) correctly.
3. Installation notes
- Software installation
Upgrade a system to SLES9 SP2 level (or more recent) and install the multipath-tools package.
- Changing system configuration
Using an editor of your choice, within /etc/sysconfig/hotplug set this value:
HOTPLUG_USE_SUBFS=no
In addition to the above change, please configure the system to automatically load the device drivers for the controllers the MPIO devices are connected to within the INITRD. The boot scripts will only detect MPIO devices if the modules for the respective controllers are loaded at boot time. To achieve this, simply add the needed driver module to the variable INITRD_MODULES within the file /etc/sysconfig/kernel.
Example:
Your system contains a RAID controller that is accessed by the cciss driver and you are using ReiserFS as a filesystem. The MPIO devices will be connected to a Qlogic controller accessed by the driver qla2xxx, which is not yet configured to be used on this system. The mentioned entry within /etc/sysconfig/kernel will then probably look like this:
INITRD_MODULES="cciss reiserfs"Using an editor, you would now change this entry:
INITRD_MODULES="cciss reiserfs qla2xxx"When you have applied this change, you will need to recreate the INITRD on your system to reflect it. Simply run this command:
mkinitrd
When you are using GRUB as the boot manager, you do not need to make any further changes; upon the next reboot the needed driver will be loaded within the INITRD. If you are using LILO as the boot manager, please remember to run it once to update the boot record.
- Configuring multipath-tools
If your system is one of those listed above, no further configuration should be required.
You might otherwise have to create /etc/multipath.conf (see the examples under /usr/share/doc/packages/multipath-tools/) and add an appropriate devices entry for your storage subsystem.
One particularly interesting option in the /etc/multipath.conf file is "polling_interval", which defines how frequently the paths are checked.
Alternatively, you might choose to blacklist certain devices which you do not want multipath-tools to scan.
You can then run:
multipath -v2 -d
to perform a 'dry-run' with this configuration. This will only scan the devices and print what the setup would look like.
The output will look similar to:
3600601607cf30e00184589a37a31d911
[size=127 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [first]
  \_ 1:0:1:2 sdav 66:240 [ready ]
  \_ 0:0:1:2 sdr  65:16  [ready ]
\_ round-robin 0
  \_ 1:0:0:2 sdag 66:0   [ready ]
  \_ 0:0:0:2 sdc  8:32   [ready ]
This shows the name of the MPIO device, its size, the features and hardware handlers involved, as well as the (in this case, two) priority groups (PG). For each PG, it shows whether it is the first (highest priority) one, the scheduling policy used to balance IO within the group, and the paths contained within the PG. For each path, its physical address (host:bus:target:lun), device node name and major:minor number are shown, and of course whether the path is currently active or not.
Paths are grouped into priority groups; there's always just one priority group in active use. To model an active/active configuration, all paths end up in the same group; to model active/passive, the paths which should not be active in parallel will be placed in several distinct priority groups. This normally happens completely automatically on device discovery.
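If the automatic grouping does not match what your array needs, it can usually be overridden with the path_grouping_policy option in /etc/multipath.conf (shown here only as a sketch, placed in a defaults or device section):
# active/active: place all paths into one priority group
path_grouping_policy    multibus
# active/passive: one path per priority group (use instead of the line above)
# path_grouping_policy  failover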
- Enabling the MPIO components
Now run
/etc/init.d/boot.multipath start
/etc/init.d/multipathd start
as user root. The multipath devices should now show up automatically under /dev/disk/by-name/; the default naming will be the WWN of the Logical Unit, which you can override via /etc/multipath.conf to suit your tastes.
Run
insserv boot.multipath multipathd
to integrate the multipath setup into the boot sequence.
From now on all access to the devices should go through the MPIO layer.
- Querying MPIO status
To query the current MPIO status, run
multipath -l
This will output the current status of the multipath maps in a format similar to the command already explained above:
3600601607cf30e00184589a37a31d911 [size=127 GB][features="0"][hwhandler="1 emc"]
\_ round-robin 0 [active][first]
  \_ 1:0:1:2 sdav 66:240 [ready ][active]
  \_ 0:0:1:2 sdr  65:16  [ready ][active]
\_ round-robin 0 [enabled]
  \_ 1:0:0:2 sdag 66:0   [ready ][active]
  \_ 0:0:0:2 sdc  8:32   [ready ][active]
However, it includes additional information about which priority group is active, disabled or enabled, as well as for each path whether it is currently active or not.
- Tuning the fail-over with specific HBAs
HBA timeouts are typically set up for non-MPIO environments, where longer timeouts make sense, as the only alternative would be to error out the IO and propagate the error to the application. With MPIO, however, some faults (like cable failures) should be propagated upwards as fast as possible so that the MPIO layer can quickly take action and redirect the IO to another, healthy path.
For the QLogic 2xxx family of HBAs, the following setting in /etc/modprobe.conf.local is thus recommended:
options qla2xxx qlport_down_retry=1 ql2xfailover=0 ql2xretrycount=5
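After reloading the driver (or rebooting), you can verify that the parameter actually took effect; loaded module parameters are normally exported under /sys/module. This is only a sketch and assumes the qla2xxx module is loaded and exposes the parameter read-only:
# Show the port-down retry value the driver is currently using
cat /sys/module/qla2xxx/parameters/qlport_down_retry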
- Managing IO in error situations
In certain scenarios, where the driver, the HBA or the fabric experiences spurious errors, it is advisable to configure DM MPIO to queue all IO when an error leads to the loss of all paths, and never to propagate errors upwards.
This can be achieved by setting
defaults {
        default_features        "1 queue_if_no_path"
}
in /etc/multipath.conf.
As this will lead to IO being queued forever unless a path is reinstated, make sure that multipathd is running and works for your scenario. Otherwise, IO might be stalled forever on the affected MPIO device until reboot or until you manually issue
dmsetup message 3600601607cf30e00184589a37a31d911 0 fail_if_no_path
(substituting the correct map name), which will immediately cause all queued IO to fail. You can reactivate the queue_if_no_path feature by issuing
dmsetup message 3600601607cf30e00184589a37a31d911 0 queue_if_no_path
You can also use these two commands to switch between both modes for testing, before committing the setting to your /etc/multipath.conf.
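To see whether queueing is currently in effect for a given map, you can also inspect its device-mapper table; when the feature is active, the multipath target line lists queue_if_no_path among its features. A sketch, again using the example map name from above:
# The features field of the multipath target should contain "queue_if_no_path"
dmsetup table 3600601607cf30e00184589a37a31d911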
4. Using the MPIO devices
- Using the whole MPIO devices directly
If you want to use the whole LUs directly (if, for example, you are using the SAN features to partition your storage), you can simply use the /dev/disk/by-name/xxx names directly for mkfs, /etc/fstab, your application, and so on.
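For example (a sketch only: the filesystem type and the mount point /data are assumptions, and the WWN is the example device used throughout this section):
# Create a filesystem directly on the multipathed LUN
mkfs.ext3 /dev/disk/by-name/3600601607cf30e00184589a37a31d911
# Matching /etc/fstab entry (single line)
/dev/disk/by-name/3600601607cf30e00184589a37a31d911  /data  ext3  defaults  0 2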
- Using LVM2 on top of the MPIO devices
To make LVM2 recognize the MPIO devices as possible Physical Volumes (PVs), you will have to modify /etc/lvm/lvm.conf. You will also want to modify it so that it does not scan and use the physical paths, but only accesses your MPIO storage via the MPIO layer.
Thus, change the "filter" entry in lvm.conf as follows and add the types extension to make LVM2 recognize them:
filter = [ "a|/dev/disk/by-name/.*|", "r|.*|" ]
types = [ "device-mapper", 1 ]
This will allow LVM2 to scan only the by-name paths and reject everything else. (If you are also using LVM2 on non-MPIO devices, you will of course need to make the necessary adjustments to suit your setup.)
You can then use pvcreate and the other LVM2 commands as usual on the /dev/disk/by-name/ path.
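For example (a sketch; the volume group and logical volume names and the size are placeholders, and the WWN is the example device used above):
# Initialize the MPIO device as a physical volume, then build a VG and an LV on it
pvcreate /dev/disk/by-name/3600601607cf30e00184589a37a31d911
vgcreate vg_san /dev/disk/by-name/3600601607cf30e00184589a37a31d911
lvcreate -L 10G -n lv_data vg_san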
- Partitions on top of MPIO devices
It is not currently possible to partition the MPIO devices themselves. However, if the underlying physical device is partitioned, the MPIO device will reflect those partitions and the MPIO layer will provide /dev/disk/by-name/<name>p1 ... pN devices so you can access the partitions through the MPIO layer.
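For example, assuming the first partition of the example device carries a filesystem and /mnt/data is the (assumed) mount point, it would be mounted through the MPIO layer as follows:
# Always mount the partition via the by-name node, never via /dev/sdXn
mount /dev/disk/by-name/3600601607cf30e00184589a37a31d911p1 /mnt/data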
The following steps are currently supported in System x® only.
The following steps are needed once you have completed the installation and your system has rebooted.
# chkconfig boot.multipath on
# chkconfig multipathd on
The default SLES 11 installation does not create the /etc/multipath.conf file. See the /usr/share/doc/packages/multipath-tools/ directory for more information. In that directory, refer to the multipath.conf.synthetic template and the multipath.conf.annotated HOWTO. In the test environment, the multipath.conf.synthetic file was copied to the /etc/multipath.conf file. To do so, enter the following command:
# cp /usr/share/doc/packages/multipath-tools/multipath.conf.synthetic /etc/multipath.conf
All entries in the example file are commented. You can change the values in this file if necessary for your environment.
# iscsiadm -m discovery -t sendtargets -p 192.168.1.22:3260
192.168.1.152:3260,1001 iqn.1992-08.com.netapp:sn.84183797
192.168.1.22:3260,1000 iqn.1992-08.com.netapp:sn.84183797
# iscsiadm -m discovery -t sendtargets -p 192.168.1.152:3260
192.168.1.152:3260,1001 iqn.1992-08.com.netapp:sn.84183797
192.168.1.22:3260,1000 iqn.1992-08.com.netapp:sn.84183797
# iscsiadm -m node -p 9.47.69.22:3260 -T iqn.1992-08.com.netapp:sn.84183797 \
    -o update -n node.startup -v onboot
# iscsiadm -m node -p 9.47.67.152:3260 -T iqn.1992-08.com.netapp:sn.84183797 \
    -o update -n node.startup -v onboot
# iscsiadm -m node -p 192.168.1.22:3260 -T iqn.1992-08.com.netapp:sn.84183797 \
    -o update -n node.conn\[0\].startup -v onboot
# iscsiadm -m node -p 192.168.1.152:3260 -T iqn.1992-08.com.netapp:sn.84183797 \
    -o update -n node.conn\[0\].startup -v onboot
# ls -d /sys/block/sd*
/sys/block/sda  /sys/block/sdb
# iscsiadm -m node --login
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.84183797, portal: 192.168.1.152,3260]
Logging in to [iface: default, target: iqn.1992-08.com.netapp:sn.84183797, portal: 192.168.1.22,3260]
Login to [iface: default, target: iqn.1992-08.com.netapp:sn.84183797, portal: 192.168.1.152,3260]: successful
iscsiadm: Could not login to [iface: default, target: iqn.1992-08.com.netapp:sn.84183797, portal: 192.168.1.22,3260]:
iscsiadm: initiator reported error (15 - already exists)
# ls -d /sys/block/sd*
/sys/block/sda  /sys/block/sdb  /sys/block/sdc  /sys/block/sdd
IMPORTANT !!! Create a backup copy of your initrd file IMPORTANT !!!
# diff -Nau /etc/sysconfig/kernel.orig /etc/sysconfig/kernel
--- /etc/sysconfig/kernel.orig  2009-08-17 18:46:59.000000000 -0400
+++ /etc/sysconfig/kernel       2009-07-29 13:39:00.000000000 -0400
@@ -7,7 +7,7 @@
 # ramdisk by calling the script "mkinitrd"
 # (like drivers for scsi-controllers, for lvm or reiserfs)
 #
-INITRD_MODULES="processor thermal fan jbd ext3 edd"
+INITRD_MODULES="dm-multipath processor thermal fan jbd ext3 edd"
 ## Type:        string
 ## Command:     /sbin/mkinitrd
# mkinitrd
# mount
/dev/dm-3 on / type ext3 (rw,acl,user_xattr)
/proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
debugfs on /sys/kernel/debug type debugfs (rw)
udev on /dev type tmpfs (rw)
devpts on /dev/pts type devpts (rw,mode=0620,gid=5)
fusectl on /sys/fs/fuse/connections type fusectl (rw)
securityfs on /sys/kernel/security type securityfs (rw)
# ls -l /dev/dm-3
brw-rw---- 1 root disk 253, 3 Jul 26 16:13 /dev/dm-3
The following output shows that /dev/dm-3 is a linear mapping (a partition) on the DM device whose major number is 253 and minor number is 0:
# dmsetup table --major 253 --minor 3
0 9092790 linear 253:0 1381590
# dmsetup info -c -o name,major,minor --major 253 --minor 0
Name                              Maj  Min
360a98000686f68656c6f51645736374f 253    0
# multipath -ll 360a98000686f68656c6f51645736374f
360a98000686f68656c6f51645736374f dm-0 NETAPP,LUN
[size=5.0G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=4][active]
  \_ 0:0:0:0 sda 8:0  [active][ready]
  \_ 1:0:0:0 sdc 8:32 [active][ready]
When using an MPIO device, always use the devices listed under /dev/disk/by-name/. These devices will never change when a path is failed over. Strange behavior will occur if a traditional SCSI device node (e.g., /dev/sdc) is used. As an example, a proper device node is /dev/disk/by-name/3600601607cf30e00184589a37a31d911. If this device has a partition on it, the first partition is /dev/disk/by-name/3600601607cf30e00184589a37a31d911p1. The identifier comes from the WWID of the device.
Partitioning
Partitioning an MPIO device is described in Sections 5.4.2 and 5.11 of the Storage Administration Guide at: http://www.novell.com/documentation/sles10/stor_evms/data/multipathing.html
On SLES 10 SP2, please follow the steps below:
Create a partition table for the device by entering
fdisk /dev/dm-8
Add a /dev/dm-* link for the new partition by entering
/sbin/kpartx -a -p -part /dev/dm-8
Verify that the link was created by entering
ls -lrt /dev/dm-*
LUNs are not seen by the driver
lsscsi can be used to check whether the SCSI devices are seen correctly by the OS. When the LUNs are not seen by the HBA driver, check the zoning setup of the SAN. In particular, check whether LUN masking is active and whether the LUNs are correctly assigned to the server.
LUNs are seen by the driver, but there are no corresponding block devices
When LUNs are seen by the HBA driver, but not as block devices, additional kernel parameters are needed to change the SCSI device scanning behavior, e.g. to indicate that LUNs are not numbered consecutively. Refer to TID 3955167, Troubleshooting SCSI (LUN) scanning issues for details.
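If the LUNs were mapped or zoned in after the HBA driver was loaded, it is often sufficient to trigger a rescan of the SCSI host before changing kernel parameters. A commonly used sketch (replace host0 with the host number of your HBA, and repeat for each HBA):
# Rescan all channels, targets and LUNs on SCSI host 0
echo "- - -" > /sys/class/scsi_host/host0/scan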
General
SLES
This guide provides information about how to manage storage devices on a SUSE® Linux Enterprise Server 10 Service Pack 2 server, with an emphasis on using the Enterprise Volume Management System (EVMS) 2.5.5 or later to manage devices. Related storage administration issues are also covered as noted below.
- Section 1.0, Overview of EVMS
- Section 2.0, Using EVMS to Manage Devices
- Section 3.0, Using UUIDs to Mount Devices
- Section 4.0, Managing EVMS Devices
- Section 5.0, Managing Multipath I/O for Devices
- Section 6.0, Managing Software RAIDs with EVMS
- Section 7.0, Managing Software RAIDs 6 and 10 with mdadm
- Section 8.0, Resizing Software RAID Arrays with mdadm
- Section 9.0, Installing and Managing DRBD Services
- Section 10.0, Troubleshooting Storage Issues
01-11-2008
You want to combine several physical network cards into a virtual one. This article does not describe all of the possible options and features of bonding but simply explains how to set it up on Novell Linux products. For additional information on bonding itself, please refer to the file /usr/src/linux/Documentation/networking/bonding.txt, provided the kernel sources are installed on your system, or visit the home page of the bonding project.
- up2date device-mapper-multipath
- Edit /etc/multipath.conf
For detailed information, see "SAN Persistent Binding and Multipathing in the 2.6 Kernel".
- modprobe dm-multipath
- modprobe dm-round-robin
- service multipathd start
- multipath -v2
This will show the multipath LUNs and groups. Look for the multipath group number; this is the dm-# listed in /proc/partitions. The multipath LUN is accessed via the /dev/dm-# device entry.
- Format each SCSI device:
- sfdisk /dev/sdX
- (Optional) Create multipath devices for each partition:
(not needed if using LVM, since you will just mount the logical volume device)
- kpartx -a /dev/dm-#
- Enable multipath to start on bootup:
- chkconfig multipathd on