
Block devices in KVM guests

In the last few days, I found some time to spend with KVM and libvirt. Unfortunately, there is one subject for which I haven't yet found a satisfying solution: the naming of block devices in guest instances.

This is surely a common issue, but solutions are rare. Neither an article on Usenet (in German) nor the German version of this blog article has produced answers to the main question. I should have written this in English in the first place, so I am now translating it from German to English, hoping that there will be some answers and suggestions.

KVM is quite inflexible when it comes to configuring block devices. It is possible to define on the host which files or whole devices from the host should be visible in the guest. The documentation suggests that devices should be brought into the guest with the virtio model, which needs support in the guest kernel; importing a device as an emulated ATA or SCSI device carries a performance penalty.
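
For reference, a minimal sketch of how such a virtio disk is typically declared in a libvirt domain XML; the volume group and LV name (vg0/guest1-root) are made up for illustration:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw'/>
  <source dev='/dev/vg0/guest1-root'/>
  <target dev='vda' bus='virtio'/>
</disk>

Note that the dev attribute of <target> is more of an ordering hint than a guarantee; the guest kernel still enumerates virtio disks on its own.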

The devices brought into the guest via virtio appear in the guest's /dev as /dev/vd<x> and also get their corresponding entries in /dev/disk/by-uuid and /dev/disk/by-path. The vd<x> nodes are simply numbered in consecutive order, just like hd<x> and sd<x>. The /dev/disk/by-uuid entry carries the correct UUID of the file system found on the device, at least when it's a block device partitioned inside the guest and formatted with ext3 (I haven't tried anything else yet). I don't yet understand the naming scheme of the /dev/disk/by-path nodes, and I am somewhat reluctant to assume that the PCI paths of emulated hardware are stable.

It looks like I am forced to reconfigure inside the guest whenever I change the host's mass storage setup (for example, after migrating to a different file system or after adding new block devices), to accommodate the new order of the /dev/vd<x> nodes or to update the configured UUIDs. This is practically asking for configuration errors.

This is a reincarnation of the very issue that LVM killed for Linux running on "bare metal": the block device itself has a mnemonic name which stays constant even across migration and copy operations. This works without file-system-specific mechanisms like UUIDs or labels, and wouldn't even be possible with a UUID (which wouldn't be unique in this case). The mnemonic names of LVs are also available when data is written directly to the raw device, such as the CNFS buffer of a news server.
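
To illustrate what I mean by mnemonic, stable names, here is a minimal LVM sketch (the VG and LV names are made up):

# create a logical volume with a meaningful name on the host
lvcreate -L 20G -n news vg0
# the LV is reachable under stable, mnemonic paths, independent of
# which physical volumes it currently lives on
ls -l /dev/vg0/news /dev/mapper/vg0-news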

I would love to have something like a "paravirtualized device mapper interface" which would allow me to specify in the host's configuration which of the host's LVs should be visible in the guest, and under which name in /dev/mapper. That way, the guest's configuration could remain stable during data wrangling operations on the host.

One solution is to have a single LV for each guest and to import this LV as /dev/vda into the guest. /dev/vda would then be partitioned like a real disk and carry its own LVM installation. This, however, needs kpartx if one wants to access the data from the host, and loses flexibility when resizing file systems.
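
For completeness, this is roughly how one would get at the guest's partitions from the host in that setup; the LV name is made up, and the exact /dev/mapper names that kpartx creates may differ:

# map the partitions inside the guest's LV to device-mapper nodes on the host
kpartx -av /dev/vg0/guest1
# typically yields nodes like /dev/mapper/vg0-guest1p1, ...
mount /dev/mapper/vg0-guest1p1 /mnt
# ... inspect or repair, then tear down again ...
umount /mnt
kpartx -dv /dev/vg0/guest1

If the guest runs its own LVM on one of those partitions, an additional vgchange -ay (and care to keep the VG names distinct from the host's) is needed before the guest's LVs can be mounted.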

These issues appear in every installation of KVM virtualization, and I would expect that there are gazillions of other possible solutions. I am interested in knowing how other people have tackled this issue and whether there are more possibilities that I haven't thought of before. Maybe there is a solution that doesn't leave me with the feeling of having implemented something ugly. Does the interface between the host's KVM and the guest's device mapper that I have been dreaming of perhaps exist? Or is there any other possibility of configuring the device node's name in the guest Linux from the host side?

Comments

Niall on :

kvm-qemu (well qemu really) has support for specifying the index of drives so you can make the guest naming stable from the host side. You can use something like:

kvm -drive file=/foo/bar,boot=on,if=virtio,index=0

I'm afraid I've no idea what libvirt will or won't allow, as I lost interest in it once I discovered it wouldn't run the guests as a normal user but instead insisted on running the virtual machines as root.

Jim on :

Misinformation for the win!

Libvirt does not run virtual machines as root anymore. In Debian, they're all run as libvirt-qemu.

There's no reason to expect that emulated PCI addresses would change. They're emulated, why should they? In fact, patches were posted to the libvirt mailing list lately (and will probably appear in 0.7.6) that will explicitly assign PCI addresses when creating devices, so they are guaranteed stable.
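
For illustration, this is roughly what an explicitly pinned PCI address looks like in the domain XML once that support is available; the addresses and the LV name are chosen arbitrarily:

<disk type='block' device='disk'>
  <source dev='/dev/vg0/guest1-root'/>
  <target dev='vda' bus='virtio'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>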

If you provide one disk image (which, on the host, is either a block device or a file) to the guest, it will always show up as /dev/vda. This will not change if you change the host filesystem, and it won't change if you rearrange hard drives on the host. What exactly is the problem? What sorts of operations are you doing on the host that would affect what the guest sees?

Using a single LV or unique block device for each guest makes the most sense from a performance standpoint. Using a disk image on a host filesystem is horribly inefficient. You argue against using LVs because of the difficulty of accessing the guest filesystem from the host, but this should be an extremely rare thing that you need to do, so it's not a big deal. There are also excellent tools like libguestfs that make it trivial.
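
As a rough illustration of the kind of host-side access libguestfs gives you (the LV name is made up; --ro keeps the session read-only, -i auto-mounts the guest's filesystems):

guestfish --ro -a /dev/vg0/guest1 -i
><fs> cat /etc/fstab
><fs> ls /home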

Marc 'Zugschlus' Haber on :

There’s no reason to expect that emulated PCI addresses would change. They’re emulated, why should they? In fact, patches were posted to the libvirt mailing list lately (and will probably appear in 0.7.6) that will explicitly assign PCI addresses when creating devices, so they are guaranteed stable.

So I can configure which PCI address a device will appear at in the guest? That solves part of the issue, but PCI addresses are cryptic, while names are mnemonic.

If you provide one disk image (which, on the host, is either a block device or a file) to the guest, it will always show up as /dev/vda. This will not change if you change the host filesystem, and it won’t change if you rearrange hard drives on the host. What exactly is the problem?

When I make a host LV visible in the guest as /dev/vda, I need to first resize the host LV and then the guest's LV which resides on /dev/vda. That's one step more than necessary. When I make /dev/mapper/guest1-home visible in the guest as /dev/mapper/home, it's only one resize operation. But this gets bulky when it's root, boot, usr, home and var.
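
A sketch of the difference, with made-up names, assuming the guest uses /dev/vda directly as a PV and sees the enlarged device (e.g. after a reboot or a detach/re-attach of the virtual disk):

# nested LVM: grow the host LV, then the guest's PV, LV and filesystem
lvextend -L +10G /dev/vg0/guest1          # on the host
pvresize /dev/vda                         # in the guest
lvextend -L +10G /dev/guestvg/home        # in the guest
resize2fs /dev/guestvg/home               # in the guest

# one host LV per guest filesystem: grow the host LV, then the filesystem
lvextend -L +10G /dev/vg0/guest1-home     # on the host
resize2fs /dev/vdb                        # in the guest, on whichever vd<x> that LV became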

Using a single LV or unique block device for each guest makes the most sense from a performance standpoint.

I'd rather sacrifice some performance for flexibility, which the single-block-device-per-guest approach doesn't offer.

You argue against using LVs because of the difficulty of accessing the guest filesystem from the host, but this should be an extremely rare thing that you need to do, so it’s not a big deal.

I find it a very handy way for recovery actions, backup/restore and the like, so it's not that exotic. Please note that I'm only arguing against "one-LV-per-guest"; "multiple-LVs-per-guest" is actually the way I'd like to go.

There are also excellent tools like libguestfs that make it trivial.

That one is not even in Debian unstable, alas.

I just hate the idea of going back to stupidly consecutively numbered block devices after getting rid of them seven years ago.

Niall on :

I lost interest in libvirt a while ago, and from what I saw at the time I didn't expect the "running as root" problem to have been fixed by now. Good to know it has.

Marc 'Zugschlus' Haber on :

kvm -drive file=/foo/bar,boot=on,if=virtio,index=0

works; the disk comes up as /dev/vda. But with any non-zero value for index=, the device node does not show up in the guest at all.

Doesn't help.

Np237 on :

I have found that LVM is indeed quite a nice solution. One logical volume makes one virtual disk. This also works quite well in high-availability environments.

I have not tried LVM-on-LVM; instead I partition the virtual drive in the old classical way. This does mean the drive contents cannot be mounted directly on the host, but that is not something I find extremely useful anyway.

Marc 'Zugschlus' Haber on :

I find being able to directly mount the guest drive on the host extremely helpful in some recovery and migration cases. I would hate to lose this capability.

Np237 on :

You can always boot the guest with a live CD image. Given how rarely this is needed, I find it not a big deal compared to the flexibility of LVM combined with the simplicity in the guest OS, including when the guest is non-Linux.

That said, an additional layer on top of LVM to be able to delegate block devices to guests while keeping their names, as you proposed, would definitely be better.

Richard Jones on :

There are also excellent tools like libguestfs that make it trivial.

That one is not even in Debian unstable, alas.

You should be able to find it here:

http://pkg-libvirt.alioth.debian.org/packages/unstable/

(I am the author of libguestfs)

Markus Hochholdinger on :

So here is my solution: On the host I use LVM to manage the pool of real disks, so no problem there. For each guest I create one logical volume for the root fs and one logical volume for swap; the names of the logical volumes are $GUESTNAME and $GUESTNAME-swap. Because I like redundancy, I do this for each guest on two real servers and transport the block device over iSCSI to the other real server. I also have some udev rules to get consistent naming across real servers, e.g. /dev/xbd/$GUESTNAME.$REALSERVERNAME is the same on each real server, regardless of whether it is a local logical volume or a remote logical volume over iSCSI.

Inside my guest I build a raid1 out of the two $GUESTNAME block devices, so I have a stable /dev/md0 as the block device for my root fs and /dev/md1 for my swap space. I assemble the raid1 on boot of the guest with the kernel parameters "md=0,/dev/vda,/dev/vdb md=1,/dev/vdc,/dev/vdd". In my experience, the order in which the disks are configured is the order in which they appear inside the guest: the first drive is vda, the second vdb, and so on. With the libvirt config it is possible to say for each block device which device name it should have (vda, vdb, ...), but it seems the kvm/qemu -drive option is then run without an index option.
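
A minimal sketch of what such a udev rule could look like for the local-LV case (names are made up, and it assumes the device-mapper udev rules export DM_NAME on this system):

# /etc/udev/rules.d/90-xbd.rules (hypothetical)
# local LV vg0/guest1 shows up as /dev/xbd/guest1.server1
ACTION=="add|change", KERNEL=="dm-*", ENV{DM_NAME}=="vg0-guest1", SYMLINK+="xbd/guest1.server1"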

pros:
* one logical volume on the host for each guest
** to make snapshots
** for backups
** for easy recovery
** to have a meaningful name of the logical volume
* you can grow the filesystem inside the guest WITHOUT downtime of the guest! (remove one half of the raid1, lvresize on the host, reinsert that half of the raid1, do the same for the other side, then grow the raid1 and grow the fs; I have done this since 2006 with Xen; see the sketch after the cons list)
* you can also grow the swap space without downtime
* you only have to monitor one filesystem per guest
* no partition tables inside the guest (who needs them if everything is managed with LVM on the host?)

cons:
* a few functions for the guest are outside the guest, so this setup is only reasonable if the admin of the host is also an admin of the guest
* only one filesystem (for me, this is a pro, but for other purposes this could be bad)
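
A rough sketch of the online-grow procedure from the pros list above, using mdadm (device and LV names are made up; it assumes the guest sees the enlarged mirror half when it is re-inserted, e.g. by detaching and re-attaching the virtual disk):

mdadm /dev/md0 --fail /dev/vda --remove /dev/vda   # take one mirror half out
# on the host: lvextend -L +10G /dev/vg0/guest1
mdadm /dev/md0 --add /dev/vda                      # re-insert and wait for resync
# repeat the same steps for the second half (/dev/vdb)
mdadm --grow /dev/md0 --size=max                   # grow the array to the new size
resize2fs /dev/md0                                 # grow the filesystem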
