LVM incomplete support

Issue #23 open
Dae Quan created an issue

Hi,

This project looks promising. I'm testing it for use with LVM, but I have run into some problems. Fortunately, they are easy to fix, so I hope you'll want to address them:

  • The current setup fails in advanced LVM scenarios. One such scenario is using mirrored Logical Volumes (RAID-1), which requires the DM-RAID module to be loaded. The good news is that all the modules are present, and after a simple “modprobe dm-raid” you can start using them. During boot, however, this fails. I’ve prepared a simple patch for the “/lib/lightwhale/setup-persistence” script that fixes this behaviour:

    shell info "Activating any LVMs ..." + local types=$(fdisk -l | grep -wE 'Linux LVM|Linux RAID') || fail "FAILED to get partitions" + if [ -n "$types" ]; then + info "Loading RAID modules" + modprobe -s dm-raid >&2 || true + sleep 1 # Necessary to get time to load dependencies + else + info "No modules needed for RAID/LVM" + fi vgchange -aay >&2 || true

    As you can see, the change is simple: the module is loaded at boot time, but only when an LVM/RAID partition is detected. Note the one-second wait after loading the module, which gives it time to load all of its dependencies; if LVM activation runs before that, any mirrored volume will not be started.

  • One essential reason to use LVM is resizing/moving volumes. At the moment everything needed is in the distribution except the “resize2fs” tool from the “fstools” package. Without it we can’t expand/shrink the persistent storage, which is the main point of using LVM volumes. Therefore, I suggest adding this tool for managing the volumes.

  • The last point is a suggestion to change how the persistent storage is created. Currently, if you write the label “lightwhale-please-format-me” to a Logical Volume, the volume itself gets partitioned, and inside it two partitions are created (one for data and one for swap). This happens even if you don’t enable swap. Therefore, I suggest supporting these extended labels to simplify the creation (we can do it manually, but I prefer to follow the guides):

    none "lightwhale-please-format-me" : as now creates partition(s) and format it/them "lightwhale-please-format-me|no-partition|data" : only format it for data "lightwhale-please-format-me|no-partition|swap" : only format it for swap "lightwhale-please-format-me|data" : one partition plus format for data "lightwhale-please-format-me|swap" : one partition plus format for swap

    The objective is to have the data filesystem directly inside the logical volume, not inside a partition inside the volume. That extra layer is unnecessary.
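    For illustration only, here is a rough sketch of how the extended labels above could be parsed in setup-persistence (the mode names are purely hypothetical, not something the script uses today):

    # Hypothetical sketch: map the detected label to a formatting mode.
    case "$label" in
        "lightwhale-please-format-me")                   mode="partition+data+swap" ;;
        "lightwhale-please-format-me|no-partition|data") mode="format-data-only" ;;
        "lightwhale-please-format-me|no-partition|swap") mode="format-swap-only" ;;
        "lightwhale-please-format-me|data")              mode="one-partition-data" ;;
        "lightwhale-please-format-me|swap")              mode="one-partition-swap" ;;
        *)                                               mode="" ;;  # not ours, leave the device alone
    esac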

I hope you'll consider adding these enhancements.
Thank you.

Comments (21)

  1. Stephan Henningsen

    Hi, there!

    Thanks for checking out Lightwhale, and even more so for taking your time to submit this detailed description =)

    Lightwhale does need some attention on its LVM side. I haven’t had the need or the time to look into LVM myself, so this area is driven entirely by input from the community, like yours. So I need to dust off my LVM-fu before diving into this.

    I have a few comments, let me address your points individually:

    Load RAID module and dependencies

    Instead of parsing textual output from fdisk, I prefer blkid or lsblk if possible, since they provide output that’s directly intended for parsing in scripts. Also, sleep 1 seems too shaky, but I could use some help finding a better alternative. For example, will modprobe really exit before its dependencies have been loaded? Maybe lsmod only lists dm-raid once its deps are all fully loaded? Or is there a particular dependency module I could check for instead? Do you think I could use modprobe --show-depends dm-raid and then insmod each of these plus dm-raid itself?
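    Off the top of my head, something like this might replace the fixed sleep, though it's only a sketch: I haven't verified that the "Live" state in /proc/modules actually guarantees the dependencies are usable, and it assumes sleep accepts fractional seconds.

    # Sketch: bounded wait for dm-raid to reach the "Live" state instead of a fixed sleep 1.
    modprobe -s dm-raid >&2 || true
    for i in $(seq 1 20); do
        grep -q '^dm_raid .* Live' /proc/modules && break
        sleep 0.1   # fall back to "sleep 1" if fractional sleep isn't supported
    done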

    Add resize2fs

    I’ll add this, of course. Thanks for pointing it out. This is the ext2/3/4 tool, right? I’ve actually considered switching to btrfs by default while still supporting ext4, so resize2fs would still be relevant in any case.

    Format persistence partition

    I like your suggestion. I’ve been thinking of doing something similar to support different persistence configurations, but more along the lines of lightwhale-please-format-me+swap+lvm where each +flag is optional. I’ll need to think about this and fully understand the LVM use-case and the LVM layout.

  2. Dae Quan reporter

    Hi Stephan! Thank you for your kind reply.

    • “resize2fs”: great if you add it. It works with ext2, ext3 and ext4, and can resize online with kernel support; I’ll check it when you add it (see the example after this list).
    • “persistence partition”: any approach you would like to apply is welcome. However, I prefer : over + because the plus sign suggests adding something, and that is not the case here: the idea is to “reduce options” (no partition, no swap, no data). Right?
    • “raid modules”: I’ve done a lot of tests. The root problem is that udev is not running in the early stages of boot, so tools like lsblk don’t show the filesystems; that’s why I’m using fdisk (it’s universal). One solution would be to load the RAID modules every time, but perhaps you (and other users) would consider that unnecessary. Regarding the one-second delay, the problem is that the modprobe call returns before all modules are registered. I found some comments describing this as a race condition at boot on systems with LVM and RAID, so for now there is no way around it; the only solution is to add this delay. Furthermore, modprobe --show-depends doesn’t solve the problem, because many more modules are needed. So in the end the only practical solution is modprobe dm-raid; sleep 1; vgchange -aay. Anyway, it works perfectly and doesn’t add much time; I’m currently using it without trouble.
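    To illustrate the online-resize point from the first bullet, a minimal example (the VG/LV names are just placeholders; adjust them to your layout):

    # Grow the data LV by 1 GiB, then grow the mounted ext4 filesystem online.
    sudo lvextend -L +1G /dev/lightwhale-vg/lightwhale-data
    sudo resize2fs /dev/lightwhale-vg/lightwhale-data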

    Best.

  3. Dae Quan reporter

    Hi,

    Another piece of incomplete support:

    • Missing “dm-cache” modules: I want to configure a stacked LVM volume for persistence, but the command lvcreate --type cache-pool ... fails because all the modules related to “dm-cache” are missing. Could you please add them?

    Thank you.

  4. Dae Quan reporter

    Hi,

    Regarding the missing “dm-cache” modules, after checking, you only need to add this to `lightwhale/custom/board/pc/linux.config`:

    CONFIG_DM_CACHE=m
    

    I guess you simply missed it, because CONFIG_DM_WRITECACHE=m is already in the config.
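    Once it's in, a quick way to verify on the running system would be something like:

    # Check that the module is now available and loads cleanly.
    sudo modprobe dm-cache
    lsmod | grep dm_cache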

    Please add it and share an RC with it, and I’ll check it.
    Best.

  5. Dae Quan reporter

    Hi,

    Thinking about possible rescue scenarios, I think it might be interesting to add a new boot parameter similar to nodata, in this case nolvm or noraid. The idea is to start the persistent area without enabling any LVM and/or RAID modules, so you can boot without touching any problematic storage. And if your data storage is on a regular partition, you can then run the lvm/raid tools. A rough sketch of what I mean is below.
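    Purely illustrative, assuming the activation happens in setup-persistence (the parameter name and placement are my suggestion, nothing more):

    # Hypothetical: skip LVM/RAID activation when "nolvm" is given on the kernel command line.
    if grep -qw nolvm /proc/cmdline; then
        info "nolvm requested; skipping LVM/RAID activation"
    else
        modprobe -s dm-raid >&2 || true
        sleep 1
        vgchange -aay >&2 || true
    fi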

    Do you agree with that idea?

  6. Dae Quan reporter

    Hi,

    I’m doing more tests with a new virtual machine, and now I see that if I execute the command echo "lightwhale-please-format-me" | sudo dd conv=notrunc of=/dev/sdb on a system without any partitions (/dev/sda is reserved), setup-persistence never formats the device. Could you please check this?

  7. Dae Quan reporter

    Hi Stephan,

    My previous message about "lightwhale-please-format-me" not working is not accurate. The problem is different: if the controller driver is not loaded before setup-persistence runs, the disks are not accessible. This happens, for example, with VMware virtual machines using the pvscsi controller; in that case the disk appears later. So it’s a problem similar to this change: https://bitbucket.org/asklandd/lightwhale/commits/cc240605bb5955f1bdcf9b9d0e4453da2a46707d

    The workaround is to connect the disks through the SATA controller. However, you may want to consider changing the “vmw_pvscsi” driver from a module to built-in as well, or adding some logic to set up all controllers before executing the “setup-persistence” script.
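    Just to illustrate the built-in option (I haven't checked Lightwhale's current config for this, so treat it as an assumption), the kernel config change would be along these lines:

    # Build the VMware paravirtual SCSI driver into the kernel instead of as a module:
    CONFIG_VMWARE_PVSCSI=y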

    Best.

  8. Stephan Henningsen

    There’s a lot going on here now.

    Changing the “vmw_pvscsi” driver from a module to built-into-kernel deserves a separate issue.

    Adding the “dm-cache” module has been addressed in the 2.1.5-dev1 build.

    Adding resize2fs has been addressed in the 2.1.5-dev1 build.

    The lightwhale-please-format-me|extra+parameters suggestion is maybe, but not necessarily, related to the original issue, which is in fact:

    LVM+RAID incomplete support

    I’d really like to focus on this.

    Lightwhale already has RAID support, but without LVM, and I think it would be valuable to add LVM. My problem is that I don’t know how to do it properly.

    I’ve created a little personal Lightwhale+LVM+RAID HOWTO document for keeping notes while working on this. It’s by no means finished, which is why I need help =) Perhaps you can help me fill in the blanks in my understanding of LVM and RAID?

    Should I assemble the RAID with mdadm before creating the PV, VG, and LVs? I’ve done some research, but there appear to be various ways to mix and match these concepts.

    My HOWTO is basically a list of commands that can be copied and pasted into a Linux terminal. The idea is to set up LVM+RAID in Lightwhale running in QEMU. The QEMU console can be a bit of a pain to paste anything into, so instead I ssh into Lightwhale in QEMU from a normal terminal, and then everything works out fine.

    Here’s part of what I’ve been working on so far:

    # Create two disk images of equal size.
    dd if=/dev/zero of=persistence-lvm-raid-1.img bs=1M count=256
    dd if=/dev/zero of=persistence-lvm-raid-2.img bs=1M count=256
    
    # Start Lightwhale with two disk images, ssh is portforwarded on port 10022.
    qemu-system-x86_64 \
        -display sdl,show-cursor=on,window-close=on,gl=on -vga std \
        -enable-kvm -cpu host -m 2G \
        -device virtio-rng-pci \
        -net nic,model=virtio-net-pci -net user,hostfwd=tcp::10022-:22 \
        -drive file=lightwhale-2.1.5-dev1-x86.iso,index=0,format=raw,media=cdrom -device sdhci-pci -boot d \
        -drive file=persistence-lvm-raid-1.img,index=1,format=raw,media=disk,id=persistence-lvm-raid-1 \
        -drive file=persistence-lvm-raid-2.img,index=2,format=raw,media=disk,id=persistence-lvm-raid-2
    
    # SSH into QEMU from terminal to enable further copy/paste of commands.
    ssh-keygen -R [lightwhale.localhost]:10022  # Remove any existing keys, ie. if QEMU was restarted.
    ssh -A op@lightwhale.localhost -p 10022 -o StrictHostKeyChecking=false  # password=opsecret
    
    # Identify disk devices: 2 × 256 MB
    lsblk
    
    # Initialize physical volumes (PV)
    sudo pvcreate /dev/sda /dev/sdb
    
    # Create volume group (VG)
    sudo vgcreate lightwhale-vg /dev/sda /dev/sdb
    
    # Activate volume group
    sudo vgchange --activate y lightwhale-vg
    
    # Create logical volumes (LV) for data and swap.
    sudo modprobe -a dm-raid raid1   # load the RAID modules first
    
    
    ## This is where the HOWTO gets messy!
    # This succeeds, but does it really create a RAID?  Can I pull out /dev/sda and/or add /dev/sdc?
    sudo lvcreate --type raid1 --mirrors 1 --name lightwhale-lv-swap --size    80M      lightwhale-vg
    sudo lvcreate --type raid1 --mirrors 1 --name lightwhale-lv-data --extents 100%FREE lightwhale-vg
    
    # This fails with errors, but some tutorials argue that creating the LV on a specific device is the way to go for RAID.
    sudo lvcreate --type raid1 --mirrors 1 --name lightwhale-lv-swap1 --size    80M      lightwhale-vg /dev/sda
    sudo lvcreate --type raid1 --mirrors 1 --name lightwhale-lv-data1 --extents 100%FREE lightwhale-vg /dev/sda
    sudo lvcreate --type raid1 --mirrors 1 --name lightwhale-lv-swap2 --size    80M      lightwhale-vg /dev/sdb
    sudo lvcreate --type raid1 --mirrors 1 --name lightwhale-lv-data2 --extents 100%FREE lightwhale-vg /dev/sdb
    
    
    # Create file systems.
    sudo mkswap     --label lightwhale-swap /dev/lightwhale-vg/lightwhale-lv-swap
    sudo mkfs.btrfs --label lightwhale-data /dev/lightwhale-vg/lightwhale-lv-data
    
    # Mount file systems
    sudo swapon /dev/lightwhale-vg/lightwhale-lv-swap
    swapon
    
    mkdir /home/op/data
    sudo mount -o defaults,rw,noatime,compress=zstd,ssd,discard=async,space_cache=v2 /dev/lightwhale-vg/lightwhale-lv-data /home/op/data
    mount
    
    # Put some data on the file system.
    cp -av /proc/config.gz ~/data/
    
    # Test: Remove physical disk from RAID.
    TODO
    

  9. Dae Quan reporter

    Hi Stephan,

    Yes, I agree: we can leave some of these issues (such as pvscsi initialisation and the configuration script) for other topics. So regarding the LVM support:

    First of all, these days LVM can be used in several ways: alone, stacked with MD (on top of it or under it), or combined with MD. Some time ago it was common to use MD-RAID (with the mdadm tool) and put LVM on top of it, but that's no longer the case, because LVM later gained the ability to use MD internally (as a library). With that, you no longer need the mdadm tool or any RAID partition; that's the current state of the art of LVM, and it's very reliable. However, you should know that LVM2 supports more than one implementation of the same technology, so it can be configured in different forms. I suggest supporting only:

    • LVM standalone (no RAID, no cache).
    • LVM with internal DM-RAID support.
    • LVM with internal DM-CACHE support with cache-pools (and not cache-fs).
    • or a combination of the last two.

    Based on this, I propose reducing your guide to these commands (I removed the parts not specific to LVM):

    # Identify disk devices: 2 × 256 MB
    lsblk
    
    ## First create the partitions: it's bad practice to put anything on a disk without identifying it!
    sudo parted --align optimal /dev/sda --script -- mktable gpt
    sudo parted --align optimal /dev/sda --script -- mkpart lightwhale-vg-lvm 0% 100%
    sudo parted --align optimal /dev/sda --script -- set 1 lvm on
    
    sudo parted --align optimal /dev/sdb --script -- mktable gpt
    sudo parted --align optimal /dev/sdb --script -- mkpart lightwhale-vg-lvm 0% 100%
    sudo parted --align optimal /dev/sdb --script -- set 1 lvm on
    
    # Initialize physical volumes (PV) <-- in the corresponding partitions
    sudo pvcreate /dev/sda1 /dev/sdb1
    
    # Create volume group (VG) (only disk 1; we will add the second later)
    sudo vgcreate lightwhale-vg /dev/sda1
    
    # Activate volume group
    sudo vgchange --activate y lightwhale-vg
    
    # Load modules
    sudo modprobe dm-raid   # No need to name "raid1" too: all its submodules are loaded automatically!
    
    # Create one logical volume (LV) in RAID-1
    # Point 1: Create regular logical volume
    sudo lvcreate -l 100%FREE -n lightwhale-data lightwhale-vg
    
    # Point 2: Leave space for the mirror log metadata (you can leave more than one extent, but 1 should be sufficient)
    sudo lvreduce -l -1 /dev/lightwhale-vg/lightwhale-data
    
    # Point 3: Add the second disk to the volume group
    sudo vgextend /dev/lightwhale-vg /dev/sdb1
    
    # Point 4: Mirror the volume (convert it to RAID-1 using MD internally) 
    sudo lvconvert -m1 /dev/lightwhale-vg/lightwhale-data /dev/sdb1
    
    # That's all, you can check the status with commands like:
    sudo pvdisplay
    sudo vgs
    sudo lvs -a
    
    # Create file systems.
    sudo mkfs.btrfs --label lightwhale-data /dev/lightwhale-vg/lightwhale-data
    
    ## Note 1: I don't recommend putting swap in LVM. It's preferable to put it on a regular physical partition; RAID is not necessary, and you can add multiple swap spaces.
    
    ## Note 2: I don't recommend using BTRFS with LVM. EXT4 works well with LVM (stacked over NVMe+SSD+HDD). If you use BTRFS, rely on its internal functionality for mirroring and caching.
    
    mkdir /home/op/data
    sudo mount -o defaults,rw,noatime,compress=zstd,ssd,discard=async,space_cache=v2 /dev/lightwhale-vg/lightwhale-data /home/op/data
    mount
    
    # Put some data on the file system.
    cp -av /proc/config.gz ~/data/
    
    # Test: Shut down LVM.
    sudo vgchange -an
    

    I hope you can do more tests.

    For CACHE I’ll prepare another guide; you'll need support for loading the module with modprobe dm-cache.
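    In the meantime, a rough preview of the kind of commands it will involve (device names such as /dev/sdc1 are placeholders for a fast SSD partition; untested on Lightwhale):

    # Add a fast SSD partition to the VG and use it as a cache pool for the data LV.
    sudo modprobe dm-cache
    sudo vgextend lightwhale-vg /dev/sdc1
    sudo lvcreate --type cache-pool -L 64M -n lightwhale-cache lightwhale-vg /dev/sdc1
    sudo lvconvert --type cache --cachepool lightwhale-vg/lightwhale-cache lightwhale-vg/lightwhale-data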

    Best.

  10. Stephan Henningsen

    Thanks a lot for the detailed description, including the HOWTO and the history of LVM and RAID! That explains why every other tutorial I’ve found does things differently.

    Please allow me some time to consolidate this.

  11. Stephan Henningsen

    I have a few quick questions:

    Wouldn’t it be better to run plain BTRFS on top of the GPT partitions? It’s said that BTRFS performs better on bare metal than with LVM stuffed in between. BTRFS also handles RAID itself, and it handles writing across multiple physical disks, much like an LVM VG. And it has a lot of other advantages over at least ext4.

    What’s the benefit of LVM, except being able to put any file system of choice on top of it? Is there a use-case where it shines, maybe when running Lightwhale in a virtual environment?

    See: https://fedoramagazine.org/choose-between-btrfs-and-lvm-ext4/

  12. Dae Quan reporter

    Hi,

    These days LVM+RAID+EXT4 is functionally very similar to BTRFS. Therefore, if you want to use BTRFS, don’t mix it with LVM+RAID; use it standalone. However, remember to always use disk partitions and not the whole disk. In my case I prefer LVM+RAID+EXT4 for two reasons: more robust (i.e. simpler) recovery tools (ext4), and more transparent LV migrations (completely independent of the filesystem).

    My suggestion is then to support both.

  13. Stephan Henningsen

    I’m still working on this btw ;) It’s just that I don’t have very much spare time these days, so progress is slow.

  14. Dae Quan reporter

    Great. Don't worry, and take your time. But you're welcome to share a few words about what you're implementing.

  15. Stephan Henningsen

    Hi, Dae Quan!

    I've refactored persistence, and it's starting to look like it's working.

    Not much has changed on the surface: the data partition is still mounted at /mnt/lightwhale-data, but instead of writing directly to the root of this mount point, Lightwhale now stores its save state in /mnt/lightwhale-data/lightwhale-save. This allows Lightwhale to "adopt" an existing disk while keeping its data isolated to a single directory. Btrfs is the default filesystem, and ext4 + LVM + RAID is now officially supported, but...!
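    In other words, the on-disk layout now looks roughly like this (illustrative):

    /mnt/lightwhale-data/
    ├── lightwhale-save/   # Lightwhale's state lives here now
    └── ...                # pre-existing data on an adopted disk is left alone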

    There are still a few things I need to work on:

    1. Migration: The change in the save directory should be seamless for existing installations with persistence, ideally without requiring any changes on disk.
    2. ext4 + LVM: I'm getting an error when building the VG:

      Logical volume lightwhale-vg/lightwhale-data successfully resized.
      Insufficient free space: 965 extents needed, but only 1 available.

    3. dm-cache: I haven’t addressed this yet. As I understand it, caching should be on the fastest physical disks. If that’s the case, I'll need to work on detecting which disks are faster, which could be quite complex for a generic solution.
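    Regarding #2, I suppose checking the VG's free extents around each step might narrow it down; vg_extent_count and vg_free_count are standard reporting fields:

    # Show total and free extents in the volume group.
    sudo vgs -o vg_name,vg_extent_count,vg_free_count lightwhale-vg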

    I've built a developer version of Lightwhale 2.1.5 with the above changes. Do you have time to try it out and help identify any issues? Any help with the issues mentioned in #2 and #3 would be much appreciated!

    Here’s the latest developer build: https://lightwhale.asklandd.dk/dev/lightwhale-2.1.5-dev7-x86.iso

    To test the different new ways for formatting the data partition, use one of these kernel options:

    data-mode=ext4
    data-mode=ext4+swap
    data-mode=ext4+raid1+swap

    data-mode=btrfs
    data-mode=btrfs+swap
    data-mode=btrfs+dup+swap
    data-mode=btrfs+raid1+swap

    Simply tag the magic disk with “lightwhale-please-format-me” as usual.
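    For example, using the same approach as the dd command earlier in this thread (adjust /dev/sdb to the disk you want formatted, and add the data-mode option to the kernel command line at boot):

    # Tag the disk so setup-persistence formats it on the next boot.
    echo "lightwhale-please-format-me" | sudo dd conv=notrunc of=/dev/sdb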

    Do NOT run this on a production system; migration isn’t implemented yet and may mess up the persistence partition!

  16. Stephan Henningsen

    Hi Dae Quan. I hope you’re doing well. I would like to get in touch with you about completing this issue. If you could look me up on the Lightwhale Discord https://discord.gg/2qJB7VJsaU or by other means, it would be much appreciated.
