About

xNBD is yet another NBD (Network Block Device) server program. It works with the NBD client driver of the Linux kernel.

xNBD provides the following features:

  • Possibly better I/O performance by using mmap().
  • Concurrent access from multiple clients.
  • Copy-on-Write (basic support).
  • Snapshot (basic support).
  • Distributed Copy-on-Write.
  • Live storage migration for virtual machines.
  • IPv6 support.

See the demo movies to see how it works.

This is an open source project, and your contributions are very welcome. Please send me patches; doing things the Mercurial way is a nice idea!

Download

You may be able to take a shortcut and simply install xNBD from your distribution; at least Debian and Ubuntu package it.

apt-get install xnbd-client xnbd-server

You can also download the source code and follow the latest development.

xNBD is licensed under GPLv2.

Compile

Before compiling xNBD, set up the following libraries and programs.

  • GLib 2 (development files). See http://www.gtk.org/ and the quick check after this list.
  • Linux kernel and glibc with the ppoll() system call. Linux 2.6.22 or later is recommended.
  • GCC 4. C99 syntax is used.
  • make, autoconf.
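
If you are unsure whether the GLib 2 development files are installed, a quick check with pkg-config (package names vary by distribution) is:

pkg-config --modversion glib-2.0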

Then extract the xNBD tarball and build it.

tar xvf xnbd-x.y.z.tar.gz
cd xnbd-x.y.z
autoreconf -i
./configure
make

To enable debug messages, pass the --enable-debug option to ./configure.
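
For example:

./configure --enable-debug
make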

Usage

The following programs are used on a server node; xnbd-server exports a disk image through the NBD protocol.

  • xnbd-server: the xNBD server program
  • xnbd-bgctl: the control program of the xNBD proxy mode
  • xnbd-wrapper: the super daemon of xnbd-servers, managing multiple disk images and xnbd-server instances.
  • xnbd-wrapper-ctl: the control program of xnbd-wrapper

The following programs are used on a client node. They are optional; you can use nbd-client from the original NBD distribution instead.

  • xnbd-client: a userland helper program for the NBD driver. It works with the NBD driver in the mainline Linux kernel.
  • xnbd-watchdog: a watchdog program which periodically checks that an NBD device is working correctly.

Summary

Command line options may differ in the latest code. See the --help output of the compiled binaries.

xnbd-server

% xnbd-server --help
Usage:
  xnbd-server --target [options] disk_image
  xnbd-server --cow-target [options] base_disk_image
  xnbd-server --proxy [options] remote_host remote_port cache_disk_path cache_bitmap_path control_socket_path
  xnbd-server --help
  xnbd-server --version

Options:
  --lport        listen port (default 8520)
  --daemonize    run as a daemon process
  --readonly     export a disk as readonly in target mode
  --logpath PATH use the given path for logging (default: stderr/syslog)
  --syslog       use syslog for logging
  --inetd        set the inetd mode (use fd 0 for TCP connection)

Options (Proxy mode):
  --clear-bitmap clear an existing bitmap file (default: re-use previous state)

xnbd-client

% ./xnbd-client --help
Usage:
  xnbd-client [bs=...] [timeout=...] host port nbd_device

  xnbd-client --connect [options] nbd_device host port [host port] ...
  xnbd-client -C [options] nbd_device host port [host port] ...

  xnbd-client --disconnect nbd_device
  xnbd-client -d nbd_device

  xnbd-client --check nbd_device
  xnbd-client -c nbd_device

  xnbd-client --help

Options:
  --timeout     set a timeout period (default 0, disabled) (DO NOT USE NOW)
  --blocksize   select blocksize from 512, 1024, 2048, and 4096 (default 1024)
  --retry       set the maximum count of retries to connect to a server (default 1)
  --recovery-command            invoke a specified command on unexpected disconnection
  --recovery-command-reboot     invoke the reboot system call on unexpected disconnection
  --exportname  specify a target disk image

Example:
  xnbd-client fe80::250:45ff:fe00:ab8f%eth0 8998 /dev/nbd0
     This command line is compatible with nbd-client. xnbd-client supports IPv6.

  xnbd-client --connect /dev/nbd0 fe80::250:45ff:fe00:ab8f%eth0 8998 10.1.1.1 8900
     This automatically tries the next server if the first one does not accept the connection.

xnbd-watchdog

% ./xnbd-watchdog --help
Usage:
  xnbd-watchdog [options] nbd_device

  xnbd-watchdog --help

Options:
  --timeout     set a timeout period (default 10)
  --interval    (default 10)
  --recovery-command            invoke a specified command if polling failed
  --recovery-command-reboot     invoke the reboot system call if polling failed

Example:
  xnbd-watchdog --recovery-command-reboot /dev/nbd0

xnbd-bgctl

% ./xnbd-bgctl --help
Usage:
  xnbd-bgctl                     --query       control_unix_socket
  xnbd-bgctl [--force]           --switch      control_unix_socket
  xnbd-bgctl [--progress]        --cache-all   control_unix_socket
  xnbd-bgctl                     --cache-all2  control_unix_socket
  xnbd-bgctl [--exportname NAME] --reconnect   control_unix_socket remote_host remote_port

Commands:
  --query       query current status of the proxy mode
  --cache-all   cache all blocks
  --cache-all2  cache all blocks with the background connection
  --switch      stop the proxy mode and start the target mode
  --reconnect   reconnect the forwarding session

Options:
  --exportname NAME  reconnect to a given image
  --progress         show a progress bar on stderr (default: disabled)
  --force            ignore risks (default: disabled)

xnbd-wrapper

% ./xnbd-wrapper --help
Usage:
  ./xnbd-wrapper [options]

Options:
  --daemonize    run wrapper as a daemon process
  --cow          run server instances as a cow target
  --readonly     run server instances as a readonly target
  --laddr        listening address
  --lport        listening port (default: 8520)
 (--port)        deprecated, use --lport instead
  --xnbd-bgctl   path to the xnbd-bgctl executable
  --xnbd-server  path to the xnbd-server executable
  --imgfile      path to a disk image file. This option can be used multiple times.
                 You can also use xnbd-wrapper-ctl to (de)register disk images dynamically.
  --logpath PATH use the given path for logging (default: stderr/syslog)
  --socket       unix socket path to listen on (default: /var/run/xnbd-wrapper.ctl)
  --syslog       use syslog for logging

Examples:
  xnbd-wrapper --imgfile /data/disk1
  xnbd-wrapper --imgfile /data/disk1 --imgfile /data/disk2 --xnbd-server /usr/local/bin/xnbd-server --xnbd-bgctl /usr/local/bin/xnbd-bgctl --laddr 127.0.0.1 --port 18520 --socket /tmp/xnbd-wrapper.ctl

Step-by-Step Examples

Install the NBD driver (nbd.ko) and a client program (nbd-client or xnbd-client) on your client node. The NBD driver is included in the Linux kernel and enabled in most distributions.
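
For example, on a Debian-based client node (package names are distribution-specific), loading the driver and installing a client might look like:

modprobe nbd
apt-get install nbd-client   # or use xnbd-client built from the source tarball above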

Scenario 1 (Simple target server)

images/target.png

In a server node (10.1.1.1), start an xNBD server exporting a local file (disk.img). The server listens on TCP port 8992.

dd if=/dev/zero of=disk.img bs=4096 count=1 seek=1000000
xnbd-server --target --lport 8992 disk.img

In this case, we have created a new sparse image file of about 4 GB.

In a client node, establish an NBD session to the above server.

modprobe nbd
echo deadline > /sys/block/nbd0/queue/scheduler
nbd-client bs=4096 10.1.1.1 8992 /dev/nbd0

The NBD client driver's documentation says that you may need to explicitly select the deadline I/O scheduler to avoid deadlocks. In addition, for better performance, set the block size of nbd0 to 4096, which matches the cache block size inside xNBD. This is optional.
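
To confirm which I/O scheduler is selected for the device (the active scheduler is shown in brackets):

cat /sys/block/nbd0/queue/scheduler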

If you need concurrent access from multiple clients, repeat the same operations at each client node. In normal cases, you need a cluster file system (e.g., OCFS2 or GFS) to store data safely on a shared disk.

Scenario 2 (Simple proxy server, distributed Copy-on-Write)

xNBD can also work as a proxy server to another target server. This feature is used for distributed Copy-on-Write NBD disks; one read-only disk image is shared among multiple clients, and updated disk data is saved at each proxy.

images/proxy.png

In the proxy server mode of xNBD, all I/O requests are intercepted, and redirected to the target server if needed. All updated blocks are saved at the proxy server, and read blocks are also cached. Writes do not happen at the target server.

Now, start an xNBD proxy server (10.255.255.254:8992) redirecting to the above target server (10.1.1.1:8992).

xnbd-server --proxy --lport 8992 10.1.1.1 8992 cache.img cache.bitmap proxy.ctl

Updated and cached blocks are saved at a cache disk file (cache.img). A bitmap file (cache.bitmap) records block numbers of updated and cached blocks. A UNIX socket file (proxy.ctl) is created to control the proxy server (See the next example).
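
You can query the current status of the proxy through this control socket, for example:

xnbd-bgctl --query proxy.ctl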

Then, an NBD client node connects to the proxy server.

nbd-client 10.255.255.254 8992 /dev/nbd0

If you want to add more clients to the target server, repeat these commands at each proxy server and client node.

A proxy server accepts NBD connections from other NBD proxies, which means you can cascade multiple NBD proxies as shown in the figure below.

images/proxy-cascade.png
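
As an illustration (the second proxy's address, 10.255.255.253, and its file names are hypothetical), a cascaded proxy would simply redirect to the first proxy instead of the original target:

xnbd-server --proxy --lport 8992 10.255.255.254 8992 cache2.img cache2.bitmap proxy2.ctl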

Scenario 3 (Live VM & disk migration with Xen)

A proxy server is used for relocating a virtual disk to another xNBD server. This mechanism transparently works with live migration of a VM.

In this example, 4 physical machines are used:

  • Source host node where a VM is started
  • Destination host node where the VM is migrated
  • xNBD target node exporting a virtual disk to the source host node
  • xNBD proxy node exporting a virtual disk to the destination host node

images/proxy-migration-overview.png

1. Setup a VM with an NBD disk

Source Side

First, set up an xNBD target server (10.10.1.1) and connect to it from a source host node (10.10.1.2). Then create a VM with a virtual disk of /dev/nbd0.

In the xNBD target node,

dd if=/dev/zero of=disk.img bs=4096 count=1 seek=1000000
xnbd-server --target --lport 8992  disk.img

In the source host node,

modprobe nbd
echo deadline > /sys/block/nbd0/queue/scheduler
nbd-client bs=4096 10.10.1.1 8992 /dev/nbd0
virt-install -f /dev/nbd0  # or similar command to install a VM into /dev/nbd0
xm create /etc/xen/mydomain.cfg

When using Xen, the Domain configuration file of the VM will include the following entry.

disk = [ "phy:/dev/nbd0,xvda,w" ]

xNBD is independent of the VMM implementation. It also works with KVM (QEMU) and others. For instance, since KVM includes NBD client code, you do not need to set up /dev/nbd0 on the host OS; specify the NBD disk directly on the command line.

qemu-system-x86_64 -hda nbd:10.10.1.1:8992

Destination Side

Next, setup an xNBD proxy server (10.20.1.1), and connect to it from a destination host node (10.20.1.2).

The xNBD proxy server redirects NBD I/O requests to the above target server (10.10.1.1).

xnbd-server --proxy --lport 8992 10.10.1.1 8992 cache.img cache.bitmap proxy.ctl

This command creates a UNIX socket file (proxy.ctl), via which xnbd-bgctl controls the proxy server.

In the destination host, connect to the proxy server.

modprobe nbd
echo deadline > /sys/block/nbd0/queue/scheduler
nbd-client bs=4096 10.20.1.1 8992 /dev/nbd0

2. Migrate the VM to the destination

Next, start live migration.

In the source host,

xm migrate -l 2 10.20.1.2  # Domain ID is 2

After memory page relocation is completed, the VM is terminated at the source host, and then restarted at the destination. All disk I/O requests are intercepted at the xNBD proxy, and disk blocks are gradually cached (i.e., relocated) at the cache file.

images/proxy-migration-step2.png

3. Migrate all the disk blocks

There still remain blocks that have not yet been relocated. Now copy them to the xNBD proxy.

In the xNBD proxy node,

xnbd-bgctl --cache-all proxy.ctl
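
To watch the relocation progress, add the --progress option shown in the xnbd-bgctl help above; it prints a progress bar on stderr:

xnbd-bgctl --progress --cache-all proxy.ctl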

images/proxy-migration-step3.png

After all blocks are cached at the proxy node, the NBD connection to the target server is no longer required. Now, you can change the xNBD proxy to a normal target server, disconnecting the NBD connection to the target server.

In the xNBD proxy node,

xnbd-bgctl --switch proxy.ctl

This command shuts down the xNBD proxy server and restarts it as a normal xNBD target server. All client NBD sessions are preserved.
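
One way to confirm from a client node that the NBD device is still attached is the --check option from the xnbd-client help above (the exact output depends on the version):

xnbd-client --check /dev/nbd0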

images/proxy-migration-step0.png

Scenario 4 (OCFS2 with XNBD)

images/ocfs2_with_xnbd.png

For this example, assume there are three Debian (squeeze) machines:

  • host1: the xNBD target server. Its IP address is 172.16.0.101.
  • host2: an xNBD client node and an O2CB cluster node. Its IP address is 172.16.0.102.
  • host3: an xNBD client node and an O2CB cluster node. Its IP address is 172.16.0.103.

On host1, start xnbd-server:

# dd if=/dev/zero of=disk.img bs=4096 count=1 seek=1000000
# xnbd-server --target --lport 8992 disk.img

On host2 and host3, establish an NBD session:

# modprobe nbd
# echo deadline > /sys/block/nbd0/queue/scheduler
# xnbd-client bs=4096 172.16.0.101 8992 /dev/nbd0

On host2 and host3, set up O2CB:

# aptitude install ocfs2-tools
# vi /etc/ocfs2/cluster.conf


cluster:
    node_count = 2
    name = cluster1

node:
    ip_port = 7777
    ip_address = 172.16.0.102
    number = 1
    name = host2
    cluster = cluster1

node:
    ip_port = 7777
    ip_address = 172.16.0.103
    number = 2
    name = host3
    cluster = cluster1


# vi /etc/default/o2cb


O2CB_ENABLED=true
O2CB_BOOTCLUSTER=cluster1
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000


# /etc/init.d/o2cb start

On host2, create an OCFS2 filesystem:

# mkfs.ocfs2 /dev/nbd0

On host2 and host3, mount the OCFS2 volume:

# mount -t ocfs2 /dev/nbd0 /mnt/ocfs2
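
If the mount point does not exist yet, create it first on both hosts (the path /mnt/ocfs2 simply matches the mount command above):

# mkdir -p /mnt/ocfs2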

Now you can read/write from both machines.

Scenario 5 (Using named exports)

Recent versions of the NBD protocol allow an NBD client to request a target disk image by name in the negotiation phase. NBD clients such as xnbd-client, nbd-client, and qemu support this feature.

xnbd-wrapper is used to support this feature on the server side.

images/xnbd_wrapper.png

First, start xnbd-wrapper on 10.1.1.1:8992:

xnbd-wrapper --port 8992 --imgfile /data/disk1 --imgfile /data/disk2

In this command line, two image files are registered with this xnbd-wrapper.

You can also dynamically register more disk images by using xnbd-wrapper-ctl.

xnbd-wrapper-ctl --add /data/disk3
xnbd-wrapper-ctl --add /data/disk4

xnbd-wrapper-ctl --list
xnbd-wrapper-ctl --remove <index>

For example, the first VM uses /data/disk1 and the second VM uses /data/disk2:

qemu-kvm -hda nbd:10.1.1.1:8992:exportname=/data/disk1
qemu-kvm -hda nbd:10.1.1.1:8992:exportname=/data/disk2

qemu-kvm 0.14.0 or later is required.
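
xnbd-client also supports named exports via its --exportname option (listed in the help output above). A sketch following that usage summary:

xnbd-client --connect --exportname /data/disk1 /dev/nbd0 10.1.1.1 8992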

Documentation

There are several papers focused on storage migration for virtual machines.

  • A Live Storage Migration Mechanism over WAN and its Performance Evaluation, Takahiro Hirofuchi, Hidemoto Nakada, Hirotaka Ogawa, Satoshi Itoh and Satoshi Sekiguchi, The 3rd International Workshop on Virtualization Technologies in Distributed Computing (VTDC2009), Jun 2009 Paper (PDF) Slides (PDF)
  • A Live Storage Migration Mechanism over WAN for Relocatable Virtual Machine Services on Clouds, Takahiro Hirofuchi, Hirotaka Ogawa, Hidemoto Nakada, Satoshi Itoh and Satoshi Sekiguchi, International Workshop on Cloud Computing (Cloud 2009), May 2009
  • A Relocatable Storage I/O Mechanism for Live-Migration of Virtual Machines over WAN, Takahiro Hirofuchi, 2008 USENIX Annual Technical Conference (Poster), Jun 2008. Poster (PDF)

Other papers written in Japanese are listed at http://grivon.apgrid.org/publications/ .

This figure outlines the design of the proxy mode (png | odg).

There are several related NBD implementations.

  • ENBD (Enhanced NBD), http://www.enbd.org/ . ENBD requires compiling a special kernel driver. Linux 2.6.x does not seem to be supported.
  • Blockfish was an NBD server written in Java that supported an Amazon S3 backend.
  • Let me know of more...

An xNBD disk shared among client nodes should be used with a cluster file system.

This project is partially supported by a government-funded research project for datacenter virtualization and Green IT.

Copyright

Copyright (C) 2008-2013 National Institute of Advanced Industrial Science and Technology. All rights reserved.

Note: This program partially includes small pieces of source code written by other open source projects under the terms of the GNU General Public License.

Development of xNBD was partially sponsored by Wavecon GmbH < www.wavecon.de >.

Contact

Takahiro Hirofuchi <t.hirofuchi _at_ aist.go.jp>
